Linux, Ceph, Openstack and Privacy Tech

Rebooting on Out of Memory Error

19th July 2017

Out of memory errors can be a real nightmare. While the OOM killer usually does a decent job of keeping the system stable, it can also get overwhelmed, and killing processes may only leave the system in perpetual instability.

Instead, if the situation gets so bad that the OOM killer is invoked, I prefer to restart the system. I should highlight that this is not a fix for out of memory errors; it is only a way to ensure that, if they do happen, my server gets back online as quickly as possible. If the error hits while I am fast asleep, I have greater confidence in the system recovering from a full restart than in leaving it to the mercy of the OOM killer or a kernel stuck in a panic. Fortunately, this trick can be applied to any server running the Linux kernel.

To start, the server needs to induce a kernel panic when the out of memory error hits. Then, the kernel should be instructed to restart shortly after that panic. We can apply both settings to the running kernel immediately by running these commands as root:

sysctl vm.panic_on_oom=1
sysctl kernel.panic=5
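
To confirm the running kernel actually picked up both values, they can be read back from /proc/sys, which mirrors the sysctl keys. The following is only a minimal sketch: the helper names key_to_path and read_key are my own for illustration, and it assumes the keys contain no dots other than the separators.

```shell
#!/bin/sh
# Hypothetical helper: turn a dotted sysctl key (vm.panic_on_oom)
# into its path under /proc/sys (/proc/sys/vm/panic_on_oom).
key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: print the current value of a key,
# or "unavailable" if the file cannot be read on this system.
read_key() {
    path=$(key_to_path "$1")
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "unavailable"
    fi
}

# Reading these values does not require root.
for key in vm.panic_on_oom kernel.panic; do
    printf '%s = %s\n' "$key" "$(read_key "$key")"
done
```

After running the two sysctl commands, both keys should report the values that were set. One caveat worth knowing: the kernel documentation describes an exception for vm.panic_on_oom=1, where allocations constrained by cpusets or memory policies are still handled by the OOM killer without a panic; a value of 2 panics in those cases as well.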

However, these settings will not persist across a reboot, so we need to make sure they do not revert to their defaults. We can do that by adding the following lines to the /etc/sysctl.conf file, which stores kernel parameters that are applied at boot.

# Induce Kernel Panic on OOM
vm.panic_on_oom=1
# Reboot on Kernel Panic
kernel.panic=5
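
If you would rather script this than edit the file by hand, the sketch below shows one way to do it without leaving duplicate entries behind. The function name set_sysctl_conf is hypothetical, and the demonstration deliberately works on a scratch file rather than the real /etc/sysctl.conf.

```shell
#!/bin/sh
# set_sysctl_conf FILE KEY VALUE
# Hypothetical helper: make sure FILE ends up with exactly one
# "KEY=VALUE" line, leaving comments and other settings untouched.
set_sysctl_conf() {
    file=$1
    key=$2
    value=$3
    # Escape the dots so the key is matched literally by grep.
    escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    # Copy everything except existing assignments of KEY.
    if [ -f "$file" ]; then
        grep -v -E "^[[:space:]]*${escaped}[[:space:]]*=" "$file" > "$tmp"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demonstration on a scratch file that already holds a stale value.
conf=$(mktemp)
printf '# existing settings\nkernel.panic=0\n' > "$conf"
set_sysctl_conf "$conf" vm.panic_on_oom 1
set_sysctl_conf "$conf" kernel.panic 5
cat "$conf"
```

On a real server you would run the function as root against /etc/sysctl.conf, and `sysctl -p` will then load the file straight away if you want to apply it without waiting for the next boot.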

Now save the file, and that's it! If you hit another out of memory error, the kernel will panic and restart after 5 seconds. Of course, if you see this happening often, it needs investigating, but this trick should keep your systems online even when you are unable to reach them.

AUTHOR

Thomas White
