Alert Triggering for Out-of-Memory (OOM) Killer in Container


Rational memory usage is one of the key factors that define both the cost-efficiency and reliability of your application hosting. However, configuring proper RAM utilization can turn out to be far trickier than it seems at first: there are a number of hidden stumbling blocks to consider in order to avoid exceeding memory limits.

If left unattended, incorrect or simply unoptimized memory settings can make your container prone to occasional RAM overuse. When server memory limits are reached, the system tries to keep the container functioning by invoking the built-in Out-of-Memory (OOM) Killer. To release the required capacity, this kernel mechanism analyzes all container processes, ranks them by priority and resource consumption, and kills the least important ones in favor of the more essential services. However, since some processes get killed, the application can no longer be considered fully functional and usually requires manual involvement to be restored.
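
To get a feel for how this ranking works, you can inspect the kernel's per-process OOM scores directly from inside the container. The loop below is just an illustrative sketch assuming a standard Linux /proc filesystem; the exact score values depend on the kernel version and cgroup limits:

    # List the ten processes most likely to be chosen by the OOM Killer,
    # sorted by current oom_score (a higher score means a more likely victim)
    for pid in /proc/[0-9]*; do
        printf '%s %s %s\n' "$(cat "$pid"/oom_score)" "${pid##*/}" "$(tr '\0' ' ' < "$pid"/cmdline)"
    done 2>/dev/null | sort -rn | head -n 10

The kernel also exposes /proc/<pid>/oom_score_adj, which lets you bias this ranking for individual processes.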


Such OOM problems are common to almost all software stacks and programming languages. For example, we recently examined the specifics of Java memory allocation within Docker containers, where incorrect detection of the available RAM often causes the main Java process to be killed by the kernel.
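
As an illustration of that Java case, newer JDKs provide container-aware memory flags. The line below is a hedged sketch: it assumes JDK 8u191+/10+, app.jar is a placeholder for your own application, and the 75% ratio is only an example value to be tuned:

    # Cap the heap relative to the container's memory limit instead of the host RAM
    # (app.jar is a hypothetical application archive)
    java -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -jar app.jar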

If OOM issues become typical for your server, they are a serious reason to review your application code and settings. A persistent state of running out of memory can indicate not just an ordinary capacity shortage, but underlying problems or misconfigurations that obstruct efficient memory utilization.

To make dealing with such issues easier, Jelastic PaaS provides out-of-the-box monitoring of OOM Killer activity within your containers, sending email notifications upon each execution. This keeps you aware of your application's resource demands and helps you eliminate possible memory leaks by revealing the root cause of the OOM event. Below, we'll look at how to retrieve the required information on container RAM overuse to define the proper resolution, and then simulate an out-of-memory problem to see the whole mechanism in action.

OOM Killer Alerts

Alerting on OOM Killer executions is automatically enabled for all newly created instances on Jelastic Platform version 5.0.5 and higher. For legacy containers (or in case this functionality is not provisioned by default due to specific hosting provider settings), the required alerts can be added manually; likewise, the default ones can be disabled or reconfigured.

To access the corresponding management options and review the list of already enabled notification triggers, switch to the Monitoring > Load Alerts section of your environment Settings.

Here, OOM Killer-dedicated triggers (if any) can be recognized by the OOM kill condition.

Use Add / Edit at the top pane to create a new alert or modify an existing one, respectively.

Here, each OOM kill notification trigger is defined with the following options:

  • Name – the alert name, which will be used within the Platform UI and in subsequent email notifications
  • Nodes – the environment layer to be monitored for OOM events (the trigger can be applied to any layer within the chosen environment)
  • Whenever – select the Out of memory (OOM) kill occurs condition for trigger execution
  • Notification frequency – set a delay (in hours) for repetitive messages or choose Immediately to get a notification each time any process is killed

Once confirmed with Add (Apply), you'll start receiving emails about all out-of-memory issues within containers of the chosen type. Each notification contains the amount of RAM consumed at the moment the killer was invoked, a table with details on the killed process(es), and some general recommendations on fixing and preventing such issues.

In most cases, receiving such an alert means you need to investigate the root cause of the problem and apply a resolution. The most general recommendations are (a quick inspection sketch follows the list):

  • Review your application code for possible memory leak causes
  • Increase the amount of allocated resources and/or the number of instances for the corresponding server with Jelastic automatic vertical / horizontal scaling (your application may simply require more capacity to operate properly)
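
For the first check, a quick look at the current memory consumers often points at the leaking service. This is a generic sketch using standard Linux tools rather than a Jelastic-specific feature:

    # Overall memory picture of the container
    free -m

    # Top five resident-memory consumers (the RSS column is reported in kilobytes)
    ps aux --sort=-rss | head -n 6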

To help you resolve OOM events quicker, we've analyzed the most common issues and gathered the most efficient ways to fix them in the dedicated OOM Killer Issues Troubleshooting guide. There, all possible kills are divided into three main groups:

  • Common Cases – processes that run by default on any Jelastic container and can be killed by the OOM Killer
  • Processes of High Risk – presumable memory leaks that require special actions or application code optimization; these are grouped by stack type / programming language, with each section providing general engine-related recommendations as well as kill resolutions for specific processes
  • Non-Leaking Processes – operations that can be terminated by the OOM Killer while not being the root of the problem; the general fix for all such issues is to restart the container in order to restore the corresponding processes

So, to find the required solution, take the killed process name from the email notification you've received and look it up in the guide linked above.
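
If you want to cross-check the alert from inside the container, the kernel log normally records each kill. The command below assumes the kernel ring buffer is readable in your container; the exact message format varies between kernel versions:

    # Look for recent OOM Killer activity in the kernel log
    dmesg -T | grep -iE 'out of memory|killed process' | tail -n 20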

Below, we'll see the described OOM Killer execution and alerting mechanisms in practice by simulating abnormal RAM consumption with a heavily loaded process inside a container.

Simulation to Trigger OOM Killer Alert

So, let's imitate an out-of-memory issue on a container to invoke the OOM Killer and check its behavior. For that, we'll generate some load on the server with the simple stress load-testing tool.

Note: Running stress within a container requires full root permissions, so you'll need either a VPS node or a Docker container to follow the example below.

1. Connect to the appropriate container via the SSH Gate.

2. Once inside, execute one of the following commands, according to the OS distribution your server runs, to install the load generator (or use the snippet after the list to pick the package manager automatically):

  • apt-get install stress – for Ubuntu / Debian
  • yum install stress – for CentOS / Fedora
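
If you are scripting the setup, a small check like the one below picks the right package manager automatically (a generic sketch; on CentOS the stress package may additionally require the EPEL repository):

    # Install stress with whichever package manager is available
    if command -v apt-get >/dev/null 2>&1; then
        apt-get update && apt-get install -y stress
    elif command -v yum >/dev/null 2>&1; then
        yum install -y stress
    fi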


3. Start stress with the appropriate parameters to simulate memory shortage (a concrete example follows the parameter descriptions):

stress -m {workers} -t {timeout}

where

  • {workers} – number of workers to consume RAM (256 MB per worker by default)
  • {timeout} – number of seconds the load will last (e.g. 30s)
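
For instance, the run below (the worker count and duration are arbitrary values chosen for this demonstration) starts four workers of 256 MB each, which is enough to exceed the memory limit of a small container:

    # Allocate roughly 4 x 256 MB of RAM for 30 seconds
    stress -m 4 -t 30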

As you can see, the operation failed due to the lack of RAM on the container.

4. Now, return to the dashboard and check the Monitoring > Events History tab.

To find the required alerts quicker, use options within the Filters section:

  • Type – choose Load Alerts (the second option is devoted to Horizontal Scaling events)
  • Nodes – pick the environment layer you'd like to view the information for from the automatically fetched list of server types (or leave All to output everything)
  • Trigger – select the Out of memory (OOM) kill occurs trigger type
  • From / To – specify the date range to show alerts on OOM Killer executions within that period

For additional information on a particular OOM event (such as the killed Process ID, the Node ID it was run on, etc.), hover over the appropriate record or review the related email notification in your inbox.

5. Also, cluster administrators (i.e. those who manage separate Jelastic installations) are provided with comprehensive statistics on OOM Killer activity across the whole Platform – the corresponding data is stored in the JCA > Reports > OOM Killer section.

To fetch the required information and narrow down the list (e.g. by User ID, as shown above), use the filters at the top panel.

This way, you can monitor OOM Killer activity within your containers to utilize memory efficiently with no compromise on application performance. Test it right now for free during the two-week trial period at any of the Jelastic Cloud Platforms worldwide.

In case you face any issues with the OOM Killer functionality at our Platform, let us know and get assistance from our technical experts at Stackoverflow.
