omg! OOM

Normally, a user-space program reserves (virtual) memory by calling malloc(). If the return value is NULL, the program knows that no more memory is available, and can do something appropriate. Most programs print an error message and exit; some first need to clean up lockfiles and the like; and some smarter programs can collect garbage, or adapt the computation to the amount of available memory. This is life under Unix, and all is well.
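
The conventional pattern just described might look roughly like the sketch below. The helper name alloc_or_cleanup and its lockfile parameter are hypothetical, invented only for this illustration; a real program would decide for itself what cleanup is needed.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from any library): try to allocate `size` bytes.
 * On failure, report the problem and remove the lockfile named by `lockfile`
 * (if one was given), then return NULL so that the caller can exit cleanly
 * or fall back to a cheaper algorithm.
 */
void *alloc_or_cleanup(size_t size, const char *lockfile)
{
        void *p = malloc(size);

        if (p == NULL) {
                fprintf(stderr, "out of memory (wanted %zu bytes)\n", size);
                if (lockfile != NULL)
                        remove(lockfile);       /* best effort, errors ignored */
        }
        return p;
}
```

A caller would test the result, for example `if ((buf = alloc_or_cleanup(len, "/tmp/demo.lock")) == NULL) exit(1);`, where the lockfile path is again only a placeholder. The whole scheme relies on malloc() reporting failure honestly.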

Linux, on the other hand, is seriously broken. By default it answers “yes” to most requests for memory, in the hope that programs ask for more than they actually need. If that hope is fulfilled, Linux can run more programs in the same memory, or can run a program that requires more virtual memory than is available. If not, very bad things happen.
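
Which policy a given machine follows can be inspected from user space. The sketch below assumes a Linux system that exposes the setting as /proc/sys/vm/overcommit_memory (0 = heuristic overcommit, 1 = always say yes, 2 = strict accounting); the function name overcommit_mode is made up for this example, and on systems without that file it simply returns -1.

```c
#include <stdio.h>

/*
 * Read the kernel's overcommit policy.
 * Returns 0, 1 or 2 as stored in /proc/sys/vm/overcommit_memory,
 * or -1 if the file is missing or unreadable (assumption: Linux with /proc).
 */
int overcommit_mode(void)
{
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
        int mode = -1;

        if (f == NULL)
                return -1;
        if (fscanf(f, "%d", &mode) != 1)
                mode = -1;      /* unexpected file contents */
        fclose(f);
        return mode;
}
```

With a value of 0 or 1, a non-NULL return from malloc() says little about whether the memory can actually be used later.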

What happens is that the OOM killer (OOM = out-of-memory) is invoked, and it selects some process and kills it. There are long discussions about how to choose the victim. Maybe not a root process, maybe not a process doing raw I/O, maybe not a process that has already spent weeks doing some computation. And thus it can happen that one’s emacs is killed when someone else starts more stuff than the kernel can handle. Ach. Very, very primitive.

Of course, the very existence of an OOM killer is a bug.

A typical case: I run umount -a on a system where 30000 filesystems are mounted. umount runs out of memory, and the kernel log reports:

Sep 19 00:33:10 mette kernel: Out of Memory: Killed process 8631 (xterm).
Sep 19 00:33:34 mette kernel: Out of Memory: Killed process 9154 (xterm).
Sep 19 00:34:05 mette kernel: Out of Memory: Killed process 6840 (xterm).
Sep 19 00:34:42 mette kernel: Out of Memory: Killed process 9066 (xterm).
Sep 19 00:35:15 mette kernel: Out of Memory: Killed process 9269 (xterm).
Sep 19 00:35:43 mette kernel: Out of Memory: Killed process 9351 (xterm).
Sep 19 00:36:05 mette kernel: Out of Memory: Killed process 6752 (xterm).

xterm windows are killed at random, until the xterm window that was X’s console is killed. Then X exits and all user processes die, including the umount process that caused all this.

OK. This is very bad. People lose long-running processes, lose weeks of computation, just because the kernel is an optimist.

Demo program 1: allocate memory without using it.

#include <stdio.h>
#include <stdlib.h>

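int main (void) {
        int n = 0;

        /*
         * Ask for 1 MiB at a time and never write to it. The memory is
         * leaked on purpose. Compile without optimization: some compilers
         * remove a malloc() whose result is never used.
         */
        while (1) {
                if (malloc(1<<20) == NULL) {
                        printf("malloc failure after %d MiB\n", n);
                        return 0;
                }
                printf ("got %d MiB\n", ++n);
        }
}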

Demo program 2: allocate memory and actually touch it all.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main (void) {
        int n = 0;
        char *p;

        while (1) {
                if ((p = malloc(1<<20)) == NULL) {
                        printf("malloc failure after %d MiB\n", n);
                        return 0;
                }
                /* write to every page so that it needs real memory behind it */
                memset (p, 0, (1<<20));
                printf ("got %d MiB\n", ++n);
        }
}

Demo program 3: first allocate, and use later.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define N       10000

int main (void) {
        int i, n = 0;
        char *pp[N];

        for (n = 0; n < N; n++) {
                pp[n] = malloc(1<<20);
                if (pp[n] == NULL)
                        break;
        }
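        /* n == N means every allocation succeeded and malloc() never failed */
        if (n < N)
                printf("malloc failure after %d MiB\n", n);
        else
                printf("no malloc failure after %d MiB\n", n);

        /* only now touch the memory that was handed out earlier */
        for (i = 0; i < n; i++) {
                memset (pp[i], 0, (1<<20));
                printf("%d\n", i+1);
        }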

        return 0;
}

Typically, the first demo program will get a very large amount of memory before malloc() returns NULL. The second demo program will get a much smaller amount, since the memory obtained earlier is actually used. The third program will get the same large amount as the first program, and is then killed when it tries to use its memory. (On a well-functioning system, like Solaris, the three demo programs obtain the same amount of memory and do not crash but see malloc() return NULL.)
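
The gap between reserving memory and using it can also be observed without exhausting the machine. The following is a minimal sketch, assuming a Linux system that provides /proc/self/statm (whose first two fields are the total virtual size and the resident set size, in pages). The helper names read_statm and resident_growth are invented for this illustration, and both return -1 when the measurement is not possible.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Read the virtual size and resident set size of this process, in pages,
 * from /proc/self/statm. Returns 0 on success, -1 on failure.
 */
int read_statm(long *virt_pages, long *resident_pages)
{
        FILE *f = fopen("/proc/self/statm", "r");
        int ok;

        if (f == NULL)
                return -1;
        ok = (fscanf(f, "%ld %ld", virt_pages, resident_pages) == 2);
        fclose(f);
        return ok ? 0 : -1;
}

/*
 * Allocate `mib` MiB in one piece, optionally write to it, and return by how
 * many pages the resident set grew. Returns -1 if the allocation fails or
 * the measurement is unavailable.
 */
long resident_growth(size_t mib, int touch)
{
        long v0, r0, v1, r1;
        size_t len = mib << 20;
        size_t i;
        char *p;

        if (read_statm(&v0, &r0) != 0)
                return -1;
        p = malloc(len);
        if (p == NULL)
                return -1;
        if (touch) {
                /* volatile keeps the compiler from discarding the writes;
                 * a 4096-byte stride hits every page on common systems */
                volatile char *q = p;
                for (i = 0; i < len; i += 4096)
                        q[i] = 1;
        }
        if (read_statm(&v1, &r1) != 0) {
                free(p);
                return -1;
        }
        free(p);
        return r1 - r0;
}
```

On an overcommitting kernel one would expect resident_growth(16, 0) to report only a handful of pages, while resident_growth(16, 1) reports roughly the whole allocation; the exact numbers depend on the page size and the malloc implementation.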

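For example (for demo3, the first number is the amount obtained from malloc(), the second the amount touched before the process died):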

  • On an 8 MiB machine without swap running Linux 1.2.11:
    demo1: 274 MiB, demo2: 4 MiB, demo3: 270 / oom after 1 MiB: Killed.
  • The same machine, with 32 MiB swap:
    demo1: 1528 MiB, demo2: 36 MiB, demo3: 1528 / oom after 23 MiB: Killed.
  • On a 32 MiB machine without swap running Linux 2.0.34:
    demo1: 1919 MiB, demo2: 11 MiB, demo3: 1919 / oom after 4 MiB: Bus error.
  • The same machine, with 62 MiB swap:
    demo1: 1919 MiB, demo2: 81 MiB, demo3: 1919 / oom after 74 MiB: The machine hangs. After several seconds: Out of memory for bash. Out of memory for crond. Bus error.
  • On a 256 MiB machine without swap running Linux 2.6.8.1:
    demo1: 2933 MiB, demo2: after 98 MiB: Killed. Also: Out of Memory: Killed process 17384 (java_vm). demo3: 2933 / oom after 135 MiB: Killed.
  • The same machine, with 539 MiB swap:
    demo1: 2933 MiB, demo2: after 635 MiB: Killed. demo3: oom after 624 MiB: Killed.
