Facebook developed a new OOM killer for Linux

erkinalp
Posts: 861
Joined: Sat Dec 20, 2008 5:55 pm
Location: Izmir, TR

Facebook developed a new OOM killer for Linux

Post by erkinalp »

Windows does not have memory overcommit, but this piece of software might be worth studying for people who wish to create a fork with memory overcommit:
https://code.fb.com/production-engineer ... ling-ooms/
-uses Ubuntu+GNOME 3 GNU/Linux
-likes Free (as in freedom) and Open Source Detergents
-favors open source of Windows 10 under GPL2
anthracen
Posts: 43
Joined: Thu May 10, 2018 2:28 pm

Re: Facebook developed a new OOM killer for Linux

Post by anthracen »

erkinalp wrote: Mon Jul 23, 2018 10:53 pm Windows does not have memory overcommit, but this piece of software might be worth studying for people who wish to create a fork with memory overcommit:
https://code.fb.com/production-engineer ... ling-ooms/
Actually, that post is not about "overcommit" at all. On the contrary, they say it's not good:
Memory overcommit, where more memory is allocated for processes than the total available system memory, is a common technique for increasing memory utilization. Memory overcommit is based on the assumption that not all assigned memory is needed by running applications. This assumption is not always true: When demand exceeds total available memory, the Linux OOM killer tries to reclaim memory. The Linux OOM killer’s primary responsibility is to protect the kernel so that the machine stays up; it accomplishes this by killing some processes without heed to the importance of a given workload. Hence, whenever the OOM killer engages, there is a significant risk that applications running on the machine will be affected.
Actually, it's a very watered-down blog post full of unclear statements about nothing, and it reveals nothing. Except that we now know they invented yet another square, blinking wheel for Linux. Wahoo. >_>

From the very beginning, Windows has been able to commit more memory than the available RAM, thanks to paging files. No need to kill processes.
But from that fuzzy blog it's barely understandable what they actually mean, for example:
Memory overcommit, where more memory is allocated for processes than the total available system memory, is a common technique for increasing memory utilization.
Really? Aha, I see, that's their goal: to bloat for the sake of bloating. Now I know why their site is such a f&&&&ng hog.
If they mean a two-stage virtual memory system (DRAM <-> secondary storage, i.e. paging) in which more memory than the available RAM is committed, then even Linux has paging; they just call it swap even though it's paging. But if Linux kills processes to reclaim memory... okay then. No wonder. :lol:

It's all foreign to Windows, and so is their terminology: committed != allocated, for example.

I personally keep myself as far as possible from Linux internals so as not to screw up my study of NT and my knowledge of OS design. I'd suggest that others who are interested in NT-like OS development do the same. By messing around with Windows code you could endanger your sacred GPL. But by messing around with Linux concepts, you can screw up your vision/understanding of OS architecture entirely. Keep your flower intact. :lol:
erkinalp

Re: Facebook developed a new OOM killer for Linux

Post by erkinalp »

You can allocate more than RAM on Linux, but it will not kill anything unless you actually touch too much of that memory. That is the point of the OOM killer. Linux always tries to keep enough physical memory free, but once reclaim and swapping can no longer keep up, the killer kicks in. Unix was also non-overcommit, i.e. the memory allocator fails straight away if a request exceeds what RAM plus swap can back.
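To make the difference between the two policies concrete, here is a toy model (a sketch, not how any real kernel does its accounting). `ToyMemory`, its page counts, and the rule that the commit limit is simply RAM plus swap/page file are all assumptions for illustration. In strict mode an oversized request fails at allocation time; in overcommit mode it succeeds, and the failure only shows up later, when the pages are actually touched.

```python
class ToyMemory:
    """Toy commit accounting. Sizes are in pages; numbers are illustrative."""

    def __init__(self, ram_pages, backing_pages, overcommit):
        # Simplifying assumption: everything that can ever be backed is
        # RAM plus swap/page file.
        self.limit = ram_pages + backing_pages
        self.overcommit = overcommit
        self.committed = 0  # pages promised to processes
        self.touched = 0    # pages that actually hold data

    def allocate(self, pages):
        """Promise `pages` pages. Strict mode refuses what it cannot back."""
        if not self.overcommit and self.committed + pages > self.limit:
            return False  # the caller sees an ordinary allocation failure
        self.committed += pages
        return True

    def touch(self, pages):
        """First access to previously allocated pages needs real storage."""
        if self.touched + pages > self.committed:
            raise ValueError("touching memory that was never allocated")
        if self.touched + pages > self.limit:
            # Only reachable with overcommit: the promise cannot be kept, so
            # something has to be reclaimed by force (e.g. an OOM kill).
            raise MemoryError("out of memory at access time")
        self.touched += pages


strict = ToyMemory(ram_pages=100, backing_pages=50, overcommit=False)
print(strict.allocate(200))  # False: refused up front

lax = ToyMemory(ram_pages=100, backing_pages=50, overcommit=True)
print(lax.allocate(200))     # True: the promise is accepted
lax.touch(150)               # still fits in RAM + swap
try:
    lax.touch(10)            # now the promise cannot be kept
except MemoryError as err:
    print(err)               # out of memory at access time
```

The point of the model is where the error surfaces: with strict accounting the caller can handle a plain failure return, while with overcommit there is no caller left to tell, which is why a killer is needed.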
anthracen wrote: I personally keep myself as far as possible from Linux internals so as not to screw up my study of NT and my knowledge of OS design.
Open source operating systems can share development effort. You can study the simpler OS, then contribute to the more complex one (ReactOS is the more complex one). Even with compatibility and architectural limitations, there are bits that can be shared among multiple open source operating systems.
anthracen

Re: Facebook developed a new OOM killer for Linux

Post by anthracen »

In my (admittedly ignorant) view, paging is the way to solve this problem. If you are a monstrous corporation running thousands of servers with tons of gigabytes of industrial-grade SSD/HDD space, it should be more than enough to provide space for page files. If you still fill them all up and run out of memory, then either your OS of choice sucks or your own software stack deployment is bad, really bad. I don't see killing processes as a way of solving the out-of-memory problem at all. It's insane. Every process's working set currently residing in memory is definitely needed for something that keeps the machine running, so killing any of these processes will hurt the whole session, be it a web server, a database server or whatever. Better to deny allocating new memory and signal an out-of-memory fatal error, I think. Then it gets the administrators' attention and they can take care of it in a much more sophisticated way than stupidly killing processes. Even a reboot sounds nicer: at least it's a deterministic way of overcoming the problem, unlike killing randomly caught processes.
Basically, Facebook invented yet another parasitic thing that will sit in memory eating up more resources to run "kind of very wise algorithms for predicting OOM situations", and it will still end up with the same "kill it" result should OOM happen. If their servers are running out of memory, adding a useless bloated service surely won't help. No, I don't think this kind of helper service is trash; I just suspect they are searching in the wrong direction this time. Their problem is the bloat of their current software. That is what should have been fought in the first place.
erkinalp

Re: Facebook developed a new OOM killer for Linux

Post by erkinalp »

anthracen wrote: it should be more than enough to provide space for page files.
The problem is that paging is dangerous at FB scale. Data becomes too old and unusable by the time it is read back.
anthracen wrote: Even a reboot sounds nicer.
Rebooting an FB server to resolve OOM means you would be randomly logged out of your FB account whenever an FB server overloads.
anthracen wrote: Better to deny allocating new memory and signal an out-of-memory fatal error.
The allocator can return failure, but some applications were designed as if it never would. The OOM killer is a "backward compatibility" measure against these. Turn off overcommit on a Linux system with heavy memory usage and you will see how badly applications behave when a memory allocation is refused.
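For contrast, a minimal sketch of an application that does treat a refused allocation as a normal outcome, assuming a non-overcommitting allocator. `Pool`, `try_alloc` and `Cache` are hypothetical names, not a real library API: when an allocation is refused, the cache gives memory back by evicting its oldest entries, and it only reports failure when there is nothing left to evict.

```python
from collections import OrderedDict


class Pool:
    """Hypothetical non-overcommitting allocator with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_alloc(self, size):
        # Refuse instead of over-promising.
        if self.used + size > self.capacity:
            return False
        self.used += size
        return True

    def free(self, size):
        self.used -= size


class Cache:
    """Cache that frees its oldest entries when an allocation is refused."""

    def __init__(self, pool):
        self.pool = pool
        self.entries = OrderedDict()  # key -> size, oldest first

    def put(self, key, size):
        if size > self.pool.capacity:
            return False  # can never fit; don't evict everything for nothing
        if key in self.entries:
            self.pool.free(self.entries.pop(key))
        while not self.pool.try_alloc(size):
            if not self.entries:
                # Nothing left to give back: report failure to the caller
                # rather than crash (or rely on an OOM killer).
                return False
            _, old_size = self.entries.popitem(last=False)
            self.pool.free(old_size)
        self.entries[key] = size
        return True


pool = Pool(capacity=10)
cache = Cache(pool)
cache.put("a", 4)
cache.put("b", 4)
print(cache.put("c", 4))      # True: "a" was evicted to make room
print(list(cache.entries))    # ['b', 'c']
print(cache.put("huge", 50))  # False: failure is reported, nothing crashes
```

An application written this way needs no killer at all; the ones the OOM killer compensates for are those that skip the failure branch entirely.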
anthracen

Re: Facebook developed a new OOM killer for Linux

Post by anthracen »

erkinalp wrote: The problem is that paging is dangerous at FB scale. Data becomes too old and unusable by the time it is read back.
How is that? Data is synchronized between these storages (RAM and secondary storage). Once the kernel notices there is too little memory available, it calls the Working Set Trimmer, which just pulls pages out of the process's working set (WS). If those pages are clean (synchronized with their backing storage), they go into the Standby list. If the process the page was taken from accesses it in the meantime, the page comes back from the Standby list; that's a soft page fault. If the page was modified in memory and is not synchronized with its backing storage, it goes into the Modified page list. Before this page is given to other processes, its contents are written back to the backing storage: the paging file or the file of origin (for a mapped file). This system would never leave "stale" content on the backing storage.
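The list mechanics described above can be modelled in a few lines. This is a simplified toy (a single process, dicts instead of real page lists, and the assumption that every page originates from a file or the paging file); `ToyMemoryManager` and its methods are hypothetical, not NT's actual data structures. It shows the key invariant: a page only becomes reusable from the Standby list once its contents match the backing storage.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    dirty: bool = False


class ToyMemoryManager:
    """Toy model of working-set trimming with Standby and Modified lists."""

    def __init__(self):
        self.working_set = {}       # resident pages the process currently owns
        self.standby = {}           # clean: identical to the backing storage copy
        self.modified = {}          # dirty: must be written back before reuse
        self.backing_store = set()  # page numbers with an up-to-date disk copy
        self.hard_faults = 0
        self.soft_faults = 0

    def access(self, number, write=False):
        if number in self.working_set:
            page = self.working_set[number]
        elif number in self.standby or number in self.modified:
            # Soft fault: the page is still in RAM, just not in the WS.
            self.soft_faults += 1
            source = self.standby if number in self.standby else self.modified
            page = source.pop(number)
            self.working_set[number] = page
        else:
            # Hard fault: read the page in (assumed to exist on disk already).
            self.hard_faults += 1
            self.backing_store.add(number)
            page = Page(number)
            self.working_set[number] = page
        if write:
            page.dirty = True
            self.backing_store.discard(number)  # disk copy is now stale

    def trim(self, number):
        """Working Set Trimmer: take one page out of the WS."""
        page = self.working_set.pop(number)
        (self.modified if page.dirty else self.standby)[number] = page

    def write_modified(self):
        """Modified page writer: flush dirty pages, then move them to Standby."""
        for number, page in self.modified.items():
            self.backing_store.add(number)  # written to paging/mapped file
            page.dirty = False
            self.standby[number] = page
        self.modified.clear()

    def repurpose(self):
        """Hand a Standby page to someone else; its data is safe on disk.
        Modified pages are never repurposed directly."""
        if not self.standby:
            return None
        number, _ = self.standby.popitem()
        return number


mm = ToyMemoryManager()
mm.access(1)              # hard fault, clean page
mm.access(2, write=True)  # hard fault, page is then modified
mm.trim(1)
mm.trim(2)
print(sorted(mm.standby), sorted(mm.modified))  # [1] [2]
mm.access(1)              # soft fault: back from Standby, no I/O
print(mm.hard_faults, mm.soft_faults)           # 2 1
mm.write_modified()
print(sorted(mm.standby), sorted(mm.modified))  # [2] []
```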

An OOM condition could only arise if the sum of all working sets reached the available memory. That's not a problem for the OS per se, since it's a clear out-of-resources situation caused by overloading the machine. You just cannot expect it to do more than it can. But again, instead of killing processes in this situation, no matter how sophisticated your analysis of what to kill is, it's better to trim working sets. The session will remain in a much more consistent state. Properly implemented, this system is limited only by the total amount of backing storage you provide. Since that is counted in hundreds of gigabytes, even FB could fit into it. Or, again, they chose the wrong OS for their hosts or their system is a hopeless bloat. Or both. :lol: Just IMO.
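A minimal sketch of "trim instead of kill", assuming a deliberately naive policy: shrink the largest working set first, one page at a time, until the total fits in RAM. `trim_to_fit` and its parameters (including the `min_ws` floor) are hypothetical; real memory managers use more elaborate heuristics, such as per-process minimum and maximum working-set sizes.

```python
def trim_to_fit(working_sets, ram_pages, min_ws=1):
    """Shrink working sets, largest first, until their total fits in RAM.

    working_sets maps a process name to its resident page count. Returns
    the new sizes. No process is killed; in a real system the trimmed pages
    would go to the Standby/Modified lists and could be faulted back later.
    """
    sizes = dict(working_sets)
    excess = sum(sizes.values()) - ram_pages
    while excess > 0:
        # Naive policy: take one page from whoever currently holds the most.
        name = max(sizes, key=sizes.get)
        if sizes[name] <= min_ws:
            # Every process is at its minimum: a genuine out-of-resources case.
            raise MemoryError("working sets cannot shrink any further")
        sizes[name] -= 1
        excess -= 1
    return sizes


print(trim_to_fit({"web": 60, "db": 50, "cron": 10}, ram_pages=100))
# {'web': 45, 'db': 45, 'cron': 10}
```

Only when every working set is already at its floor does the sketch give up, which matches the "clear out of resources" case described in the post.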

With demand paging (which is the best approach for real memory efficiency, since memory is a very scarce and expensive resource), the only useful "overcommit" is when, while handling a page fault, you give the process more than one page at once. For example, for code you give it the page where the fault happened, plus 2 before and 2 after, so that it incurs fewer page faults during its run time. Again, this trick doesn't change anything with respect to handling memory exhaustion. Trimming working sets is the answer.
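The clustering trick can be sketched as follows. It is a toy, with the 2-before/2-after window from the example as assumed defaults; `fault_cluster` and `count_faults` are hypothetical helpers. For a sequential scan, bringing in neighbours noticeably cuts the number of faults without changing how much memory the scan ends up using.

```python
def fault_cluster(fault_page, first_page, last_page, before=2, after=2):
    """Pages to read in for one fault: the faulting page plus neighbours,
    clamped to the mapped range [first_page, last_page]."""
    start = max(first_page, fault_page - before)
    end = min(last_page, fault_page + after)
    return list(range(start, end + 1))


def count_faults(accesses, first_page, last_page, before=2, after=2):
    """Count page faults for an access pattern when each fault brings in a cluster."""
    resident = set()
    faults = 0
    for page in accesses:
        if page not in resident:
            faults += 1
            resident.update(fault_cluster(page, first_page, last_page, before, after))
    return faults


scan = range(20)  # sequential pass over a 20-page mapping
print(count_faults(scan, 0, 19, before=0, after=0))  # 20: one page per fault
print(count_faults(scan, 0, 19))                     # 7: 2 before + 2 after
```

For a forward scan the pages brought in "before" the fault are usually already resident, so in practice the useful part of the window is the pages after the faulting one.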
erkinalp

Re: Facebook developed a new OOM killer for Linux

Post by erkinalp »

anthracen wrote: Or, again, they chose the wrong OS for their hosts or their system is a hopeless bloat.
For the same reason most people use Windows, most countries have rails 1435 mm apart and the US uses 110 V, 60 Hz mains electricity: backward compatibility. Mark began with a simple LAMP server and nobody could afford the time to replace it.