Saturday, August 25, 2012
Wrong defaults for zone_reclaim_mode on Linux
My coworker Jeff Frost just published a writeup on "zone reclaim mode" in Linux and how it can be a problem. Since his post is rather detailed, I wanted to give a "do this" summary:
- zone_reclaim_mode defaults to the wrong value for database servers, 1, on some Linux distributions, including Red Hat.
- This default causes Linux both to fail to use all available RAM for caching and to throttle writes.
- If you're running PostgreSQL, make sure that zone_reclaim_mode is set to 0.
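In practice, that summary boils down to checking the current value and, if it's 1, switching it off. A minimal sketch using the standard Linux sysctl interface (run the second and third commands as root):

```shell
# Check the current value (1 means zone reclaim is on)
cat /proc/sys/vm/zone_reclaim_mode

# Turn it off immediately, without a reboot
sysctl -w vm.zone_reclaim_mode=0

# Persist the setting across reboots
echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf
```

The runtime change via `sysctl -w` takes effect immediately, but only the `/etc/sysctl.conf` entry survives a reboot, so you generally want both.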
Posted by Josh Berkus at 7:43 PM
Labels: linux, memory, performance, postgresql
Defaults to zero on my system.
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.3 (Santiago)
$ uname -a
Linux work-desktop 2.6.32-279.19.1.el6.x86_64 #1 SMP Sat Nov 24 14:35:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/sysctl.conf | grep zone
$ cat /proc/sys/vm/zone_reclaim_mode
0
After a little more looking, both of us are correct. It can differ depending on the machine. Hopefully you can check and verify on yours.
Commit 9eeff2395e3cfd05c9b2e6074ff943a34b0c5c21 introduced this.
For more details, please check the upstream kernel discussion here: http://marc.info/?l=linux-kernel&m=113408418232531&w=2
In RHEL-6.1, 'zone_reclaim_mode' is set to 1; in RHEL-6.2 it was set back to 0.
This is an old post, but since there is a lot of confusion lingering about this setting: it's there for HPC workloads, which drove a lot of the NUMA development in the first place. HPC simulations are one example of a class of applications that (A) saturate the memory bus and (B) run in such perfect synchronization that they are highly sensitive to memory latency.
When you run this sort of code and one NUMA node fills up, if that core has to borrow memory bandwidth from its neighbor then both cores start running at 50% at best (since that's the memory bandwidth available to each) or perhaps even slower. When one pair of cores degrades, the entire simulation slows down catastrophically since all the simulation cells have to exchange results for every time step in the simulation.
Production Linux supercomputing dates back to the late 90s; even when NUMA became common in the mid-2000s, this sort of large-RAM database design wasn't dominant yet. Now that it is, the default has swung the other way.