Saturday, August 25, 2012

Wrong defaults for zone_reclaim_mode on Linux

My coworker Jeff Frost just published a writeup on "zone reclaim mode" in Linux, and how it can be a problem.  Since his post is rather detailed, I wanted to give a "do this" summary:

  1. On some Linux distributions, including Red Hat, zone_reclaim_mode defaults to 1, which is the wrong value for database servers.
  2. That default both keeps Linux from using all available RAM for caching and throttles writes.
  3. If you're running PostgreSQL, make sure that zone_reclaim_mode is set to 0 (see the commands below).
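
To check the current value and make the change permanent, something like this works (the /etc/sysctl.conf path is the usual place for the persistent setting, but your distribution may organize it differently):

  # what is it set to right now?
  $ cat /proc/sys/vm/zone_reclaim_mode

  # turn it off immediately (run as root)
  $ sysctl -w vm.zone_reclaim_mode=0

  # and make it survive a reboot
  $ echo "vm.zone_reclaim_mode = 0" >> /etc/sysctl.conf
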
Frankly, given the documentation on how zone_reclaim_mode works, I'm baffled as to what kind of applications it would actually benefit.  Could this be another Linux misstep, like the OOM killer?

4 comments:

  1. This comment has been removed by the author.

    Replies
    1. Defaults to zero on my system.

      $ cat /etc/redhat-release
      Red Hat Enterprise Linux Server release 6.3 (Santiago)

      $ uname -a
      Linux work-desktop 2.6.32-279.19.1.el6.x86_64 #1 SMP Sat Nov 24 14:35:28 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

      $ cat /etc/sysctl.conf | grep zone

      $ cat /proc/sys/vm/zone_reclaim_mode
      0

    2. After a little more looking, it turns out both of us are correct: it can differ depending on the machine. Hopefully you can view this:

      https://access.redhat.com/site/solutions/60669

      To paraphrase:

      The commit 9eeff2395e3cfd05c9b2e6074ff943a34b0c5c21 introduced this.
      For more details, please check the upstream kernel discussion here: http://marc.info/?l=linux-kernel&m=113408418232531&w=2

      In RHEL-6.1, 'zone_reclaim_mode' was set to 1, and in RHEL-6.2 it was set back to 0.

  2. This is an old post, but since there is still a lot of confusion lingering about this setting: it exists for HPC workloads, which drove a lot of the NUMA development in the first place. HPC simulations are one example of a class of applications which (A) saturate the memory bus and (B) run in such perfect synchronization that they are highly sensitive to memory latency.

    When you run this sort of code and one NUMA node fills up, and its cores have to borrow memory bandwidth from a neighboring node, then both sets of cores start running at 50% speed at best (since that's the memory bandwidth left for each), or perhaps even slower. When one pair of cores degrades, the entire simulation slows down catastrophically, since all the simulation cells have to exchange results at every time step.
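
    For reference, a quick way to see that per-node picture on a running box, assuming the numactl package (which provides the numactl and numastat tools) is installed:

      # topology plus total and free memory per NUMA node
      $ numactl --hardware

      # per-node allocation counters, including numa_miss and numa_foreign
      $ numastat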

    Production Linux supercomputing dates back to the late '90s; even when NUMA became common in the mid-2000s, this sort of large-RAM database design wasn't dominant yet. Now that it is, the default has swung the other way.
