Tuesday, April 24, 2012

Red Hat Kernel cache clearing issue

Recently, mega-real-estate sales site Tigerlead called us with a very strange problem.  One of their dedicated PostgreSQL servers refused to use most of its available RAM, forcing the system to read from disk.  Given that the database was 60GB in size and the server had 96GB of RAM, this was a painful performance degradation.

Output of free -m:

             total       used       free     shared    buffers     cached
Mem:         96741      50318      46422          0         21      44160
-/+ buffers/cache:       6136      90605
Swap:        90111          3      90107
 
As you can see, the system is using only about half of its memory for cache and leaving the rest free.  This would be normal behavior if only half that much cache were needed, but iostat also showed numerous and frequent reads from disk, resulting in I/O waits for user queries.  Still, there could be other explanations for that.
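
The cache fraction can be computed mechanically from the `free -m` output. A minimal sketch in plain sh + awk, using the numbers from the affected server (on a live box you would pipe `free -m` straight into the awk script):

```shell
# Sample `free -m` output from the affected server, saved as a string.
free_output='             total       used       free     shared    buffers     cached
Mem:         96741      50318      46422          0         21      44160
-/+ buffers/cache:       6136      90605
Swap:        90111          3      90107'

# Column 2 of the "Mem:" line is total RAM in MB, column 7 is the page cache.
echo "$free_output" | awk '/^Mem:/ { printf "cache uses %d%% of RAM\n", $7 * 100 / $2 }'
# prints: cache uses 45% of RAM
```

On a dedicated 96GB database server with a 60GB database, you would expect that figure to be much higher once the cache is warm.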

So I tried forcing a cache fill by running pg_dump.  This filled most of the free memory with cache -- but then Linux aggressively cleared the cache, getting it back down to around 40GB within a few minutes.  This happened no matter what we did, including tinkering with the /proc/sys/vm parameters, increasing the size of the swap file, and changing shared_buffers.  It was highly peculiar; it was as if Linux were convinced that we had half as much RAM as we actually did.
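
The same cache-fill experiment can be run without pg_dump by simply streaming the data directory through the page cache. A sketch, with /var/lib/pgsql/data as an assumed data directory (adjust for your install):

```shell
# Read every file in the cluster once so Linux caches it, then check `free`.
PGDATA=/var/lib/pgsql/data    # assumed path, not from the original incident
[ -d "$PGDATA" ] && tar cf /dev/null "$PGDATA" 2>/dev/null
free -m                       # on a healthy kernel, "cached" climbs toward total RAM and stays
```

On the affected kernel, the "cached" column climbed during the read and then dropped back to around 40GB within minutes.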

What fixed the problem was changing the kernel version.  It turns out that kernel 2.6.32-71.29.1.el6.x86_64, released by Red Hat during a routine update, has some kind of cache management issue which can't be fixed from user space.  Fortunately, Red Hat now has a later kernel version out as an update.

Before:

[root ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            94         24         70          0          0         19
[root ~]# uname -a
Linux server1.company.com 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon
Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux

After:

[root ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            94         87          6          0          0         83
[root ~]# uname -a
Linux server1.company.com 2.6.32-220.4.2.el6.x86_64 #1 SMP Tue
Feb 14 04:00:16 GMT 2012 x86_64 x86_64 x86_64 GNU/Linux

That's more like it!  Thanks to Andrew Kerr of Tigerlead for helping figure this issue out.
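
On RHEL 6, the fix boils down to checking the running kernel and pulling the update. A sketch (the yum and reboot steps are commented out here, since they must be run as root and change the system):

```shell
uname -r                  # affected version: 2.6.32-71.29.1.el6.x86_64
# yum update kernel       # installs the fixed kernel (2.6.32-220.x at the time)
# reboot                  # the new kernel only takes effect after a reboot
```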

I don't know whether other Linux distributions released the same kernel in a routine update.  I haven't seen this behavior (yet) on Ubuntu, Debian, or SuSE.  If you do see it, please report it in the comments, or better yet, on the appropriate mailing list.

9 comments:

  1. Do you know which exact version fixed this?

  2. 2.6.32-131.0.15.el6.x86_64 doesn't seem to have this problem, so it must have been fixed before that.

    1. Thanks! That narrows it down even further.

      Now the other question is when the problem was introduced ...

  3. The buffers and caches can be cleared using these commands (at least in Linux 2.6.16 and later kernels):

    To free pagecache:

    echo 1 > /proc/sys/vm/drop_caches

    To free dentries and inodes:

    echo 2 > /proc/sys/vm/drop_caches

    To free pagecache, dentries and inodes:

    echo 3 > /proc/sys/vm/drop_caches

    Matias Colli
    RHCSA

  4. Of course, I always use echo 3 > /proc/sys/vm/drop_caches

    Matias Colli
    RHCSA

  5. Josh, do you know the model of CPUs on which this issue was seen? AMD or Intel? I have a similar case on Intel; AMD works fine. But it could be just a coincidence.

  6. Another one: http://comments.gmane.org/gmane.comp.db.sqlite.general/79457

  7. I just found a relation between free unused memory and the number of active processes: roughly 1GB per backend.

    This is on 376GB of total memory and 32 cores.

    If (user CPU + I/O wait) is at 145%, then I have ~140GB free.

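
A footnote on the drop_caches commands in comment 3: the three values are additive bit flags (1 = pagecache, 2 = dentries and inodes, 3 = both), and the kernel documentation recommends running sync first so that dirty pages are flushed and the maximum amount of cache can actually be dropped. A sketch (the write itself needs root, so it is shown commented out):

```shell
sync                                  # flush dirty pages first
# echo 3 > /proc/sys/vm/drop_caches  # as root: drop pagecache + dentries/inodes
ls /proc/sys/vm/drop_caches           # the sysctl control file itself
```

Note that drop_caches is non-destructive: it only discards clean cache, which the kernel will simply rebuild as files are read again.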