Tuesday, April 24, 2012

Red Hat Kernel cache clearing issue

Recently, mega-real-estate sales site Tigerlead called us with a very strange problem.  One of their dedicated PostgreSQL servers refused to use most of its available RAM, forcing the system to read from disk.  Given that the database was 60GB in size and the server had 96GB of RAM, this was a painful performance degradation.

Output of free -m:

             total       used       free     shared    buffers     cached
Mem:         96741      50318      46422          0         21      44160
-/+ buffers/cache:       6136      90605
Swap:        90111          3      90107
 
As you can see, the system is using only about half of its memory for cache and leaving the rest free.  This would be normal behavior if only half that much cache were needed, but iostat also showed numerous and frequent reads from disk, resulting in I/O waits for user queries.  Still, there could be other explanations for that.
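
The cache fraction can be computed mechanically from the `free -m` output. A minimal sketch in plain sh + awk, using the numbers from the affected server (on a live box you would pipe `free -m` straight into the awk script):

```shell
# Sample `free -m` output from the affected server, saved as a string.
free_output='             total       used       free     shared    buffers     cached
Mem:         96741      50318      46422          0         21      44160
-/+ buffers/cache:       6136      90605
Swap:        90111          3      90107'

# Column 2 of the "Mem:" line is total RAM in MB, column 7 is the page cache.
echo "$free_output" | awk '/^Mem:/ { printf "cache uses %d%% of RAM\n", $7 * 100 / $2 }'
# prints: cache uses 45% of RAM
```

On a dedicated 96GB database server with a 60GB database, you would expect that figure to be much higher once the cache is warm.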

So I tried forcing a cache fill by running pg_dump.  This filled most of the free memory with cache -- but then Linux aggressively cleared the cache, getting it back down to around 40GB within a few minutes.  This happened no matter what we did, including tinkering with the /proc/sys/vm parameters, increasing the size of the swap file, and changing shared_buffers.  It was highly peculiar; it was as if Linux were convinced that we had half as much RAM as we actually did.
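
The same cache-fill experiment can be run without pg_dump by simply streaming the data directory through the page cache. A sketch, with /var/lib/pgsql/data as an assumed data directory (adjust for your install):

```shell
# Read every file in the cluster once so Linux caches it, then check `free`.
PGDATA=/var/lib/pgsql/data    # assumed path, not from the original incident
[ -d "$PGDATA" ] && tar cf /dev/null "$PGDATA" 2>/dev/null
free -m                       # on a healthy kernel, "cached" climbs toward total RAM and stays
```

On the affected kernel, the "cached" column climbed during the read and then dropped back to around 40GB within minutes.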

What fixed the problem was changing the kernel version.  It turns out that kernel 2.6.32-71.29.1.el6.x86_64, released by Red Hat during a routine update, has some kind of cache management issue which can't be fixed from user space.  Fortunately, Red Hat now has a later kernel version out as an update.

Before:

[root ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            94         24         70          0          0         19
[root ~]# uname -a
Linux server1.company.com 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon
Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux

After:

[root ~]# free -g
             total       used       free     shared    buffers     cached
Mem:            94         87          6          0          0         83
[root ~]# uname -a
Linux server1.company.com 2.6.32-220.4.2.el6.x86_64 #1 SMP Tue
Feb 14 04:00:16 GMT 2012 x86_64 x86_64 x86_64 GNU/Linux

That's more like it!  Thanks to Andrew Kerr of Tigerlead for helping figure this issue out.
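
On RHEL 6, the fix boils down to checking the running kernel and pulling the update. A sketch (the yum and reboot steps are commented out here, since they must be run as root and change the system):

```shell
uname -r                  # affected version: 2.6.32-71.29.1.el6.x86_64
# yum update kernel       # installs the fixed kernel (2.6.32-220.x at the time)
# reboot                  # the new kernel only takes effect after a reboot
```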

I don't know whether other Linux distributions released the same kernel in a routine update.  I haven't seen this behavior (yet) on Ubuntu, Debian, or SuSE.  If you do see it, please report it in the comments, or better yet, on the appropriate mailing list.

9 comments:

  1. Do you know which exact version fixed this?

  2. 2.6.32-131.0.15.el6.x86_64 doesn't seem to have this problem, so it must have been fixed before that.

    1. Thanks! That narrows it down even further.

      Now the other question is when the problem was introduced ...

  3. The buffers and caches can be cleared using these commands (at least in Linux 2.6.16 and later kernels):

    To free pagecache:

    echo 1 > /proc/sys/vm/drop_caches

    To free dentries and inodes:

    echo 2 > /proc/sys/vm/drop_caches

    To free pagecache, dentries and inodes:

    echo 3 > /proc/sys/vm/drop_caches

    Matias Colli
    RHCSA

  4. Of course, I always use echo 3 > /proc/sys/vm/drop_caches

    Matias Colli
    RHCSA

  5. Josh, do you know the model of CPUs on which this issue was seen? AMD or Intel? I have a similar case on Intel; AMD works fine. But it could be just a coincidence.

  6. Another one: http://comments.gmane.org/gmane.comp.db.sqlite.general/79457

  7. I just found a relation between free unused memory and the number of active processes: roughly 1GB per backend.

    This is on 376GB of total memory and 32 cores.

    If (user CPU + I/O wait) is at 145%, then I have ~140GB free.

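
A footnote on the drop_caches commands in comment 3: the three values are additive bit flags (1 = pagecache, 2 = dentries and inodes, 3 = both), and the kernel documentation recommends running sync first so that dirty pages are flushed and the maximum amount of cache can actually be dropped. A sketch (the write itself needs root, so it is shown commented out):

```shell
sync                                  # flush dirty pages first
# echo 3 > /proc/sys/vm/drop_caches  # as root: drop pagecache + dentries/inodes
ls /proc/sys/vm/drop_caches           # the sysctl control file itself
```

Note that drop_caches is non-destructive: it only discards clean cache, which the kernel will simply rebuild as files are read again.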