Monday, September 29, 2014

Why you need to avoid Linux Kernel 3.2

In fact, you really need to avoid every kernel from 3.0 through 3.8. While RHEL has been sticking to the 2.6 kernels (which have their own issues, but not as bad as this), Ubuntu has released various 3.x kernels for 12.04. Why is this an issue? Well, let me give you two pictures.

Here's a private benchmark workload running against PostgreSQL 9.3 on Ubuntu 12.04 with kernel 3.2.0. This is the IO utilization graph for the replica database, running a read-only workload:



Sorry for the cropping; I had to remove some proprietary information. The X-axis is time. This graph shows MB/s data transfers (in this case, reads) for the main database disk array. As you can see, it goes from 150MB/s to over 300MB/s. If this weren't an SSD array, this machine would have fallen over.

Then we upgraded it to kernel 3.13.0 and ran the exact same workload as in the previous test. Here's the new graph:



Bit of a difference, eh? Now we're between 40 and 60MB/s for the exact same workload: an 80% reduction in IO. We can thank the smart folks in the Linux FS/MM group for hammering out a whole slew of performance issues.
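
The graphs come from a client's internal performance monitoring, which isn't public. If you want a rough equivalent of the read MB/s line on your own Linux server, one option is to sample the cumulative "sectors read" counter in /proc/diskstats twice and divide the difference by the interval. The sketch below assumes that approach; the function names and the `sda` device name are hypothetical placeholders, and this is not the tool that produced the graphs.

```python
import time

# /proc/diskstats counts in 512-byte sectors, independent of the device's physical sector size.
SECTOR_BYTES = 512


def sectors_read(diskstats_text: str, device: str) -> int:
    """Return the cumulative "sectors read" counter for one device.

    Each /proc/diskstats line starts with: major, minor, device name,
    reads completed, reads merged, sectors read, ...
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Exact name match, so "sda" does not pick up the "sda1" partition line.
        if len(fields) > 5 and fields[2] == device:
            return int(fields[5])
    raise KeyError(f"device {device!r} not found")


def read_mb_per_s(before: str, after: str, device: str, interval_s: float) -> float:
    """Average read throughput between two snapshots, in decimal MB/s (10**6 bytes)."""
    delta_sectors = sectors_read(after, device) - sectors_read(before, device)
    return delta_sectors * SECTOR_BYTES / interval_s / 1_000_000


def sample_read_mb_per_s(device: str, interval_s: float = 1.0) -> float:
    """Take two live snapshots of /proc/diskstats (Linux only) and compare them."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return read_mb_per_s(before, after, device, interval_s)
```

On a live server, calling `sample_read_mb_per_s("sda", 5.0)` in a loop during a benchmark gives a time series of the same kind as the graphs described above. Because the workload here is fixed-size, the total MB read over the run is as telling as the rate.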

So, check your Postgres servers and make sure you're not running a bad kernel!
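
If you'd rather script the check than eyeball `uname -r`, here's a minimal sketch in Python. It assumes the range to avoid is 3.0 through 3.8 inclusive, as stated above, and it only looks at the upstream major.minor numbers, so it can't see fixes a distribution may have backported into a kernel with an older version number. The function names are made up for illustration.

```python
import platform
import re

# Range to avoid per this post, inclusive: 3.0 through 3.8.
AVOID_FROM = (3, 0)
AVOID_THROUGH = (3, 8)


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Pull (major, minor) out of a release string such as '3.5.0-54-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_avoid_range(release: str) -> bool:
    # Integer tuples compare component by component, so (3, 13) > (3, 8).
    return AVOID_FROM <= parse_kernel_version(release) <= AVOID_THROUGH


if __name__ == "__main__":
    if platform.system() != "Linux":
        print("Not a Linux host; nothing to check.")
    else:
        release = platform.release()  # same string that `uname -r` prints
        verdict = "in the 3.0-3.8 range to avoid" if in_avoid_range(release) else "outside the 3.0-3.8 range"
        print(f"{release}: {verdict}")
```

The comparison uses integer tuples on purpose: compared as strings or as decimals, "3.13" would sort before "3.8", which is exactly the numbering confusion that comes up in the comments.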

20 comments:

  1. >> Linux ubuntu 3.5.0-54-generic #81~precise1-Ubuntu SMP Tue Jul 15 04:02:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

    >>Distributor ID: Ubuntu
    >>Description: Ubuntu 12.04.4 LTS
    >>Release: 12.04
    >>Codename: precise


    Is this version good for postgres?

    Replies
    1. Nope. 3.5 falls in the range of versions to skip.

    2. For that matter, 3.5 has some segfault crash bugs in the virtualization system.

    3. Is version 3.5 good or bad for PostgreSQL?

    4. 3.5 has known major performance problems, per multiple citations above.

  2. Do you really mean between 3.0 and 3.8? Because then 3.13 falls in between.
    I run the latest CentOS Linux release 7.0.1406 (kernel 3.10.0-123.6.3.el7.x86_64); is that OK?

    Replies
    1. Cypisek: that's not how release numbering works. Each part of the version is a separate integer, so 3.13 is *after* 3.8 (13 comes after 8).

    2. Kernel 3.10, however, should be fine.

  3. It's implied but not clearly spelled out in the post: you mean that the database yields the same, or maybe better, TPS even with the reduced load on the I/O subsystem, right?

    Replies
    1. Well, the kernel 3.2 run actually had worse TPS because of the IO waits. So 3.13 gave better TPS with reduced IO.

  4. Ubuntu 14.04 LTS has been out quite a while now, and has even had its first point release. I certainly recommend it over 12.04 LTS, and it ships with the 3.13 kernel.

  5. May I ask how you got those figures and created the graphs?

    Replies
    1. It's the internal performance monitoring from one of my clients. As such, I can't share details.

  6. Does this problem mainly affect Postgres? Or would you recommend that any system (even one not running PG) avoid these kernels?

    Replies
    1. I'd recommend against using them for any service which needs to do concurrent IO. Besides, you can't safely run Docker without upgrading to 3.9 or later anyway.

  7. I didn't see this mentioned, but for posterity: the kernel devs suggested this is due to the 3.2-3.8 memory managers being overly aggressive about stale cache purging. Basically they weren't properly promoting inactive cache into the active set, so data was being repeatedly invalidated while it was being loaded from disk, leading to a ceaseless IO cycle.

    There were several patches that corrected this behavior, but some of the more subtle ones didn't make it in until 3.12 or so. 3.8 is the bare minimum for running a stable Linux server, IMO.

  8. 3.2.0 kernel shows 150-350MB/s read - faster IO
    3.13.0 kernel shows 40-60MB/s read - much slower IO

    Your reply in the comments says "Kernel 3.10, however, should be fine.", but 3.10 falls between 3.2 and 3.13.

    If we should avoid everything between 3.0 and 3.8, then why does your graph suggest the IO slowdown comes with 3.13, which, as you point out, "is *after* 3.8"?

    Slightly confused here.

    Replies
    1. This comment has been removed by the author.

    2. You really need to read the text as well as look at the graphs. You missed the two places where I explain that this is a fixed-size workload; that is, it's the exact same number of queries and data output in both runs.

      The kernel 3.2 run is doing the exact same amount of work as the kernel 3.13 run. This shows how 3.2 has memory management issues; it's doing 140MB/s in completely unnecessary IO (on top of the 50MB/s of necessary IO).

      Those issues were fixed in kernels 3.9 and 3.10, depending on your distribution.
