Friday, October 10, 2014

New Table Bloat Query

To accompany the New Index Bloat Query, I've written a New Table Bloat Query.  This also involves the launch of the pgx_scripts project on GitHub, which will include most of the "useful scripts" I talk about here, as well as some scripts from my co-workers.

The new table bloat query is different from the version in several ways:
  • Rewritten to use WITH statements for better maintainability and clarity
  • Conditional logic for old Postgres versions and 32-bit platforms taken out
  • Index bloat removed, since we have a separate query for that
  • Columns formatted to be more immediately comprehensible
In the course of building this, I found two fundamentally hard issues:
  1. Some attributes (such as JSON and polygon fields) have no stats, so those tables can't be estimated.
  2. There's no good way to estimate bloat for compressed (TOAST) attributes and rows.
Also, while I rewrote the query almost entirely, I am still relying on Greg's core math for estimating table size.  Comparing this with the results of pgstattuple, I'm seeing an error of +/- 20%, which is pretty substantial.  I'm not clear on where that error is coming from, so help improving the math is very welcome!

Results look like this:

  databasename | schemaname |   tablename   | pct_bloat | mb_bloat | table_mb   
  members_2014 | public   | current_member  |    92 |  16.98 |  18.547   
  members_2014 | public   | member_response  |    87 |  17.46 |  20.000   
  members_2014 | public   | archive_member  |    84 |  35.16 |  41.734   
  members_2014 | public   | survey      |    57 |  28.59 |  50.188   

pct_bloat is how much of the table (0 to 100) is estimated to be dead space.  MB_bloat is how many megabytes of bloat are estimated to exist.  Table_mb is the actual size of the table in megabytes.

The suggested criteria is to list tables which are either more than 50% bloat and bigger than 10MB, or more than 25% bloat and bigger than 1GB.  However, you should calibrate this according to your own database.

Thursday, October 2, 2014

JSONB and 9.4: Move Slow and Break Things

If you've been paying any attention at all, you're probably wondering why 9.4 isn't out yet.  The answer is that we had to change JSONB at the last minute, in a way that breaks compatibility with earlier betas.

In August, a beta-testing user reported that we had an issue with JSONB not compressing well.  This was because of the binary structure of key offsets at the beginning of the JSONB value, and the affects were dramatic; in worst cases, JSONB values were 150% larger than comparable JSON values.   We spent August through September revising the data structure and Heikki and Tom eventually developed one which gives better compressibility without sacrificing extraction speed.

I did a few benchmarks on the various JSONB types.  We're getting a JSONB which is both faster and smaller than competing databases, so it'll be worth the wait.

However, this means that we'll be releasing an 9.4beta3 next week, whose JSONB type will be incompatible with prior betas; you'll have to dump and reload if you were using Beta 1 or Beta 2 and have JSONB data.  It also means a delay in final release of 9.4.

Monday, September 29, 2014

Why you need to avoid Linux Kernel 3.2

In fact, you really need to avoid every kernel between 3.0 and 3.8.  While RHEL has been sticking to the 2.6 kernels (which have their own issues, but not as bad as this), Ubuntu has released various 3.X kernels for 12.04.  Why is this an issue?  Well, let me give you two pictures.

Here's private benchmark workload running against PostgreSQL 9.3 on Ubuntu 12.04 with kernel 3.2.0.  This is the IO utilization graph for the replica database, running a read-only workload:

Sorry for cropping; had to eliminate some proprietary information.  The X-axis is time. This graph shows MB/s data transfers -- in this case, reads -- for the main database disk array.  As you can see, it goes from 150MB/s to over 300MB/s.  If this wasn't an SSD array, this machine would have fallen over. 

Then we upgraded it to kernel 3.13.0, and ran the same exact workload as the previous test.  Here's the new graph:

Bit of a difference, eh?  Now we're between 40 and 60MB/s for the exact same workload: an 80% reduction in IO.   We can thank the smart folks in the Linux FS/MM group for hammering down a whole slew of performance issues.

So, check your Postgres servers and make sure you're not running a bad kernel!