Friday, October 10, 2014

New Table Bloat Query

To accompany the New Index Bloat Query, I've written a New Table Bloat Query.  This also involves the launch of the pgx_scripts project on GitHub, which will include most of the "useful scripts" I talk about here, as well as some scripts from my co-workers.

The new table bloat query is different from the check_postgres.pl version in several ways:
  • Rewritten to use WITH statements for better maintainability and clarity
  • Conditional logic for old Postgres versions and 32-bit platforms taken out
  • Index bloat removed, since we have a separate query for that
  • Columns formatted to be more immediately comprehensible
In the course of building this, I found two fundamentally hard issues:
  1. Some attributes (such as JSON and polygon fields) have no stats, so those tables can't be estimated.
  2. There's no good way to estimate bloat for compressed (TOAST) attributes and rows.
Also, while I rewrote the query almost entirely, I am still relying on Greg's core math for estimating table size.  Comparing this with the results of pgstattuple, I'm seeing an error of +/- 20%, which is pretty substantial.  I'm not clear on where that error is coming from, so help improving the math is very welcome!

Results look like this:

  databasename | schemaname |   tablename   | pct_bloat | mb_bloat | table_mb   
 --------------+------------+-------------------+-----------+----------+----------  
  members_2014 | public   | current_member  |    92 |  16.98 |  18.547   
  members_2014 | public   | member_response  |    87 |  17.46 |  20.000   
  members_2014 | public   | archive_member  |    84 |  35.16 |  41.734   
  members_2014 | public   | survey      |    57 |  28.59 |  50.188   

pct_bloat is how much of the table (0 to 100) is estimated to be dead space.  MB_bloat is how many megabytes of bloat are estimated to exist.  Table_mb is the actual size of the table in megabytes.

The suggested criteria is to list tables which are either more than 50% bloat and bigger than 10MB, or more than 25% bloat and bigger than 1GB.  However, you should calibrate this according to your own database.

9 comments:

  1. Hello Josh,

    You might have miss my last blog post about table bloat:

    http://blog.ioguix.net/postgresql/2014/09/10/Bloat-estimation-for-tables.html

    «I'm not clear on where that error is coming from, so help improving the math is very welcome!»
    I wrote about bloat coming from type alignment there. Maybe it takes part on the +/- 20% error you have with your query ? Note that my query is doing some rough estimation on toasted data as well.

    See you at pgconf.eu un spain next week ?

    Cheers,

    ReplyDelete
    Replies
    1. Yes, I did and there's no way to search your blog so I couldn't find it. Damn, now I'm going to have to rewrite my query.

      No, I won't make it to Madrid. There wasn't really a business or community justification to go.

      Delete
    2. « there's no way to search your blog so I couldn't find it »

      There's an index page listing all my articles: http://blog.ioguix.net/

      Plus, they are published on planet.postgresql.org.

      Then, you can follow my gists as well.

      Cheers,

      Delete
  2. Hey Josh, I think you might find it interesting. It is a comparison of different table bloat estimation queries.

    tablename | pgx_scripts | pgtoolkit | ioguix | pgstattuple
    ------------------+-------------+-----------+--------+-------------
    table7 | 81 | 81.36 | 80.93 | 80.26
    table1 | 75 | 54.93 | 74.64 | 44.70
    table5_p2 | 61 | 62.04 | 56.33 | 59.83
    table2 | 48 | 56.52 | 47.82 | 45.77 X
    table5_p1 | 34 | 34.78 | 23.95 | 32.28
    table4 | 28 | 44.00 | 44.00 | 46.39
    table5 | 17 | 16.83 | 50.91 | 50.05 X
    table3 | 0 | -4.00 | 0 | 6.42

    pgx_scripts: https://github.com/pgexperts/pgx_scripts/blob/master/administration/table_bloat_check.sql

    pgtoolkit: https://github.com/grayhemp/pgtoolkit/blob/master/misc/scratch.sql#L340-L391

    ioguix: https://gist.github.com/ioguix/4f95917f90c9e26df1b2

    the test DB DDL: https://github.com/grayhemp/pgtoolkit/blob/master/misc/scratch.sql#L5-L228

    ReplyDelete
    Replies
    1. Oh, wow, thanks! Looks like I need to do some fine-tuning with tables 4 and 5.

      Delete
    2. Yeah, I actually tried to figure out how @ioguix get table5's estimation so precise but with no success :( So, if you will have any thoughts I would be glad to hear/read them.

      Delete
    3. Hi guys,

      I moved these bloat estimation queries to a proper github repository some weeks ago. These comments here just remembered me I had to update the gist to redirect visitors to this new repo :-)

      @Sergey, as I kept fixing and nip-ticking these queries, I would be curious to know how they perform in their current version with your test case. As instance, last versions of the queries now pay attention to the fillfactor, so the result for table1 might be better now I guess..

      Thanks!

      Delete
    4. @ioguix here we are: https://gist.github.com/grayhemp/55d6b547b4f40d69ba7d

      Your result is almost perfect!

      Delete
    5. Excellent :)
      Thank you for your time and these tests Sergey!

      Delete