Database Soup: New Table Bloat Query

Friday, October 10, 2014

New Table Bloat Query

To accompany the New Index Bloat Query, I've written a New Table Bloat Query. This also involves the launch of the pgx_scripts project on GitHub, which will include most of the "useful scripts" I talk about here, as well as some scripts from my co-workers.

The new table bloat query is different from the check_postgres.pl version in several ways:

Rewritten to use WITH statements for better maintainability and clarity
Conditional logic for old Postgres versions and 32-bit platforms taken out
Index bloat removed, since we have a separate query for that
Columns formatted to be more immediately comprehensible

In the course of building this, I found two fundamentally hard issues:

Some attributes (such as JSON and polygon fields) have no stats, so those tables can't be estimated.
There's no good way to estimate bloat for compressed (TOAST) attributes and rows.

Also, while I rewrote the query almost entirely, I am still relying on Greg's core math for estimating table size. Comparing this with the results of pgstattuple, I'm seeing an error of +/- 20%, which is pretty substantial. I'm not clear on where that error is coming from, so help improving the math is very welcome!

Results look like this:

  databasename | schemaname |   tablename   | pct_bloat | mb_bloat | table_mb   
 --------------+------------+-------------------+-----------+----------+----------  
  members_2014 | public   | current_member  |    92 |  16.98 |  18.547   
  members_2014 | public   | member_response  |    87 |  17.46 |  20.000   
  members_2014 | public   | archive_member  |    84 |  35.16 |  41.734   
  members_2014 | public   | survey      |    57 |  28.59 |  50.188

pct_bloat is how much of the table (0 to 100) is estimated to be dead space. MB_bloat is how many megabytes of bloat are estimated to exist. Table_mb is the actual size of the table in megabytes.

The suggested criteria is to list tables which are either more than 50% bloat and bigger than 10MB, or more than 25% bloat and bigger than 1GB. However, you should calibrate this according to your own database.

9 comments:

ioguixOctober 14, 2014 at 4:22 AM
Hello Josh,

You might have miss my last blog post about table bloat:

http://blog.ioguix.net/postgresql/2014/09/10/Bloat-estimation-for-tables.html

«I'm not clear on where that error is coming from, so help improving the math is very welcome!»
I wrote about bloat coming from type alignment there. Maybe it takes part on the +/- 20% error you have with your query ? Note that my query is doing some rough estimation on toasted data as well.

See you at pgconf.eu un spain next week ?

Cheers,
ReplyDelete
Replies
grayhempMarch 10, 2015 at 2:56 PM
Hey Josh, I think you might find it interesting. It is a comparison of different table bloat estimation queries.

tablename | pgx_scripts | pgtoolkit | ioguix | pgstattuple
------------------+-------------+-----------+--------+-------------
table7 | 81 | 81.36 | 80.93 | 80.26
table1 | 75 | 54.93 | 74.64 | 44.70
table5_p2 | 61 | 62.04 | 56.33 | 59.83
table2 | 48 | 56.52 | 47.82 | 45.77 X
table5_p1 | 34 | 34.78 | 23.95 | 32.28
table4 | 28 | 44.00 | 44.00 | 46.39
table5 | 17 | 16.83 | 50.91 | 50.05 X
table3 | 0 | -4.00 | 0 | 6.42

pgx_scripts: https://github.com/pgexperts/pgx_scripts/blob/master/administration/table_bloat_check.sql

pgtoolkit: https://github.com/grayhemp/pgtoolkit/blob/master/misc/scratch.sql#L340-L391

ioguix: https://gist.github.com/ioguix/4f95917f90c9e26df1b2

the test DB DDL: https://github.com/grayhemp/pgtoolkit/blob/master/misc/scratch.sql#L5-L228
ReplyDelete
Replies

Add comment