Database Soup: New New Index Bloat Query

Tuesday, April 1, 2014

New New Index Bloat Query

Earlier this week Ioguix posted an excellent overhaul of the well-known Index Bloat Estimation from check_postgres. However, I felt that we needed several additional changes before the query is ready for me to use in our internal monitoring utilities, and thought I'd post our version here.

Here's the New New Index Bloat Query.

Here's what I changed:

Had it pull btree indexes only, because the calculations don't work for GIN/GiST indexes.
Moved all of the nested subqueries up into WITH clauses for easier readability and maitainability. Since all supported Postgres versions now do CTEs, I didn't see any reason to work around them.
Removed the \set statements, because they don't work for embedding the query in a script. And they only need setting dynamically if you've compiled PostgreSQL with nonstandard options, anyway.
Removed a couple of CASE statements aimed at 7.X support; who cares?
Added some extra informational columns for using this in interactive mode: table size, pretty index size, and index scans. This helps folks figure out whether to rebuild an index, ignore the bloat or to drop it.
In the example query, filtering down to indexes with bloat over 50% and 50MB, which is our threshold for "significant bloat"

Example output:

-[ RECORD 27 ]+-----------------------------
database_name | prod-web
schema_name   | partitions
table_name    | transactions_201308
index_name    | transactions_201308_uuid_idx
bloat_pct     | 52.9
bloat_bytes   | 351649792
bloat_size    | 335 MB
index_bytes   | 664788992
index_size    | 634 MB
table_bytes   | 4570447872
table_size    | 4359 MB
index_scans   | 263

So this index would be a good candidate for deletion, since it's 50% bloated and seldom scanned.

Now, there's still more to do on this. We need a similar query for GIN/GiST/SPGiST. The query could use some more cleanup; removing one-letter table aliases, and GROUP BY 1,2,3,4 stuff. But it's vastly improved for checking which of your indexes need VACUUM/REINDEX. Thanks, Ioguix!

17 comments:

Bernd HelmleApril 2, 2014 at 5:51 AM
Very interesting, but one side note: the old query uses pg_stats and thus enables check_postgres to query the statistics without being a superuser for example.
ReplyDelete
Replies
Greg Sabino MullaneApril 2, 2014 at 11:16 AM
Patches welcome, Josh, but plenty of users *do* care about 7.x and 8.x, so we cannot support WITH quite yet. :)
ReplyDelete
Replies
ioguixApril 3, 2014 at 1:44 AM
Thanks for improvements Josh !

1. I hadn't time to work or test on GiST/GIN, and it probably doesn't work properly anyway. +1 for limiting explicitly to B-Tree
2. and 4. as Greg explained, I care about 7.x and 8.x as well, so I'll keep the sub-query version
3. \set was actually to make the query more "readable" and understandable. Obviously, it's easy to serialize them back in query for tools :-)

WARNING: You directly use value "2" in CTE "index_aligned" for "nulldatahdrwidth" instead of "index_tuple_hdr" computed in CTE "index_item_sizes". So your stats are not good enough with index referencing null-able fields. I answered you about this on my blog.

Cheers,
ReplyDelete
Replies
grayhempApril 4, 2014 at 11:54 AM
What about comparison with pgstattuple? How different the results are?
ReplyDelete
Replies
buddyglassJune 9, 2015 at 8:23 AM
Other comment hasn't been approved so I can't post this as a reply to it, but here's the table definition:

Table "public.transfers"
Column | Type | Modifiers
----------------------------+-----------------------------+--------------------------------------------------------
id | integer | not null default nextval('transfers_id_seq'::regclass)
edited | integer |
external_id_organization | character varying(255) |
accountnumber_from | character varying(255) |
edited | character varying(255) |
edited | character varying(255) |
edited | character varying(255) |
edited | character varying(255) |
status | character varying(255) |
bj_job_id | integer |
edited | timestamp without time zone |
edited | timestamp without time zone |
edited | date |
edited | character varying(255) |
edited | integer |
edited | integer |
edited | integer |
edited | integer |
edited | integer |
edited | boolean | default true
edited | character varying(255) |
schedule_id | character varying(255) |
edited | character varying(255) |
edited | integer |
edited | timestamp without time zone |
edited | timestamp without time zone |
edited | character varying(255) |
edited | numeric | not null
Indexes:
"transfers_pkey" PRIMARY KEY, btree (id)
"index_transfers_on_schedule_id" UNIQUE, btree (schedule_id) WHERE schedule_id IS NOT NULL
"index_transfers_on_bj_job_id" btree (bj_job_id) WHERE bj_job_id IS NOT NULL
"index_transfers_on_external_id_organization_accountnumber_from" btree (external_id_organization, accountnumber_from)
"index_transfers_on_success_by_organization" btree (external_id_organization) WHERE transfer_status_indicates_success(status)
Triggers:
transfers_set_amount_as_decimal_trigger BEFORE INSERT OR UPDATE ON transfers FOR EACH ROW EXECUTE PROCEDURE set_amount_as_decimal()
ReplyDelete
Replies
buddyglassJune 9, 2015 at 8:24 AM
...and some more info on the ID column and bloat query output:

production=# select count(1), min(id), max(id) from transfers;
count | min | max
---------+-----+---------
4453403 | 163 | 4534588

production=# \i bloat.sql
database_name | schema_name | table_name | index_name | bloat_pct | bloat_bytes | bloat_size | index_bytes | index_size | table_bytes | table_size | index_scans
-----------------+-------------+------------+----------------------------------------------------------------+-----------+-------------+------------+-------------+------------+-------------+------------+-------------
production | public | transfers | index_transfers_on_bj_job_id | 63.3 | 720896 | 704 kB | 1138688 | 1112 kB | 2030804992 | 1937 MB | 6
production | public | transfers | index_transfers_on_external_id_organization_accountnumber_from | 6.6 | 16277504 | 16 MB | 248111104 | 237 MB | 2030804992 | 1937 MB | 212895
production | public | transfers | index_transfers_on_schedule_id | 0.0 | 0 | 0 bytes | 16384 | 16 kB | 2030804992 | 1937 MB | 17
production | public | transfers | index_transfers_on_success_by_organization | 19.4 | 21061632 | 20 MB | 108298240 | 103 MB | 2030804992 | 1937 MB | 6451
production | public | transfers | transfers_pkey | 69.4 | 200253440 | 191 MB | 288636928 | 275 MB | 2030804992 | 1937 MB | 36690
ReplyDelete
Replies

Add comment