Thursday, May 1, 2014

New Finding Unused Indexes Query

As long as we're overhauling standard monitoring queries for PostgreSQL, here's another one.  This query helps you find indexes which are relatively unused, and as a result could probably be dropped.  I wrote a more complex (yes, really) version for our internal Performance Health Check suite, but the linked version is usable by anyone.

The query is also an example of why CTEs, otherwise known as "WITH statements", are a life-saver for working with complex queries.  I've only tested it on 9.2 and 9.3; I don't know if it'll work on older versions.

Before you use it, you need to check how long you've been collecting data into pg_stat_user_indexes and friends.  The default is since you created the database, but some people reset stats on a daily or monthly basis.  So it's important to know what you're looking at.   Don't make the mistake of dropping the indexes which are needed for the month-end reports!

The query divides seldom-used indexes into four groups:

Indexes Which Aren't Scanned At All: these indexes have no scans during the stats period.  These pretty much certainly can be dropped, except those on really small tables which you expect to grow later.

Seldom Used Indexes on Heavily Written Tables:  as a general rule, if you're not using an index twice as often as it's written to, you should probably drop it.  This query is a little more conservative than that.

Really Large, Seldom-Used Indexes: these indexes get used, but not that often, and they're taking up a lot of RAM and storage space.   Consider dropping them after some thought about why they were added.

Large Non-BTree Indexes On Busy Tables:  as a rule, non-BTree indexes like GiST and GIN don't accurately report usage stats.  As a result, we can't check how often they're used, just how big they are and if they're attached to tables which get a lot of writes.  This list of indexes should be very judiciously pruned.

Happy pruning!


  1. > CTEs, otherwise known as "WITH statements", are a life-saver for working with complex queries

    Yes. Ironically though they are the number one reason why I have so many unused indexes.

  2. How does one determine "how long you've been collecting data into pg_stat_user_indexes and friends"?

    1. Before 9.3, there really isn't a good way other than keeping track of your own administration info. Note that stats no longer reset on their own, so you'd have to send a command, or create a cron job, for them to do so.

      In 9.3 and later, many of the stats tables carry a last reset column which tells you when collection started.

  3. There should probably be some sanity checking in the math, as when I run this query on a couple of larger production my DB's I get the Division by Zero error. I'll poke around and provide an updated version once I have some spare time.

    But thanks for the premise of this Query Josh, it's great to have in the tool box.

  4. it was simple..

    SELECT 'High-Write Large Non-Btree' as reason, index_ratios.*, 4 as grp
    FROM index_ratios, all_writes
    ( case when total_writes = 0 THEN 0 ELSE (writes::NUMERIC / total_writes) END ) > 0.02
    AND NOT idx_is_btree
    AND index_bytes > 100000000
    ORDER BY grp, index_bytes DESC )

    1. Darcy, I'd thought about that, but then I thought: why would total_writes ever be 0? That means that there's been zero non-HOT updates during the stats period.

      Probably better to error-check for it though.

    2. Josh, I was getting the divide by zero error when running this on a streaming standby machine.

  5. Right, I have several large tables in the DB in question that only receive updates at most once per quarter, do it's entirely probable that they will rarely see updates during the stat period. so yes adding the check is probable the right thing.

  6. Looks like indexes on materilized views are ignored?!