Here's the first query, which selects only indexes which have exactly the same columns. Let me explain the columns of output it produces:
- schema_name, table_name, index_name: the obvious
- index_cols: a comma-delimited list of index columns
- indexdef: a CREATE statement for how the index was created, per pg_indexes view
- index_scans: the number of scans on this index per pg_stat_user_indexes
So, you probably noticed that we still get some false positives, yes? That's because an index can have all the same columns but still be different. For example, it could use varchar_pattern_ops, GiST, or be a partial index. However, we want to see those because often they are functionally duplicates of other indexes even though they are not exactly the same. For example, you probably don't need both an index on ( status WHERE cancelled is null ) and on ( status ).
What about indexes which contain all of the columns of another index, plus some more? Like if you have one index on (id, name) you probably don't need another index on just (id). Well, here's a query to find partial matches.
This second query looks for indexes where one index contains all of the same columns as a second index, plus some more, and they both share the same first column. While a lot of these indexes might not actually be duplicates, a lot of them will be.
Obviously, you could come up with other variations on this, for example searching for all multicolumn indexes with the same columns in a different order, or indexes with the same first two columns but others different. To create your own variations, the key is to edit the filter criteria contained in this clause:
WHERE EXISTS ( SELECT 1
FROM pg_index as ind2
WHERE ind.indrelid = ind2.indrelid
AND ( ind.indkey @> ind2.indkey
OR ind.indkey <@ ind2.indkey )
AND ind.indkey = ind2.indkey
AND ind.indkey <> ind2.indkey
AND ind.indexrelid <> ind2.indexrelid
... and change it to figure out the factors which give you the most real duplicates without missing anything.