One of the things which usually happens on the rare occasions when you have PostgreSQL database corruption is that indexes become corrupted so that index entries no longer match the underlying table. For regular indexes, this can be fixed with a REINDEX or pg_repack. But it becomes a real problem when the index was used for a primary key, unique index or foreign key, because then duplicate or broken data gets inserted into the tables, and attempts to rebuild the index will fail.
Database-wide, the easiest way to find these broken indexes is to do a pg_dump and pg_restore of your whole database; data which violates keys will show up as errors on the pg_restore.
The second thing we need to do is find the duplicate or missing data so that we can "clean" it and then rebuild the index. First, we'll need to disable index access to make sure that we don't hit the index and get erroneous information back:
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SET enable_indexonlyscan = off; -- 9.2+ only!
So for our first SQL trick, we want to query a table with a simple SERIAL surrogate key and find duplicate IDs. Then, for each of these IDs, we want to get the entire referenced row so we can look at it and decide how to fix the data. One way to do that is using a WITH clause and an aggregate:
WITH dups AS (
SELECT ticketid, count(*)
GROUP BY ticketid
HAVING count(*) > 1 )
JOIN dups USING ( ticketid )
ORDER BY ticketid;
And that gives us a list of duplicate ticketids. From there, we can choose to delete rows, assign new IDs, or do something else, depending on the nature of the duplication. Of course, if you never created any real keys (in addition to the surrogate key), you're going to have a devil of a time figuring things out.
However, the above query format isn't the only way we can search for duplicate IDs. We can also use a windowing query, which is often easier to type if we have multicolumn unique constraints:
WITH dupcount AS (
count(*) OVER (
PARTITION BY post, commentor, commented_at)
FROM comments )
SELECT * FROM dupcount
WHERE entrycount > 1
ORDER BY post, commentor, commented_at;
Finding broken foreign keys is a bit more straightforwards. In this case, you just want to use an outer join to find the missing parent records:
LEFT OUTER JOIN shoppingcart
WHERE shoppingcart.shoppingcart_id IS NULL
ORDER BY shoppingcart_item.shoppingcart_id;
Once you've found those, though, you can't do much more than dump them to a file and delete the records, since there's no way to find out what the missing parent record data was.