The Need To Freeze
Every write transaction or standalone statement in PostgreSQL gets assigned a unique transaction ID (or XID). This XID determines the precedence visibility, determining which other concurrent users can see row changes associated with the transaction. Read-only transactions do not get assigned an XID. These XIDs get stamped on each row in storage, in the form of xmin and xmax, denoting the minimum and maximum concurrent transactions which can "see" the row. By putting these in the row headers, we decentralize transaction conflict resolution, which supports PostgreSQL's very high concurrency capabilities.
However, there's a problem with XIDs: they're a four-byte integer. This means that after 2.1 billion transactions, they need to wrap around. And by the time the XID counter wraps around, the old XIDs need to be gone from all of the old row headers. This means removing the XID from xmin and xmax and replacing it with the special RelFrozenXID value, indicating a transaction which is so old everyone can see it.
Now, 2.1 billion transactions seems like a lot, but it's not really. If you're doing just 100 write transactions per second (average), you'll go through that in about 10 months. This means that we need to be constantly cleaning old data pages of old XIDs to be ready for wraparound. And, indeed, PostgreSQL is prepared to do this, through the autovacuum facility.
The first autovacuum setting is autovacuum_freeze_max_age, which is an absolute ceiling on how long autovacuum will let you go before it kicks in and starts exhaustively vacuum freezing the old XIDs on your tables with old rows. Hitting this threshold isn't a great thing, because it can generate lots of IO at unexpected times, and freeze autovacuums aren't cancellable (if you cancel them, they come right back).
You can see how close you are to autovacuum freeze anywhere with this query:
select max(age(datfrozenxid)) from pg_database;
However, you don't generally really care about that; what you care about is "how close am I to forced autovacuum freeze on a large table?" Here's that:
SELECT relname, age(relfrozenxid) as xid_age,
pg_size_pretty(pg_table_size(oid)) as table_size
WHERE relkind = 'r' and pg_table_size(oid) > 1073741824
ORDER BY age(relfrozenxid) DESC LIMIT 20;
More specifically, that says "give me the top 20 tables over 1GB, sorted by the age of their oldest XID". Results look something like this:
relname | xid_age | table_size
postgres_log | 199785216 | 12 GB
statements | 4551790 | 1271 MB
normal_statement_times | 31 | 12 GB
Now, that above is trouble brewing. We see that postgres_log is at an XID age of 199 million, which is awfully close to the default autovacuum_freeze_max_age of 200 million. And having autovac decide to read and rewrite an entire 12GB table, possibly at peak usage times, can cause unexpected poor application performance.
One thing we can do is raise autovacuum_freeze_max_age. The default, 200m, is very conservative; it's only 10% of our shut-down-the-database threshold. On high-transaction-volume databases, I generally raise it to 1 billion, which is still only the 50% mark.
However, that's just putting off the problem. What we should do is manually freeze the table at at time which is convenient for the application, i.e. during a low activity period or a scheduled downtime. Then we run:
VACUUM FREEZE postgres_log;
Even better, we'd like to prevent ourselves from getting that close to the threshold in the first place. How do we do that? Well, there's some other tuning parameters.
To be continued in Part 2 ...