Showing posts with label maintenance.

Tuesday, October 21, 2014

Introducing Flexible Freeze

One of the things I mentioned in my series on VACUUM FREEZE was that we really needed a Postgres utility which would opportunistically freeze tables during low traffic periods. Today I'm announcing the Flexible Freeze project, our first attempt at designing such a utility.

All that's there right now is a simple Python script.  However, that script is already a useful tool, installed at multiple production sites. Here's how the script works:
  1. Identify your active databases and daily/weekly low traffic periods.
  2. Create a cron job which calls flexible_freeze.py with a time limit to keep it inside your low traffic window.
  3. flexible_freeze.py will loop through the tables with the oldest XIDs, freezing them until it runs out of time or out of tables.
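The core of step 3 is a time-boxed freeze loop. Here's a minimal sketch of that logic in pure Python (the function name and the freeze_table callback are illustrative stand-ins; a real script would issue VACUUM FREEZE per table over a database connection):

```python
import time

def freeze_loop(tables_oldest_first, freeze_table, time_limit_secs):
    """Freeze tables until we run out of time or out of tables.
    freeze_table(name) stands in for running VACUUM FREEZE <name>
    over a database connection; tables_oldest_first is assumed to
    be sorted by age(relfrozenxid), oldest first."""
    deadline = time.monotonic() + time_limit_secs
    frozen = []
    for table in tables_oldest_first:
        if time.monotonic() >= deadline:
            break  # out of time: stay inside the low-traffic window
        freeze_table(table)
        frozen.append(table)
    return frozen
```

The time check happens before each table, not during, so a single very large table can still overrun the window; the real script has the same limitation.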
There is also a second mode, using the --vacuum switch, which does VACUUM ANALYZE on the tables with the most dead rows (according to pg_stat_user_tables).  This is to help users who have a strong high/low traffic cycle and want to make sure that regular vacuuming takes place during low traffic.  If you're running both modes, we advise doing the freeze first.

Of course, I have a tanker-truck full of desired improvements/expansions to this.  So, pull requests welcome. 

If you're more into Ruby, Wanelo has rewritten flexible freeze for Rails and incorporated it into their Postmodern tool.

Tuesday, April 1, 2014

New New Index Bloat Query

Earlier this week Ioguix posted an excellent overhaul of the well-known Index Bloat Estimation from check_postgres.  However, I felt we needed several additional changes before the query was ready to use in our internal monitoring utilities, so I thought I'd post our version here.

Here's the New New Index Bloat Query.

Here's what I changed:
  1. Had it pull btree indexes only, because the calculations don't work for GIN/GiST indexes.
  2. Moved all of the nested subqueries up into WITH clauses for easier readability and maintainability.  Since all supported Postgres versions now support CTEs, I didn't see any reason to work around them.
  3. Removed the \set statements, because they don't work for embedding the query in a script.  And they only need setting dynamically if you've compiled PostgreSQL with nonstandard options, anyway.
  4. Removed a couple of CASE statements aimed at 7.X support; who cares?
  5. Added some extra informational columns for using this in interactive mode: table size, pretty index size, and index scans.  This helps folks figure out whether to rebuild an index, ignore the bloat, or drop the index entirely.
  6. In the example query, filtered down to indexes with bloat over 50% and 50MB, which is our threshold for "significant bloat".
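In plain Python, the 50%-and-50MB cutoff from item 6 amounts to the following (just the arithmetic, using the byte columns from the query output; the function name is my own):

```python
def significant_bloat(bloat_bytes, index_bytes):
    """Apply the example query's filter: flag an index only when its
    bloat exceeds both 50% of the index size and 50MB in absolute
    terms.  Returns (is_significant, bloat_pct)."""
    bloat_pct = round(100.0 * bloat_bytes / index_bytes, 1)
    is_significant = bloat_pct > 50.0 and bloat_bytes > 50 * 1024**2
    return is_significant, bloat_pct
```

Fed the bloat_bytes and index_bytes from the example record below, it returns (True, 52.9), matching the query's bloat_pct column.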
Example output:

-[ RECORD 27 ]+-----------------------------
database_name | prod-web
schema_name   | partitions
table_name    | transactions_201308
index_name    | transactions_201308_uuid_idx
bloat_pct     | 52.9
bloat_bytes   | 351649792
bloat_size    | 335 MB
index_bytes   | 664788992
index_size    | 634 MB
table_bytes   | 4570447872
table_size    | 4359 MB
index_scans   | 263

So this index would be a good candidate for deletion, since it's 50% bloated and seldom scanned.



Now, there's still more to do on this.  We need a similar query for GIN/GiST/SP-GiST.  The query could use some more cleanup: removing the one-letter table aliases and the GROUP BY 1,2,3,4 stuff.  But it's vastly improved for checking which of your indexes need VACUUM/REINDEX.  Thanks, Ioguix!

Friday, December 7, 2012

Freezing Your Tuples Off, Part 3

First, read Part 1 and Part 2 of this series.

 

vacuum_freeze_table_age


To understand the purpose of this parameter, you have to understand how vacuum has worked since version 8.4.  Vacuum no longer usually scans the entire table; instead, it consults a bitmap (the visibility map) which tracks which data pages are all-visible and can be skipped, so that it scans only the portion of the table which is "dirty".  This is a considerable optimization for large tables, where only 20% of the rows might have been updated or deleted since the last vacuum cycle.

This does, however, introduce a problem: since old, untouched data pages aren't being vacuumed (they no longer have any reclaimable rows), they are not getting frozen, regardless of the setting of vacuum_freeze_min_age.  This makes it much more likely that you'll hit an eventual wraparound vacuum when you least expect it.

The idea of vacuum_freeze_table_age is to complement autovacuum_freeze_max_age by telling the database: "If it's almost vacuum freeze time for this table, and you were vacuuming the table anyway, then scan the whole table and freeze old tuples."  In theory, this should allow you to hold off wraparound vacuum by running a regular, cancellable vacuum instead.  In practice, though, that's a dubious benefit for large tables, since either way you get a full-table scan and a vacuum which runs for a long time.  Small, heavily updated tables will tend to get most of their pages vacuumed most of the time anyway, and thus be unaffected by this setting.

As such, there's no point in setting it to a particularly low or creative level.  Simply set it to 80% of whatever autovacuum_freeze_max_age is set to (recommended: 800 million).
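In postgresql.conf terms, assuming the 1 billion autovacuum_freeze_max_age recommended in Part 1, that works out to (a sketch of the recommendation above, not the defaults):

```
autovacuum_freeze_max_age = 1000000000   # ~50% of the wraparound threshold
vacuum_freeze_table_age = 800000000      # 80% of autovacuum_freeze_max_age
```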

 

Flexible Freeze


By now it should have occurred to you that Postgres has a serious problem with vacuum freeze behavior on large tables.  It's an unfortunate side-effect of optimizations in several other areas.  You can tweak the settings as I recommended, but you'll still be in danger of having a wraparound vacuum kick in and saturate your IO at some unexpected time.  So what can you do to get ahead of the problem and proactively freeze tables before they reach that point?

Well, you have one piece of information PostgreSQL doesn't: you know when the slow periods in your application usage are.  Try following this "flexible freeze" program in a cron script:
During expected slow periods, run the query from Part 1 to get a list of tables, and then:
  1. Set PostgreSQL to "soft vacuum" with a high vacuum_cost_delay (such as 50ms).
  2. Set PostgreSQL to aggressively freeze tuples, with vacuum_freeze_table_age at 50% of autovacuum_freeze_max_age, and vacuum_freeze_min_age set to 10% of its usual value.
  3. Until the slow period is almost over, loop through the list, vacuuming each table.
This will help you avoid wraparound vacuum when you least expect it.
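The session settings in steps 1 and 2 work out to the following (a sketch assuming the 1 billion autovacuum_freeze_max_age recommended above and the 50 million default for vacuum_freeze_min_age; a real cron script would apply these via SET before each VACUUM):

```python
# Assumed baselines: autovacuum_freeze_max_age raised to 1 billion,
# vacuum_freeze_min_age at its 50 million default.
autovacuum_freeze_max_age = 1_000_000_000
usual_freeze_min_age = 50_000_000

freeze_session_settings = {
    "vacuum_cost_delay": "50ms",                                # step 1: soft vacuum
    "vacuum_freeze_table_age": autovacuum_freeze_max_age // 2,  # step 2: 50% of max age
    "vacuum_freeze_min_age": usual_freeze_min_age // 10,        # step 2: 10% of usual
}
```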

Or, you can use our Flexible Freeze python script.

 

Further Work


Obviously the suggestions in this article are workarounds.  Where PostgreSQL really needs to go is to find some way to avoid needing to vacuum old, cold data ever.  The obstacle to doing this is that nobody has figured out how.

One of the first methods suggested was to have an 8-byte XID, which would postpone wraparound by a billion cycles.  However, since there are two XIDs on every row header in the database, this would dramatically increase database sizes for many users, especially the users who need more XIDs in the first place.  It would also increase memory requirements for a lot of Postgres operations.  A second method I suggested during the 9.2 cycle was to have an "XID cycle" counter in the page header of each data page.  This would have the advantage of not increasing the size of rows.  However, it would have the drawback of failing in cases where there were rows from more than one XID cycle in the page.

Further, both of these methods hit a major performance problem: the CLOG.  PostgreSQL's CLOG tracks which XIDs committed and which rolled back, and thus which rows are valid and which are orphaned.  Currently, checking the CLOG is quite rapid because it's kept small.  Any patch which involves increasing the size of XIDs or keeping more of them will cause the CLOG to bloat by multiples, substantially affecting response times on transaction-processing workloads.  That's unacceptable.  We've discussed radically restructuring the CLOG, but that's a major enough change that it would need to come with other benefits than just avoiding freeze.

An alternate approach to never freezing old data at all would be to develop some kind of daemon which did background freezes of chunks of large tables.  This daemon would need to be able to tell when the system was "busy" and stop work.  It would also need to be able to track what old data pages it had visited and which it hadn't, and which ones need to be revisited because they've been dirtied.

Anyone wanting to take up the above -- either by coding or by funding -- let me know!  Until then, happy freezing!

Friday, September 28, 2012

Freezing Your Tuples Off, Part 1

Given that it's currently around 50F, damp and foggy in San Francisco (so much for Indian Summer), my thoughts turn to freezing.  Specifically, freezing transaction IDs.  There are three settings associated with this behavior in postgresql.conf, which I'll be discussing below:
  •  autovacuum_freeze_max_age
  •  vacuum_freeze_min_age
  •  vacuum_freeze_table_age
At the least, you'll need to understand freezing and what these options do.  Many users will also want to change them away from the default values.

The Need To Freeze


Every write transaction or standalone statement in PostgreSQL gets assigned a unique transaction ID (or XID).  This XID determines visibility precedence: which other concurrent transactions can see the row changes associated with the transaction.  Read-only transactions do not get assigned an XID.  These XIDs get stamped on each row in storage, in the form of xmin and xmax, denoting the first and last concurrent transactions which can "see" the row.  By putting these in the row headers, we decentralize transaction conflict resolution, which supports PostgreSQL's very high concurrency capabilities.

However, there's a problem with XIDs: they're a four-byte integer.  This means that after about 2.1 billion transactions, the counter has to wrap around.  And by the time the XID counter wraps around, the old XIDs need to be gone from all of the old row headers.  This means removing the XID from xmin and xmax and replacing it with the special FrozenXID value, indicating a transaction which is so old that everyone can see it.

Now, 2.1 billion transactions seems like a lot, but it's not really.  If you're doing just 100 write transactions per second (average), you'll go through that in about eight months.  This means that we need to be constantly cleaning old data pages of old XIDs to be ready for wraparound.  And, indeed, PostgreSQL is prepared to do this, through the autovacuum facility.
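The back-of-the-envelope math, as a sketch (2^31 is the full positive XID space, ignoring the handful of reserved XIDs):

```python
def months_until_wraparound(writes_per_second):
    """Roughly how long a steady write load takes to burn through
    the ~2.1 billion usable XIDs (32-bit counter, half the space)."""
    xid_space = 2**31              # ~2.147 billion
    seconds = xid_space / writes_per_second
    return seconds / (30 * 86400)  # convert to approximate months
```

At 100 write transactions per second this comes out to a little over eight months.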

autovacuum_freeze_max_age


The first autovacuum setting is autovacuum_freeze_max_age, which is an absolute ceiling on how long autovacuum will let you go before it kicks in and starts an exhaustive vacuum freeze of the old XIDs on your tables with old rows.  Hitting this threshold isn't a great thing, because it can generate lots of IO at unexpected times, and freeze autovacuums aren't cancellable (if you cancel them, they come right back).

You can see how close you are to autovacuum freeze anywhere with this query:

SELECT max(age(datfrozenxid)) FROM pg_database;

However, you generally don't care about that; what you care about is "how close am I to forced autovacuum freeze on a large table?"  Here's that:

SELECT relname, age(relfrozenxid) AS xid_age,
    pg_size_pretty(pg_table_size(oid)) AS table_size
FROM pg_class
WHERE relkind = 'r' AND pg_table_size(oid) > 1073741824
ORDER BY age(relfrozenxid) DESC LIMIT 20;


More specifically, that says "give me the top 20 tables over 1GB, sorted by the age of their oldest XID".  Results look something like this:

        relname         |  xid_age  | table_size
------------------------+-----------+------------
 postgres_log           | 199785216 | 12 GB
 statements             |   4551790 | 1271 MB
 normal_statement_times |        31 | 12 GB


Now, that above is trouble brewing.  We see that postgres_log is at an XID age of 199 million, which is awfully close to the default autovacuum_freeze_max_age of 200 million.  And having autovac decide to read and rewrite an entire 12GB table, possibly at peak usage times, can cause unexpected poor application performance.

One thing we can do is raise autovacuum_freeze_max_age.  The default, 200m, is very conservative; it's only 10% of our shut-down-the-database threshold.  On high-transaction-volume databases, I generally raise it to 1 billion, which is still only the 50% mark.
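The percentages above work out like this (a sketch, using ~2 billion as the approximate shut-down-the-database threshold; the function name is my own):

```python
def pct_of_shutdown_threshold(freeze_max_age, shutdown_threshold=2_000_000_000):
    """What fraction of the XID headroom a given
    autovacuum_freeze_max_age consumes before forced freeze kicks in."""
    return 100.0 * freeze_max_age / shutdown_threshold
```

The 200 million default lands at 10%; raising it to 1 billion still leaves you at only the 50% mark.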

However, that's just putting off the problem.  What we should do is manually freeze the table at a time which is convenient for the application, i.e. during a low activity period or a scheduled downtime.  Then we run:

VACUUM FREEZE postgres_log;

Even better, we'd like to prevent ourselves from getting that close to the threshold in the first place.  How do we do that?  Well, there's some other tuning parameters. 

To be continued in Part 2 ...