Tuesday, September 23, 2014

Settings for a fast pg_restore

One thing we do a lot for clients is moving databases from one server to another via pg_dump and pg_restore.  Since this process often happens during a downtime window, it's critical to make the pg_dump and pg_restore as fast as possible.  Here are a few tips:
  • Use the -j multiprocess option for pg_restore (and, on 9.3 and later, for pg_dump as well); see the sketch after this list.  Ideal concurrency is generally two less than the number of cores you have, up to a limit of 8.  Users with many ( > 1000) tables will benefit from even higher levels of concurrency.
  • Doing a compressed pg_dump, copying the dump to the remote server (using whatever transfer-speed options your copy tool offers), and restoring there is usually faster than piping pg_dump output over the network, unless you have a very fast network.
  • If you're using binary replication, it's faster to disable it while restoring a large database, and then reclone the replicas from the new database (assuming there aren't other databases on the system in replication, of course).
  • You should set some postgresql.conf options for fast restore.
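
As a concrete sketch of the first two tips (the host name, paths, and database name are placeholders; assumes 9.3 and an 8-core machine):

    # on the source server: directory-format dump (compressed by default),
    # with 6 parallel jobs (two less than the 8 cores)
    pg_dump -Fd -j 6 -f /backup/mydb.dump mydb

    # copy the dump directory to the target server
    rsync -a /backup/mydb.dump/ target.example.com:/backup/mydb.dump/

    # on the target server: create the empty database, then restore with 6 jobs
    createdb mydb
    pg_restore -j 6 -d mydb /backup/mydb.dump
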
"What postgresql.conf options should I set for fast restore?" you ask?  Well, let me add a few caveats first:
  • The settings below assume that the restored database will be the only database running on the target system; they are not safe settings for production databases.
  • They assume that if the pg_restore fails, you're going to delete the target database and start over.
  • These settings will break replication as well as PITR backup.
  • These settings will require a restart of PostgreSQL to get back to your production settings afterwards.
  • You will need to run ANALYZE manually after the load is done (a sketch of this cleanup follows the settings list below).
shared_buffers = 1/2 of what you'd usually set
maintenance_work_mem = 1GB-2GB
fsync = off
synchronous_commit = off
wal_level = minimal
full_page_writes = off
wal_buffers = 64MB
checkpoint_segments = 256 or higher
max_wal_senders = 0
wal_keep_segments = 0
archive_mode = off
autovacuum = off
all activity logging settings disabled
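
For the restart and ANALYZE caveats above, a minimal sketch of the post-restore cleanup (the database name and data directory are placeholders):

    # rebuild planner statistics manually, since autovacuum was off during the load
    vacuumdb --analyze-only mydb

    # swap your production postgresql.conf back in, then restart
    pg_ctl restart -D /var/lib/postgresql/9.3/main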

Some more notes:
  • You want to set maintenance_work_mem as high as possible, up to 2GB, for building new indexes.  However, since we're doing a concurrent restore, you don't want to get carried away; your limit should be RAM/(2*concurrency), in order to leave some room for the filesystem cache.  This is a reason you might turn concurrency down if you have only a few large tables in the database (see the worked example after this list).
  • checkpoint_segments should be set high, but requires available disk space, at the rate of 1GB per 32 segments.  This is in addition to the space you need for the database itself.
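
To make those two notes concrete, here's the arithmetic for a hypothetical server with 32GB of RAM restoring with -j 6:

    maintenance_work_mem limit: 32GB / (2 * 6) = 2.66GB, so the 2GB ceiling applies
    checkpoint_segments = 256:  256 / 32 = 8, so about 8GB of extra WAL disk space
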
Have fun!



6 comments:

  1. I have multiple databases and want to consolidate them into one database with multiple schemas. I use pg_dump and pg_restore, and I added --jobs=24 to the settings since I have 24 cores. Each database is about 6GB and I have 100 shards. After reading your article, each shard takes about 8 minutes to pg_dump and pg_restore; before, each took 9.5 minutes. However, I was wondering if there is another setting or way to speed this process up? It would be great if it took 1 minute per database. Could londiste be a solution?
    Thoughts?

  2. I was able to solve this by creating multiple workers, and it took an hour and a half to migrate 1.6TB.

  3. I have used all your tips and they worked perfectly.
    But I am afraid that after the PITR-related changes (the last tips, about WAL and archiving), my replication will not work.
    Can you confirm that if I change back to my default options after the restore, everything will work?

    Replies
    1. You will need to take a new snapshot of the database and start PITR fresh from it.
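
       A minimal sketch of recloning a replica with pg_basebackup (the host and data directory are placeholders; requires 9.2 or later for -X stream, with production wal_level and max_wal_senders restored on the primary first):

         # run on the replica host, pointed at the freshly restored primary
         pg_basebackup -h primary.example.com -D /var/lib/postgresql/9.3/replica -X stream -P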

  4. Hey Josh,

    What's the advantage of reducing shared buffers by half?

    Replies
    1. It's to leave more main memory available to the filesystem for caching data and buffering writes during the load.
