Tuesday, September 23, 2014

Settings for a fast pg_restore

One thing which we do a lot for clients is moving databases from one server to another via pg_dump and pg_restore.  Since this process often occurs during a downtime, it's critical to do the pg_dump and pg_restore as quickly as possible.  Here's a few tips:
  • Use the -j multiprocess option for pg_restore (and, on 9.3, for pg_dump as well).  Ideal concurrency is generally two less than the number of cores you have, up to a limit of 8.  Users with many ( > 1000) tables will benefit from even higher levels of concurrency.
  • Doing a compressed pg_dump, copying it (with speed options), and restoring on the remote server is usually faster than piping output unless you have a very fast network.
  • If you're using binary replication, it's faster to disable it while restoring a large database, and then reclone the replicas from the new database.  Assuming there aren't other databases on the system in replication, of course.
  • You should set some postgresql.conf options for fast restore.
"What postgresql.conf options should I set for fast restore?" you ask?  Well, let me add a few caveats first:
  • The below assumes that the restored database will be the only database running on the target system; they are not safe settings for production databases.
  • It assumes that if the pg_restore fails you're going to delete the target database and start over.
  • These settings will break replication as well as PITR backup.
  • These settings will require a restart of PostgreSQL to get to production settings afterwards.
  • You will need to run ANALYZE manually after the load is done.
shared_buffers = 1/2 of what you'd usually set
maintenance_work_mem = 1GB-2GB
fsync = off
synchronous_commit = off
wal_level = minimal
full_page_writes = off
wal_buffers = 64MB
checkpoint_segments = 256 or higher
max_wal_senders = 0
wal_keep_segments = 0
archive_mode = off
autovacuum = off
all activity logging settings disabled

Some more notes:
  • you want to set maintenance_work_mem as high as possible, up to 2GB, for building new indexes.  However, since we're doing concurrent restore, you don't want to get carried away; your limit should be (RAM/(2*concurrency)), in order to maintain somewhat of an FS buffer.  This is a reason why you might turn concurrency down, if you have only a few large tables in the database.
  • checkpoint_segments should be set high, but requires available disk space, at the rate of 1GB per 32 segments.  This is in addition to the space you need for the database.
Have fun!

Thursday, September 18, 2014

Finding Duplicate Indexes

Recently a client asked us to help them find and weed out duplicate indexes.  We had some old queries to do this, but they tended to produce a lot of false positives, and in a database with over 2000 indexes that wasn't going to cut it.  So I rewrote those queries to make them a bit more intelligent and discriminating, and to supply more information to the user on which to base decisions about whether to drop an index.

Here's the first query, which selects only indexes which have exactly the same columns.  Let me explain the columns of output it produces:
  • schema_name, table_name, index_name: the obvious
  • index_cols: a comma-delimited list of index columns
  • indexdef: a CREATE statement for how the index was created, per pg_indexes view
  • index_scans: the number of scans on this index per pg_stat_user_indexes
Now, go run in on your own databases.  I'll wait.

So, you probably noticed that we still get some false positives, yes?  That's because an index can have all the same columns but still be different.  For example, it could use varchar_pattern_ops, GiST, or be a partial index.  However, we want to see those because often they are functionally duplicates of other indexes even though they are not exactly the same.  For example, you probably don't need both an index on ( status WHERE cancelled is null ) and on ( status ).

What about indexes which contain all of the columns of another index, plus some more?  Like if you have one index on (id, name) you probably don't need another index on just (id).  Well, here's a query to find partial matches.

This second query looks for indexes where one index contains all of the same columns as a second index, plus some more, and they both share the same first column.  While a lot of these indexes might not actually be duplicates, a lot of them will be.

Obviously, you could come up with other variations on this, for example searching for all multicolumn indexes with the same columns in a different order, or indexes with the same first two columns but others different.  To create your own variations, the key is to edit the filter criteria contained in this clause:

    FROM pg_index as ind2
    WHERE ind.indrelid = ind2.indrelid
    AND ( ind.indkey @> ind2.indkey
     OR ind.indkey <@ ind2.indkey )
    AND ind.indkey[0] = ind2.indkey[0]
    AND ind.indkey <> ind2.indkey
    AND ind.indexrelid <> ind2.indexrelid

... and change it to figure out the factors which give you the most real duplicates without missing anything.

Happy duplicate-hunting!

Tuesday, August 26, 2014

Portland schedule, revised

Just so folks know:  due to unsurmountable issues, I will not make it to DjangoCon this year.  My apologies for anyone who was planning on attending my sessions there.  However, I will be at both the Portland pgDay and FOSS4G.  See you in Portland, just later.

Monday, August 25, 2014

The great JSONB tradeoff

One thing we're waiting on to get 9.4 out the door is making a decision on JSONB storage.  Currently under debate are patches to the JSONB format which would, for users with large numbers (as in 150+) of top-level keys, cause JSONB to take up to 60% less space on disk in exchange for an up to 80% increase (that is slowdown) in response times on key extraction.  We need your feedback as potential users of the feature to make a good decision about what's best.

At stake is the question of whether the key locations should be saved as successive offsets, or as lengths.  The advantage of the former is that it speeds up extraction by making it only two lookups to locate a top-level key regardless of the number of keys you have.  The disadvantage is that a series of increasing offsets isn't very compressible.

For JSONB fields which consist of a few top-level keys and large values, this question makes no difference at all.  However, for the worst case ... 150+ top-level keys with short (under 10 bytes) values, the difference is quite dramatic.  For example, I constructed a test with 183 keys, of which 175 were NUMERIC.  I checked both table size and time to extract key #160 from 100,000 rows:

Table Size1147 MB541 MB
Extract 100K2.8s5.2s

This is a "worst case" scenario for the difference between these two designs.  Note that the extraction slowdown affects only retrieving the value to the client; it does not affect index lookups of JSONB rows, which are speedy no matter which patch is employed.

However, we're undecided on this "fix" because we don't know a couple things:

  • How likely are users to have 150+ top-level keys in one field (or keys on any single level together) with short values?
  • Is up to 60% space savings in return for up to 80% extraction slowdown a good tradeoff?  Or a bad one?
As such, I am turning to you, the users, to help us decide what tradeoff makes sense for 9.4.  Let me know in the comments.

Thursday, August 14, 2014

More on 9.4 percentiles

Some of you may be following the discussion on JSONB compression on -hackers, which is the remaining major issue for 9.4.  As part of this, I needed to run some statistics on average column sizes for JSON vs JSONB tables. This makes a good example of how incredibly useful WITHIN GROUP will be in 9.4.

First, I created two versions of the data, one called "jsonic" which has text-JSON data, and one called "jsonbish" which has the JSONB version.  Then I ran some comparisons using pg_column_size on the JSON columns.

with colsizes_b as (
    select pg_column_size(jsonbish.jcol) as colsize
    from jsonbish
distrib_b as (
    select percentile_cont(array [0,0.25,0.5,0.75,1])
    within group (order by colsize) as colsize_distribution
    from colsizes_b
colsizes_j as (
    select pg_column_size(jsonic.jcol) as colsize
    from jsonic
distrib_j as (
    select percentile_cont(array [0,0.25,0.5,0.75,1])
    within group (order by colsize) as colsize_distribution
    from colsizes_j
select 'json' as thetype,
    colsize_distribution from distrib_j
union all
select 'jsonb',
    from distrib_b;

So I'm taking the column size of each row, then sorting them by size, and then doing a percentile distribution using the new WITHIN GROUP.  There's no group there because the group is actually the whole table.  I've chosen the usual box plot percentages: minimum, 25%, median, 75%, and maximum.

And the results:

 thetype |    colsize_distribution   
 json    | {1741,1767,1854,1904,2292}
 jsonb   | {3551,5866,5910,5958,6168}

... which demonstrates that we do indeed have a major issue with compressability.

Tuesday, August 12, 2014

SFPUG Video on YouTube

TL;DR: SFPUG video available on YouTube, including streaming video tonight.  I need help converting old videos and making new ones.

First, we have a few SFPUG videos available on YouTube:
While JSONB Deep Dive was also recorded, there were issues with the audio, so I don't recommend watching it, sorry.

We will have streaming video for tonight's SFPUG, which will be on Postgres-XL.   Join the Google Event to get a link to the streaming video once it starts; expected start time is 7:15PM PST.

This brings me to the third thing, which is that I could use some help with video, in two areas:
  1. I desperately need someone to help take better-quality videos of SFPUG meetings, with better focus and sound.  Right now I'm using a Logitech webcam, and it's just not cutting it.  Video will need to be compatible with Google Hangouts, unless you have another plan for broadcasting.
  2. I have 7 or 8 videos of SFPUG meetings from 2013, including Tom Lane explaining the query planner, in FLV format.  I need someone to transcode these to a YouTube format and upload them.
If you can help, email me.  Thanks!

Thursday, August 7, 2014

Call for quotes for the 9.4 release announcement

We are looking for endorsement quotes for the 9.4 release.
Specifically, we want quotes from end-users or ISVs* who have not been
featured for a previous release, and enthuse about one of the following
  • Streaming Changesets/Bi-Directional Replication
  • Performance improvements in 9.4
  • Something else about Flexibility, Scalability or Performance
We need these quotes by August 15th.  Contact me if you want to offer a quote.  Thanks!

(* while we love our PostgreSQL consulting and support companies,
endorsements from them aren't news in a PR sense)