Wednesday, November 19, 2014

Good kernel, bad kernel

A month ago I got into an argument on IRC with Sergey about telling people to avoid kernel 3.2.  This turned out to be a very productive argument, because Sergey then went and did a battery of performance tests against various Linux kernels on Ubuntu. Go read it now, I'll wait.

My takeaways from this:

  • Kernel 3.2 is in fact lethally bad.
  • Kernel 3.13 is the best of the 3.X kernels so far.  I hope that this can be credited to the PostgreSQL team's work with the LFS/MM group.
  • No 3.X kernel yet has quite the throughput of 2.6.32, at least at moderate memory sizes and core counts.
  • However, kernel 3.13 has substantially lower write volumes at almost the same throughput.  This means that if you are write-bound on IO, 3.13 will improve your performance considerably.
  • If your database is mostly reads and highly concurrent, consider enabling kernel.sched_autogroup_enabled.
Thanks a lot to Sergey for doing this testing, and thanks even more to the LFS/MM group for improving IO performance so much in 3.13.

Tuesday, November 18, 2014

pgDay SF 2015 Call For Speakers/Sponsors now open

pgDay SF 2015, to be held March 10th in Burlingame, CA, now has its Call for Speakers open.  The event is also looking for sponsors.  Come join us and meet even more Postgres folks!

Thursday, November 6, 2014

We need a webapp benchmark

I've been doing some comparative testing of PostgreSQL hosting on different cloud platforms.  One deficiency in this effort is that the only benchmarking tool I have is pgbench, which doesn't reflect the kinds of workloads people actually want to run on cloud hosting.  Don't get me wrong, pgbench does everything you could imagine with its simple TPC-B-style benchmark, including statistics and sampling.  But the core benchmark still doesn't look much like the kind of Rails and Django apps I deal with on a daily basis.

There's also TPCC-js, which is more sophisticated, but is ultimately still a transactional, back-office OLTP benchmark. 

So I'm thinking of developing a "webapp" benchmark.  Here's what I see as concepts for such a benchmark:
  • Read-mostly
  • No multi-statement transactions
  • Defined "users" concept with logins and new user registration
  • Needs a "sessions" table which is frequently updated
  • Read-write, read-only and session database connections should be separable, in order to test load-balancing optimization.
  • Queries counting, sorting and modifying content
  • Measured unit of work is the "user session" which would contain some content lookups and minor updates ("likes").
Now, one of the big questions is whether we should base this benchmark on the idea of a social networking (SN) site.  I think we should; SN sites test a number of things, including locking and joins, which might not be exercised by other types of applications (and aren't by pgbench).  What do you think?  Does anyone other than me want to work on this?
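
To make that more concrete, here's a very rough sketch of the kind of core schema such a benchmark would need.  All of the names here are illustrative, not from any existing tool:

    -- hypothetical core tables for a webapp benchmark
    CREATE TABLE users (
        user_id    serial PRIMARY KEY,
        login      text NOT NULL UNIQUE,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- frequently-updated session state
    CREATE TABLE sessions (
        session_id uuid PRIMARY KEY,
        user_id    int NOT NULL REFERENCES users (user_id),
        last_seen  timestamptz NOT NULL DEFAULT now()
    );

    -- content with "likes" for the minor-update workload
    CREATE TABLE content (
        content_id serial PRIMARY KEY,
        author_id  int NOT NULL REFERENCES users (user_id),
        body       text NOT NULL,
        likes      int NOT NULL DEFAULT 0
    );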

Saturday, November 1, 2014

Finding Foreign Keys with No Indexes

Unlike some other SQL databases, PostgreSQL does not automatically create indexes on the "child" (in formal language, "referencing") side of a foreign key.  There are some good reasons for this (see below), but it does give users one more thing to forget: indexing the foreign keys (FKs) that need it.  But which ones need it?  Fortunately, you can interrogate the system catalogs and find out.

I have added a query for finding foreign keys without indexes to pgx_scripts.  The "parent", or "referenced", side of an FK is always indexed (it has to be, because it must be unique), so we won't talk about it further.
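
The core of the idea looks something like this simplified sketch; the actual query in pgx_scripts does considerably more filtering:

    -- FKs with no index whose leading columns cover the FK columns
    SELECT c.conrelid::regclass AS child_table,
           c.conname AS fk_name,
           pg_get_constraintdef(c.oid) AS fk_definition
    FROM pg_constraint c
    WHERE c.contype = 'f'
      AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          AND i.indpred IS NULL  -- skip partial indexes
          -- the FK columns must be the leading index columns
          AND (string_to_array(i.indkey::text, ' ')::int2[])
                [1:array_length(c.conkey, 1)] @> c.conkey
      )
    ORDER BY child_table;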

Now, in order to understand how to interpret this, you have to understand why you would or would not have an index on an FK, and what sorts of indexes are valid.  There are two occasions when indexes on the child side of FKs are used:
  • when doing JOIN and lookup queries using the FK column
  • when updating or deleting a row from the "parent" table
The second occasion is news to some DBAs.  The way it works is this: before letting you delete or change a row in the "parent" table, Postgres has to verify that there are no rows in the "child" table referencing the FK value that might be going away.  If there are, it needs to perform the action you have defined (such as CASCADE, SET NULL or RESTRICT).  If the "child" table is large, this check can be sped up substantially by having an index on the FK column.
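
A minimal illustration, using hypothetical tables (any real schema would differ):

    CREATE TABLE teams (
        team_id serial PRIMARY KEY,
        name    text NOT NULL
    );

    CREATE TABLE players (
        player_id serial PRIMARY KEY,
        team_id   int NOT NULL REFERENCES teams (team_id),
        name      text NOT NULL
    );

    -- PostgreSQL does not create this automatically; without it,
    -- every DELETE or key UPDATE on teams scans all of players.
    CREATE INDEX players_team_id_idx ON players (team_id);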

This means that it's important to have an index on the child side of the FK if any of the following are true:
  • The child table is large and the parent table gets updates/deletes
  • The parent table is large and the FK is used for JOINs
  • The child table is large and the FK is used to filter (WHERE clause) records on the child table
This means most FKs, but not all of them.  If both tables are small, or if the parent table is small and the FK is used only to prevent bad data entry, then there's no reason to index it.  Also, if the FK is very low cardinality (like, say, only four possible values) then it's probably also a waste of resources to index it.

Now you're ready to run the query on your own database and look at the results. The query tries to filter for the best indexing candidates, but it is just a query and you need to use your judgement on what you know about the tables.  The query also filters for either the parent or child table being larger than 10MB.

Now, you might say "but I have an index on column Y" and wonder why it's appearing on the report.   That's probably because the FK doesn't match the first columns of the index.  For example, an index on ( name, team_id ) cannot be used for an FK on team_id.
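
To continue the hypothetical players example from above:

    -- NOT usable for the FK on team_id: it is not the leading column
    CREATE INDEX players_name_team_idx ON players (name, team_id);

    -- usable: team_id is the leading column
    CREATE INDEX players_team_name_idx ON players (team_id, name);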

You may notice a second section of the report called "questionable indexes".  These are FKs which have an index available, but that index may not be usable for JOINs and constraint enforcement, or may be very inefficient.  This includes:
  • Non-BTree indexes.  Currently other types of indexes can't be used for FK enforcement, although that is likely to change in future Postgres versions.  But they can sometimes be used for joins.
  • Indexes with more than one column in addition to the FK columns.  These indexes can be used for FK enforcement, but they may be very inefficient at it due to their larger size and extra index levels.
  • Partial indexes (i.e. INDEX ... WHERE).  In general, these cannot be used for FK enforcement, but they can sometimes be used for joins.
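
Sticking with the same hypothetical players table, one example of each kind:

    -- non-BTree: can't enforce the FK, may help some joins
    CREATE INDEX players_team_hash ON players USING hash (team_id);

    -- extra columns: usable, but bigger and slower than needed
    CREATE INDEX players_wide_idx ON players (team_id, name, player_id);

    -- partial: generally can't be used for FK enforcement
    CREATE INDEX players_team_part ON players (team_id) WHERE team_id > 100;
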
It's not quite as clear-cut when you should add to or replace those indexes.  My advice is to start with the missing indexes and then move on to the more nuanced cases.

Thursday, October 30, 2014

Upcoming Seattle Visit

I will be in Seattle soon on business.  This will include two opportunities to do PostgreSQL community stuff:
If you're in the Seattle area, come chat!  We'll go out to Dilettante and have chocolate!

Tuesday, October 21, 2014

Introducing Flexible Freeze

One of the things I mentioned in my series on VACUUM FREEZE was that we really needed a Postgres utility which would opportunistically freeze tables during low traffic periods. Today I'm announcing the Flexible Freeze project, our first attempt at designing such a utility.

All that's there right now is a simple Python script.  However, that script is already a useful tool, installed at multiple production sites. Here's how the script works:
  1. identify your active databases and daily/weekly low traffic periods.
  2. create a cron job which calls flexible_freeze.py with a time limit to keep it inside your low traffic window.
  3. flexible_freeze.py will loop through the tables with the oldest XIDs, freezing them until it runs out of time or runs out of tables.
There is also a second mode, using the --vacuum switch, which does VACUUM ANALYZE on the tables with the most dead rows (according to pg_stat_user_tables).  This is to help users who have a strong high/low traffic cycle and want to make sure that regular vacuuming takes place during low traffic.  If you're running both modes, we advise doing the freeze first.
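
If you want to see what the script would target first, queries along these lines show the candidates for each mode (a sketch of the idea, not necessarily the script's exact SQL):

    -- tables with the oldest transaction IDs (freeze mode)
    SELECT c.oid::regclass AS table_name,
           age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    WHERE c.relkind = 'r'
    ORDER BY xid_age DESC
    LIMIT 20;

    -- tables with the most dead rows (--vacuum mode)
    SELECT relname, n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 20;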

Of course, I have a tanker-truck full of desired improvements/expansions to this.  So, pull requests welcome. 

If you're more into Ruby, Wanelo has rewritten flexible freeze for Rails and incorporated it into their Postmodern tool.

Friday, October 10, 2014

New Table Bloat Query

To accompany the New Index Bloat Query, I've written a New Table Bloat Query.  This also involves the launch of the pgx_scripts project on GitHub, which will include most of the "useful scripts" I talk about here, as well as some scripts from my co-workers.

The new table bloat query is different from the check_postgres.pl version in several ways:
  • Rewritten to use WITH statements for better maintainability and clarity
  • Conditional logic for old Postgres versions and 32-bit platforms taken out
  • Index bloat removed, since we have a separate query for that
  • Columns formatted to be more immediately comprehensible
In the course of building this, I found two fundamentally hard issues:
  1. Some attributes (such as JSON and polygon fields) have no stats, so those tables can't be estimated.
  2. There's no good way to estimate bloat for compressed (TOAST) attributes and rows.
Also, while I rewrote the query almost entirely, I am still relying on Greg's core math for estimating table size.  Comparing this with the results of pgstattuple, I'm seeing an error of +/- 20%, which is pretty substantial.  I'm not clear on where that error is coming from, so help improving the math is very welcome!
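
If you want to cross-check the estimates yourself, pgstattuple (a contrib extension) gives exact numbers, at the cost of a full table scan, so be careful on production:

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    SELECT table_len, dead_tuple_len,
           dead_tuple_percent, free_space
    FROM pgstattuple('public.current_member');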

Results look like this:

 databasename | schemaname |    tablename    | pct_bloat | mb_bloat | table_mb
--------------+------------+-----------------+-----------+----------+----------
 members_2014 | public     | current_member  |        92 |    16.98 |   18.547
 members_2014 | public     | member_response |        87 |    17.46 |   20.000
 members_2014 | public     | archive_member  |        84 |    35.16 |   41.734
 members_2014 | public     | survey          |        57 |    28.59 |   50.188

pct_bloat is how much of the table (0 to 100) is estimated to be dead space.  mb_bloat is how many megabytes of bloat are estimated to exist.  table_mb is the actual size of the table in megabytes.

The suggested criteria are to list tables which are either more than 50% bloated and larger than 10MB, or more than 25% bloated and larger than 1GB.  However, you should calibrate this according to your own database.
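
Expressed as a filter over the report columns, where bloat_report is a hypothetical stand-in for however you've captured the query's output:

    SELECT databasename, schemaname, tablename,
           pct_bloat, mb_bloat, table_mb
    FROM bloat_report
    WHERE (pct_bloat > 50 AND table_mb > 10)
       OR (pct_bloat > 25 AND table_mb > 1024);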