Wednesday, December 11, 2013

Meet your new NoSQL Database

... Postgres.  You wanna document store?  We got your document store, right here.

Thanks to Craig Kerstiens, I just learned about the pgREST project.  This project turns PostgreSQL into a RESTful JSON document store, capable of running ad-hoc JavaScript code (in either LiveScript or plain JavaScript on v8) in order to do searches and batch modifications on the server.

pgBSON, a Postgres extension implementing support for a MongoDB-compatible BSON datatype, also recently went 1.0, signalling that it's ready for production use.

These are powerful demonstrations of how adaptable Postgres is.  If it's data, we can find a way to do it. And unlike a typical document store database, you can do relational-SQL stuff as well, giving you your choice of interface.

pgREST is also our first major project out of our Taiwanese community, so I'm pretty excited about it.

Of course, we're still waiting for fully indexable, binary-storage JSON in 9.4, but that's on schedule for September.

Friday, December 6, 2013

Checksums! On December 10th

SFPUG will be hosting committer Jeff Davis talking about his 9.3 feature, data page checksums for PostgreSQL, and how to use them.  As usual, we will have live video of his presentation.  Presentation should start around 7:15 PST.

Thursday, December 5, 2013

Why you need to apply today's update real soon

So, another month, another PostgreSQL update.  This one is a lot more critical than most because it patches up to three data loss issues, depending on what version of PostgreSQL you're on and which features you're using.  While all users need to plan to apply this update sometime during the next couple weeks, two kinds of users need to schedule an update downtime for this weekend, if at all possible:
  1. Users on version 9.3
  2. Users of binary replication
While all three data loss issues depend on timing, and are thus low-probability, all three are impossible to predict in advance and difficult to detect when they occur.  As such, they should be considered high-risk, and the fix urgent.  Of the three issues, the replication one affects 9.0, 9.1, 9.2 and 9.3; one of the VACUUM issues affects all versions; and the other affects only 9.3.

Annoyingly, you'll have to do some additional stuff after you update:
  • If using binary replication, you need to take a new base backup of each replica after updating.
  • You should run the following on each production database after updating:
SET vacuum_freeze_table_age = 0;
VACUUM; -- optionally, ANALYZE as well

This second step is critical for users on 9.3, and a generally good idea for users on other versions. Note that, while VACUUM is a non-blocking operation, it can create a lot of IO, so it's a good idea to do this during a slow traffic period and monitor it for pile-ups.

More information about the replication issue is here.

So, how did this happen?  Well, all three issues were unexpected side effects of fixes we applied for other issues in earlier versions.  For example, the replication issue is the result of the combination of two independent fixes for failover issues, both of which needed to be fixed.  Since all of these issues depend on both timing and heavy traffic, and their effect is subtle (as in, a few missing or extra rows), none of our existing testing was capable of showing them.

If anyone wants to devise destruction tests to detect subtle data corruption issues in PostgreSQL before we release code -- if anyone can -- please offer to help!

Wednesday, November 20, 2013

PostGIS Day Tomorrow

The PostGIS project is throwing a "PostGIS Day" online tomorrow.  As part of this, I will be on a panel with Paul Ramsey, Stephen Mather and Bborie Park at 10am PST tomorrow (Thursday) on Google Hangout.  Join us there!

Thursday, November 14, 2013

First Look at PostgreSQL RDS on Amazon

By now a lot of you have heard about the new Postgres RDS on Amazon Web Services.  This is something I've been anticipating for quite a while, so I decided to log in and try it out right away.  Summary is: the service is up, it's real, they did some nice things, the press release isn't completely accurate, and they did fork PostgreSQL.

First off, looking at the feature list and what they decided to include with PostgreSQL, they seem to have decided that Heroku was a good template for what they should do.  So each instance includes PostGIS, hstore, PL/perl, PL/python, etc.  Also they automate point-in-time-recovery in some way, although I need to test that out.  I was particularly interested to see how they could support PL/python, which requires superuser access to create functions.

So I logged into pgExperts' AWS account, and spun up an instance of Postgres RDS.

This gave me warm fuzzies:



Latest version of PostgreSQL, and "auto-minor version upgrade", which is on by default.   Yay!  No nagging AWS users to apply the darned updates, already.  There's also a bunch of options for configuration, backups, maintenance windows, security, etc.  Altogether somewhat more "gear-head" than Heroku; you have to configure these things, AWS doesn't do it for you.  By default, for example, port 5432 is blocked from the internet until you create a new "security group" to unblock it.

Deployment of my new instance took longer than I'm used to with AWS (around 15min), but that's possibly because the announcement today has meant a rush of people trying the service.  Besides, AWS has me spoiled for deployment times.

Monday, November 11, 2013

SFPUG November: Gandi.net and Postgres (live video)

As usual, we plan to have live video of Wednesday's San Francisco PostgreSQL User Group meeting. This one will involve Gandi.net showing off their new PostgreSQL-As-A-Service offering, as well as talking about how Gandi.net uses PostgreSQL internally to support their services. Since PostgreSQL.org uses Gandi.net to host many of our domains, it's a wonderful friendly full-circle group hug. Tune in! We're liable to start around 7:15 PM PST.

Sunday, November 10, 2013

Moving tablespaces

PostgreSQL has a feature called "tablespaces" which allows you to put separate data directories on different mount points.  There's a variety of uses for this, including: storing data somewhere different from $PGDATA, putting your hot tables on SSD, or expanding the amount of storage space available to your database without taking it down.  In some cases, they can even be used to help parallelize IO access.

However, it's not infrequently the case that at some point you want to move an existing tablespace to a different mount point, because you're changing storage, re-arranging the filesystem, or making backups easier.  There are two ways to move tablespaces in current PostgreSQL: the slow online way and the fast downtime way.

The slow online way works like this:
  1. create a new tablespace in the desired new location
  2. go through all of the tables/indexes stored in the old tablespace, and do ALTER <object> TABLESPACE <new_tablespace> on each of them.
  3. drop the old tablespace
This approach, which is the only possible one for older versions of PostgreSQL, has some disadvantages.  One is that it's slow, since you need to move one object at a time unless you script some ad-hoc parallelism, and Postgres doesn't use any tricks like rsync to speed up moving files.   And while it lets you keep the database running, each table is locked for the duration of moving it.  However, if the tables you're moving aren't a critical part of your application (if they're the old partitions, for example), then it's probably the best way.
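As a sketch, with hypothetical object names, the online method looks like this:

```sql
-- hypothetical names throughout; run each ALTER during a quiet traffic period
CREATE TABLESPACE new_space LOCATION '/mnt/newstorage/pgdata';

ALTER TABLE old_partition_2012 SET TABLESPACE new_space;       -- locks the table while it copies
ALTER INDEX old_partition_2012_pkey SET TABLESPACE new_space;

-- ... repeat for every object, then, once the old tablespace is empty:
DROP TABLESPACE old_space;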

The second approach requires a database downtime and some understanding of how PostgreSQL stores tablespaces.  Note that this technique will not work on versions prior to 9.2, since those versions also store the tablespace path in a system table, which forces you to use the online method.

If you look in your $PGDATA directory, you will find a subdirectory called pg_tblspc.  This directory contains entries like this:

lrwxrwxrwx 1 josh josh 26535 -> /home/josh/tblspc

Each listing is a link to a directory location.  If you follow that link, you find something like this:

drwx------ 3 josh josh 4096 PG_9.3_201306121

... in the target directory, there is a subdirectory named after the exact version which created it, which actually contains your tablespace data.  At this point, a different way to move the tablespace should have occurred to you, and it does work.

  1. shut down the database system
  2. move or copy the PG_9* subdirectory from the old tablespace mount to the new one.
  3. change the link in pg_tblspc to point to the new mount point
  4. start the database system back up

The reason why this is a "faster" method is that step 2 allows you to use whatever filesystem copying tricks you wish to speed things up, such as a 2-stage rsync.  It also allows you to handle cases where both tablespaces won't be online at the same time.
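Put together, the downtime method is roughly this shell session (paths are hypothetical, and 26535 stands in for your own tablespace's OID from pg_tblspc):

```shell
# sketch only: stop, copy, re-point the symlink, restart
pg_ctl -D "$PGDATA" stop -m fast
rsync -a /home/josh/tblspc/ /mnt/newstorage/tblspc/   # copies the PG_9.* subdirectory
rm "$PGDATA/pg_tblspc/26535"
ln -s /mnt/newstorage/tblspc "$PGDATA/pg_tblspc/26535"
pg_ctl -D "$PGDATA" start
```

The rsync can be run once before the downtime and again during it, so the second pass only copies files which changed in between.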

Also note that I'm not sure this works the same way on Windows.

Hope that helps!

Friday, November 8, 2013

ALTER TABLE and downtime, part II

Now, you'll note in part I that I said that doing certain ALTER operations was "free", in quotes.  That's because these operations aren't actually worry-free; there's the major issue of locking.

Currently, any ALTER TABLE you run requires an ACCESS EXCLUSIVE lock on the altered table in order to commit.  This is a lock which blocks not only writes, but reads as well.  You don't see read locks much in PostgreSQL, so one tends to forget they exist until you do something like running an ALTER TABLE in a database which processes 15,000 queries per second.  Then, this sort of thing can happen:
  1. ALTER TABLE waits behind a long-running reporting or write query.
  2. SELECT queries start to pile up behind the ALTER TABLE statement
  3. server starts performing badly because of the pile-up, and is effectively locked up for a few minutes.
  4. the pile-up takes several more minutes to clear, even after the ALTER has gone through.
This gets even worse if the table being changed has foreign keys referencing it; then any write queries against the referencing tables are also blocked.  The result is something which looks very much like a downtime, even though we're using transactional DDL.

This means that, for deployments which involve ALTER TABLE, you'll need to plan on a brief downtime or at least watching the database closely if you run an extremely-high-transaction-rate database with many multi-statement transactions.  In many cases I get around this by polling for a manual lock on the table, and proceeding with the ALTER once I get one.   This doesn't work well with existing schema management tools, though.
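A minimal sketch of that lock-polling approach, assuming a hypothetical users table and 9.3's lock_timeout setting (on older versions, statement_timeout can serve the same purpose):

```sql
-- give up quickly instead of queueing behind long-running queries
SET lock_timeout = '2s';
BEGIN;
ALTER TABLE users ALTER COLUMN email SET STATISTICS 1000;
COMMIT;
-- on "canceling statement due to lock timeout", wait a bit and retry
```

The point is that a failed attempt releases immediately rather than blocking every reader behind it while it waits.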

I'm not just talking about ALTER TABLE ADD/DROP COLUMN here; this happens for any ALTER TABLE statement, including ALTER TABLE SET STATISTICS.  Unfortunately, while PostgreSQL DDL statements are transaction-safe, our system tables themselves are not currently MVCC, which means that the only way we can modify anything in the system tables is to take an exclusive lock on it.

Fortunately, thanks to Robert Haas, that just changed in 9.4 development.  In the future, we should be able to do the following operations without taking more than a share lock on the table:
  • SET STATISTICS
  • DROP TRIGGER/CONSTRAINT
  • DROP a foreign key from a referencing table
And these operations should require only a regular write lock on the table:
  • SET DEFAULT
  • SET NOT NULL
  • ADD COLUMN
... which will be really cool, and a boon to DBAs and DevOps everywhere.

There is one special case of ALTER TABLE you need to know about in addition to this, though.  That's ALTER TABLE ALTER COLUMN ... TYPE <new_data_type>.   More on this in a succeeding post.

DDL = Data Definition Language, i.e. CREATE/ALTER/DROP statements.

MVCC = Multi-Version Concurrency Control, the way we ensure that users can see different versions of the database depending on their current transaction context.

Tuesday, November 5, 2013

ALTER TABLE and downtime, part I

One of PostgreSQL's headline features -- at least, compared to other relational databases -- is the ability to do some ALTER TABLE ALTER COLUMN commands for "free".  It's part of how we try to make things easier for DBAs and Devops staff, by not requiring a downtime for operations which require one on other databases.  But there are some limitations and caveats.

First, let me give you the good news.  The following operations affect metadata only and require neither rewriting the table nor any per-row verification of values:
  • DROP COLUMN
  • ADD COLUMN with no DEFAULT
DROP COLUMN is the easy one; when you DROP a column in PostgreSQL, it simply marks the column deprecated in pg_attribute and moves on.  The column data won't actually be expunged from the table until the individual data pages get VACUUMed.  Of course, that's a bug if you actually thought you were going to reclaim some space, but most users regard it as a great anti-downtime feature.
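You can see the deprecated column lingering in the catalog yourself; a sketch, assuming a hypothetical users table with a dropped column:

```sql
-- dropped columns remain in pg_attribute, renamed and flagged
SELECT attname, attisdropped
FROM pg_attribute
WHERE attrelid = 'users'::regclass AND attnum > 0;
-- the dropped column shows up as "........pg.dropped.N........" with attisdropped = t
```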

ADD COLUMN can be free too, provided that the new column allows NULLs and doesn't have a DEFAULT value set.  You might think the following two transactions are almost identical:

ALTER TABLE users ADD COLUMN hat_size TEXT DEFAULT 'L';

BEGIN;
ALTER TABLE users ADD COLUMN hat_size TEXT;
ALTER TABLE users ALTER COLUMN hat_size SET DEFAULT 'L';
COMMIT;

... but the first one will rewrite the table entirely, whereas the second one will not.  If you try both of those, and look at the results, it becomes clear why:

Version 1:

 user_id | hat_size
---------+----------
    1021 | L
    1022 | L
    1023 | L
    1025 | L
    1026 | L


Version 2:

 user_id | hat_size
---------+----------
    1021 |
    1022 |
    1023 |
    1025 |
    1026 |


As you see, if you do the ADD and the DEFAULT in a single statement, Postgres helpfully pre-populates the column for you.  Whereas if you do them in two statements, it doesn't, and therefore doesn't rewrite the whole table.  Yes, I realize that this behavior is fairly arbitrary, but changing it at this point would break a lot of backwards compatibility.  And the "auto-populate the default" feature was added in the first place (in 7.2 IIRC) because people wanted it.
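If you do eventually want the default backfilled, one hedged approach is to populate it in batches after the two-statement version, so that no single transaction rewrites (or locks) the whole table at once:

```sql
-- hypothetical batch size; repeat until UPDATE reports zero rows
UPDATE users SET hat_size = 'L'
WHERE user_id IN (
    SELECT user_id FROM users
    WHERE hat_size IS NULL
    LIMIT 10000);
```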

Now, above I said that certain actions were "free".  The reason for the quotation marks is that these ALTERS do actually have costs.  More on that in my next post.

Thursday, October 24, 2013

SFPUG Wants Your Feedback

The San Francisco PostgreSQL User Group is looking for feedback and recommendations for what we should do in 2014.  As such, I've set up a survey here.  We are also looking for speakers, sponsors and venue hosts; contact me or fill out the survey.  Thanks!

Friday, October 18, 2013

Adjusting defaults for PostgreSQL.conf

For PostgreSQL 9.4, we've been discussing adjusting defaults for postgresql.conf to give users a better out-of-box experience.  Our existing defaults were set at a time when we used SysV memory, and when the average developer laptop or low-end server had 512MB of RAM.   So, a survey.

Here's some things contributors have already adjusted:
  • shared_buffers now goes up to 128MB automatically depending on available RAM reservation (we did this for 9.3)
  • effective_cache_size is automatically set to shared_buffers X 3 (might be X 4 by the time we release)
Now we are discussing increasing the defaults and/or autotuning for work_mem and maintenance_work_mem.  Suggested new defaults are 4X the current: 4MB and 64MB.  These should allow PostgreSQL to run on any machine of 1GB or greater memory a bit better.
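For a 1GB machine, the values under discussion would look something like this postgresql.conf fragment (a sketch of the proposals, not final defaults):

```
shared_buffers = 128MB            # autoset from available RAM as of 9.3
effective_cache_size = 384MB      # shared_buffers * 3
work_mem = 4MB                    # proposed: 4x the old 1MB default
maintenance_work_mem = 64MB       # proposed: 4x the old 16MB default
```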

However, one thing which came up during this discussion is that many people may be running PostgreSQL on virtual machines which have less than 1GB of RAM available to PostgreSQL.  I don't think they are, but we lack data.  So I put up a survey to see what amounts of RAM our users really have on their smallest machines/VMs.   Please fill out the survey so that we can have reasonable input for development, and if you have comments, please put them below.

Thursday, October 10, 2013

How urgent is that Update?

We've released a cumulative bugfix release today for all supported versions of PostgreSQL, in case you somehow missed it.  This update doesn't contain anything time-critical for most users.  You'll want to add it to your upgrade queue for your next scheduled downtime, but you don't need to schedule an update for this weekend.  Unless, of course, you are one of the two dozen or so users who reported the bugs fixed.

However, you don't really need me to tell you that, because it's in the language of the release.  More on this below.

A couple of notable fixes in this release:

Hstore-JSON extension fix: if you are using both Hstore and JSON in Postgres 9.3, you should run ALTER EXTENSION hstore UPDATE after updating your server.  Due to a mistake in extension versioning, users missed out on some JSON-hstore conversion functions in 9.3.

Guarantee transmission of all WAL files: users with replication setups and stringent data retention requirements will want to apply the update sooner rather than later because of this fix.  It fixes an issue where under some circumstances a promoted replica might ignore a few transactions it actually received from the master before failover, considering them lost.

Over the last dozen or so update releases, we've standardized the language we use to explain how critical the fixes in the release are for most users.

First, there's the question of what kind of update it is:

Bugfix Release or Update Release: this usually contains two to four months of accumulated bug fixes for various reported bugs.   Some of these bugs may be critical data loss bugs; see the release contents to check this.

Security Release: the update contains a patch for one or more security holes.  This makes the update at least somewhat urgent, because as soon as the security patch is posted online, the black hats start working on exploits (assuming they didn't before then).

Then we use some standard language about the general criticality of bugs fixed in the release:

All users of the affected versions are strongly urged to apply the update immediately: release contains a high-exposure security hole or critical data loss bug which affects all users.  Stop reading, declare a downtime, and apply the update now.  Fortunately, we've only had one of these in the last 6 years.

All users should apply this update as soon as possible: the release contains one or more security holes and/or major data loss bugs expected to affect most users.  Schedule a downtime tonight or this weekend and apply the update.

All users of _________ should apply this update immediately/as soon as possible: the same as above, but only for users of a particular feature or extension.

Users should apply this update at the next available opportunity: the release contains some low-risk security holes and/or serious data loss bugs which only occur under certain circumstances.  Updating to this release shouldn't be put off for more than a couple weeks, but you can wait for a good low-traffic window to take the downtime.

Users should update at the next regular maintenance window/scheduled downtime: this update contains nothing critical or even serious for most users; it's largely a cumulative collection of bugfixes.  The update should be added to the general queue of OS, library and application updates for the production servers.

Note, however, that I said most users. Every update we push out is urgent for at least a few users who are hit by the specific bugs involved.  For example, for the current release, users who depend heavily on the following features should consider updating sooner rather than later:

  • Range indexes
  • SP-GiST
  • JSON
  • regular expressions
  • prepared query plans
  • SSL connections
  • more than 24GB (!) of work memory

That's why we're reluctant to adopt any system which rates update releases on some absolute scale.  It doesn't matter how critical the update is in general; it matters how critical it is to you.

Monday, October 7, 2013

Fancy SQL Monday: Finding broken keys

One of the things which usually happens on the rare occasions when you have PostgreSQL database corruption is that indexes become corrupted so that index entries no longer match the underlying table.  For regular indexes, this can be fixed with a REINDEX or pg_repack.  But it becomes a real problem when the index was used for a primary key, unique index or foreign key, because then duplicate or broken data gets inserted into the tables, and attempts to rebuild the index will fail.

Database-wide, the easiest way to find these broken indexes is to do a pg_dump and pg_restore of your whole database; data which violates keys will show up as errors on the pg_restore.

The second thing we need to do is find the duplicate or missing data so that we can "clean" it and then rebuild the index.  First, we'll need to disable index access to make sure that we don't hit the index and get erroneous information back:

SET enable_indexscan = off;
SET enable_bitmapscan = off;
SET enable_indexonlyscan = off;  -- 9.2+ only!

So for our first SQL trick, we want to query a table with a simple SERIAL surrogate key and find duplicate IDs.  Then, for each of these IDs, we want to get the entire referenced row so we can look at it and decide how to fix the data.  One way to do that is using a WITH clause and an aggregate:

WITH dups AS (
    SELECT ticketid, count(*)
    FROM tickets
    GROUP BY ticketid
    HAVING count(*) > 1 )
SELECT *
FROM tickets
   JOIN dups USING ( ticketid )
ORDER BY ticketid;

And that gives us a list of duplicate ticketids.  From there, we can choose to delete rows, assign new IDs, or do something else, depending on the nature of the duplication.   Of course, if you never created any real keys (in addition to the surrogate key), you're going to have a devil of a time figuring things out.

However, the above query format isn't the only way we can search for duplicate IDs.  We can also use a windowing query, which is often easier to type if we have multicolumn unique constraints:

WITH dupcount AS (
   SELECT comments.*,
       count(*) OVER (
            PARTITION BY post, commentor, commented_at)
            AS entrycount 
   FROM comments )
SELECT * FROM dupcount
WHERE  entrycount > 1
ORDER BY post, commentor, commented_at;

Finding broken foreign keys is a bit more straightforward.  In this case, you just want to use an outer join to find the missing parent records:

SELECT shoppingcart_item.*
FROM shoppingcart_item
   LEFT OUTER JOIN shoppingcart
   ON shoppingcart_item.shoppingcart_id 
          = shoppingcart.shoppingcart_id
WHERE shoppingcart.shoppingcart_id IS NULL
ORDER BY shoppingcart_item.shoppingcart_id;

Once you've found those, though, you can't do much more than dump them to a file and delete the records, since there's no way to find out what the missing parent record data was.
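A sketch of that dump-and-delete step, reusing the example table names from above (adjust for your schema, and only after you've decided the rows really are unrecoverable):

```sql
-- archive the orphans first, so you have a record of what was removed
CREATE TABLE orphaned_cart_items AS
SELECT shoppingcart_item.*
FROM shoppingcart_item
   LEFT OUTER JOIN shoppingcart
   ON shoppingcart_item.shoppingcart_id = shoppingcart.shoppingcart_id
WHERE shoppingcart.shoppingcart_id IS NULL;

DELETE FROM shoppingcart_item
WHERE shoppingcart_id IN (
    SELECT shoppingcart_id FROM orphaned_cart_items);
```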

Happy de-corrupting!

Friday, October 4, 2013

De-corrupting TOAST Tables

In a database which has been mildly corrupted, the second issue you're liable to run into is corrupt TOAST tables.  You'll know you have one because you'll be running a SELECT against the table or a pg_dump, and you'll get an error message like this one:

ERROR:  unexpected chunk size 226 (expected 175) in final chunk 1 for toast value 8846932 in pg_toast_2877446 

TOAST tables are where PostgreSQL stores offline compressed data for large values, like big text fields, geometries, large arrays, and BLOBs.   What the server is telling you in that error message is that the size of the compressed block in the TOAST table doesn't match its metadata, and as a result Postgres can't decompress it.
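The error names only the TOAST table, so the first question is which user table it belongs to.  You can look that up in pg_class:

```sql
-- find which table owns the damaged TOAST table
SELECT relname
FROM pg_class
WHERE reltoastrelid = 'pg_toast.pg_toast_2877446'::regclass;

-- the number in the TOAST table name is normally the parent table's OID,
-- so this usually gives the same answer:
SELECT 2877446::regclass;
```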

Among other things, this is generally an indication that either you've found a PostgreSQL bug (which you should report, if you're running on the latest update version), or you've had a serious corruption event, if you didn't know already.  So you should definitely do a full corruption check on the database.

But, what to do about the bad TOAST record?

Obviously, if you have a backup from just before the corruption occurred the answer is to restore that backup.  But if you don't: this is fortunately hand-correctable.

The first thing to do is to find the actual row or rows where the corruption is.  I do that with a simple DO script like this:

DO $f$
declare
    curid INT := 0;
    vcontent TEXT;
    badid INT;
begin
-- walk every row, forcing detoast of the large "contents" column
FOR badid IN SELECT message_id FROM messages LOOP
    curid = curid + 1;
    if curid % 100000 = 0 then
        raise notice '% rows inspected', curid;
    end if;
    begin
        SELECT contents
        INTO vcontent
        FROM messages where message_id = badid;
        -- substr() forces decompression of the TOASTed value,
        -- which raises an error on a corrupt chunk
        vcontent := substr(vcontent,1000,2000);
    exception
        when others then
            raise notice 'data for message % is corrupt', badid;
            continue;
    end;
end loop;
end;
$f$;


The goal here is to reach into the TOASTed value for each row (the large-text contents field) and de-compress it, which should raise any errors you're going to get.  Of course, you need to tailor the above for your own tables.

Once you've identified one or more rows, there are 3 steps you need to take to clean up the corrupted TOAST record.  Note that this does not involve recovering the corrupted value: that's already lost, and you'll do yourself a favor by not trying to hang onto it.  Get that from your backups.

Once you've located the row or rows which have corrupt TOAST data, you need to do three things to make sure that the DB is clean:
  1. Re-index the TOAST table (in case it's the index which is corrupt)
  2. Break the link between the live row and the corrupt TOAST record.
  3. Rebuild the regular table in order to expunge the bad TOAST pointer.
All TOAST tables are located in the pg_toast schema.  And the error message already gave us the name of the bad TOAST table.  So we reindex by:

REINDEX TABLE pg_toast.pg_toast_2877446;

Breaking the link is the most obvious step:

UPDATE messages SET contents = '' WHERE message_id = 432212;

By setting the TOASTED column to a new value, we make sure that the "current" version of the row does not link to the bad TOAST entry.  I did say we'd be destroying data here, yes?

But that's not all we have to do.  The old row version still exists, and will continue to produce errors every time VACUUM visits it (preventing VACUUMs of that table from completing).  So we need to build a new version of the table which does not include that old row version.

Fortunately, there is a utility to do this: the super-awesome pg_repack, by Daniele Varrazzo, Josh Kupershmidt and Itagaki Takahiro. This utility will save your life someday, so buy Daniele, Josh and/or Itagaki a drink of their choice if you see them.

pg_repack -n -t messages messagedb

Now, while pg_repack does not need an exclusive lock for most of the rebuild, it does need three things, which you need to be prepared for:
  1. enough disk space to make a full copy of the table and its indexes.
  2. a short exclusive lock at the very end of processing, including on related tables by FK.
  3. to wait for all long-running transactions to complete
However, once you've done this, you should have a clean table free of TOAST corruption.

Wait, though, what's this?

ERROR:  could not create unique index "messages_pkey"
DETAIL:  Key (message_id)=(795437) is duplicated.


Oh-oh.  Looks like we have more work to do.  See you Monday!




Thursday, October 3, 2013

First thing you do is take a backup

Last week seems to have been a week for "pilot error": not just on Southwest, but among our clientele at PGX.  Three customers in two days managed to corrupt their databases through sysadmin mistakes, such as remounting the SAN with PostgreSQL running.  Which reminded me of an important tip for what you should do whenever anything goes really wrong with your database server:
  1. Shut down PostgreSQL if it's running.
  2. Check if you have current valid backups and/or replicas which are not already corrupt. If so, make sure these are protected.
  3. If there are no backups, make a full copy of the database files and all transaction logs.
You'll be tempted to do everything you can to get the main production server up and running as soon as possible, rather than waiting for backups, especially if your database is quite large.  Resist that temptation.

Many things you will try to rescue a corrupt database or a damaged server can cause further destruction of data.  Worse, certain kinds of corruption (caused by bad storage or bad RAM, for example) will spread to the replicas and/or your backups given time.  In a worst case, you can fiddle with your damaged DB server long enough that the regularly scheduled backup kicks in ... and overwrites a good backup with a corrupt one.

Of course, maintaining a few days or weeks of rotational backups is an even better idea, and protects you against many forms of data loss.  I recommend Barman or pgRollDump for this purpose.

Saturday, September 21, 2013

Free VMs for Reviewers

In case you somehow missed it, the second CommitFest for PostgreSQL 9.4 development is underway.  While we don't have quite as many patches in the queue as we had last time, we have quite a few, and more than we've had for any CF2 before.  This means we need as many reviewers as possible!

In order to make sure that nobody is prevented from reviewing due to not having a suitable development environment available, the PostgreSQL community is offering free virtual machines for reviewing and testing patches for this CommitFest.

If you want a VM for this purpose, please email me the following at any point during the CF:
  1. Your public SSH key.
  2. Which patch(es) you intend to review/test.
  3. Preference for CentOS or Ubuntu.
  4. When you expect to do the review.
Since these are EC2 virtual servers, they will not be suitable for performance testing.  However, if you want to review a replication patch, we can make multiple VMs available.  All VMs will be terminated at the end of the CF.

VMs will come equipped with all tools required to build PostgreSQL, plus a checkout of the PostgreSQL code from the beginning of the CF.  You may need to install additional requirements yourself if you are reviewing extensions or special features with dependencies.

Note that I may take a day to respond with your VM access.

Monday, September 9, 2013

More about my favorite 9.3 Features (video and more)

If you somehow missed it, PostgreSQL 9.3.0 is now available, just in time for your back-to-school shopping.  9.3 includes lots of great stuff, some of which I've already been using in development, and I'll tell you more about my favorites below.  There's also a survey and we'll have live video of SFPUG doing 9.3 on Thursday.

No More SHMMAX


We didn't emphasize this in the release announcement -- mainly because it's like removing a wart: you don't want to talk about it -- but this is the one 9.3 change liable to make life easier for more developers than any other.  We've stopped using SysV memory for anything other than the postmaster startup lock, which means that you can now adjust shared_buffers to your heart's content without needing to mess with sysctl.conf.  Let alone the satanic incantations you have to go through on the Mac.
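For anyone who never had the pleasure: this is the sort of pre-9.3 incantation (Linux flavor; values here are just illustrative) that the change makes unnecessary:

```shell
# grant enough SysV shared memory for a big shared_buffers -- no longer needed in 9.3
sysctl -w kernel.shmmax=8589934592   # max segment size, in bytes
sysctl -w kernel.shmall=2097152      # total shared memory, in pages
```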

This also clears one of the main barriers to writing simple autotuning scripts.  Which means I'm out of excuses for not having written one.

Custom Background Workers


Need a daemon to do background work alongside Postgres, such as scheduling, queueing, maintenance, or replication?  Maybe you want to intercept MongoDB-formatted queries and rewrite them for Postgres?  Custom background workers allow you to create your own "autovacuum daemon" which does whatever you want it to.

Michael Paquier will be presenting Background Workers for SFPUG on Thursday the 12th (7:30PM PDT).  Details on our Meetup Page, including a link to live video for those of you not in the Bay Area.

Streaming-Only Cascading


This has been my biggest desire since 9.2 came out; we were so close to not needing to worry about archiving, ever, for small databases.  And now we're there.  You can make chains of replicas, fail over to one of them, remaster, make a replica at a new data center the master, change the direction of replication, and lots more configurations without needing to worry about WAL archiving and all its overhead.

If you combine this with Heikki's work on pg_rewind, things get even more flexible since you don't have to resnapshot for failback anymore.
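As a sketch of what that looks like in practice (hostnames and paths here are made up): creating a cascaded replica off an existing replica in 9.3 is just a base backup plus a recovery.conf pointing at the first replica, with no archiving anywhere in the chain.

```
# Hypothetical setup: replica1 already streams from the master.
# Clone a second replica directly from replica1, not from the master:
pg_basebackup -h replica1 -D /var/lib/postgresql/9.3/main -X stream -R

# The -R flag (new in 9.3) writes a minimal recovery.conf like:
#   standby_mode = 'on'
#   primary_conninfo = 'host=replica1 ...'
# No archive_command or restore_command required anywhere.
```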

I'll be presenting a live demo of this feature at the SFPUG meeting, including connecting replicas in a ring (all replicas -- no master!). 

So, what's your favorite 9.3 feature?  Vote here!

Friday, August 23, 2013

PostgreSQL plus Vertica on Tuesday: SFPUG Live Video

This upcoming Tuesday, the 27th, SFPUG will have live streaming video of Chris Bohn from Etsy talking about how he uses PostgreSQL and Vertica together to analyze Etsy's hundreds of gigabytes of customer traffic -- barring technical difficulties with the video or the internet, of course.

The video will be on the usual SFPUG Video Channel.  It is likely to start around 7:15PM PDT.  Questions from the internet will be taken on the attached chat channel.

For those in San Francisco, this event will be held at Etsy's new downtown SF offices, and Etsy is sponsoring a Tacolicious taco bar.  Of course, the event is already full up, but you can always join the waiting list.

In other, related events, sfPython will be talking about PostgreSQL performance, and DjangoSF will be talking about multicolumn joins, both on Wednesday the 28th.  I'll be at DjangoSF, doing my "5 ways to Crash Postgres" lightning talk.

Monday, August 12, 2013

Fancy SQL Monday: ON vs. NATURAL JOIN vs. USING

First, there was the SQL89 JOIN, where there was no JOIN statement, and joins were performed by adding WHERE clauses.  To wit:

SELECT users.name, count(*)
FROM users, comments
WHERE users.user_id = comments.user_id
   AND users.status = 'active'
   AND comments.comment_date BETWEEN '2013-07-01' and '2013-07-31'
GROUP BY users.name;

There were two problems with this kind of JOIN: join conditions often got mixed up with other filter conditions, making it easy to accidentally produce Cartesian products, and there was no way to do directional (outer) joins.

Not that that keeps people from continuing to use it.
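For instance, using the same tables as above: forget the join condition -- easy to do once you have three or more tables in the FROM list -- and nothing complains; you just silently get every active user paired with every July comment.

```sql
-- Oops: no users/comments join condition, so this returns
-- (active users) x (July comments) rows instead of an error.
SELECT users.name, count(*)
FROM users, comments
WHERE users.status = 'active'
   AND comments.comment_date BETWEEN '2013-07-01' AND '2013-07-31'
GROUP BY users.name;
```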

So then we got the SQL92 JOIN:

SELECT users.name, count(*)
FROM users JOIN comments
   ON users.user_id = comments.user_id
WHERE users.status = 'active'
   AND comments.comment_date BETWEEN '2013-07-01' and '2013-07-31'
GROUP BY users.name;

But this wasn't the end of things.  Since SQL92 allowed users to define FOREIGN KEYS, why not allow them to just join along the keys "naturally"?

SELECT users.name, count(*)
FROM users NATURAL JOIN comments
WHERE users.status = 'active'
   AND comments.comment_date BETWEEN '2013-07-01' and '2013-07-31'
GROUP BY users.name;

Except that the SQL committee screwed up, and instead of using the declared foreign keys, NATURAL JOIN looks for columns with identical names.  And if there are several columns with identical names, there's no standard way to resolve them.  So, NATURAL JOIN, which ought to have been a real keyboard-saver, instead turned out to be uselessly unreliable.
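To illustrate the failure mode (assuming, for the sake of example, that comments also happens to have a column named "status"):

```sql
-- NATURAL JOIN matches on ALL identically-named columns.  This:
SELECT users.name, count(*)
FROM users NATURAL JOIN comments
GROUP BY users.name;
-- is silently equivalent to joining
--    ON users.user_id = comments.user_id
--   AND users.status  = comments.status   -- surprise!
-- which is almost certainly not what you meant.
```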

One of the annoying issues about joins between two tables on columns of the same name is that SQL wants you to constantly qualify which table you're asking for the column from, even though the contents are identical:

ERROR:  column reference "user_id" is ambiguous

This was annoying, so they created the USING clause instead:

SELECT user_id, users.name, count(*)
FROM users JOIN comments USING ( user_id )
WHERE users.status = 'active'
   AND comments.comment_date BETWEEN '2013-07-01' and '2013-07-31'
GROUP BY user_id, users.name;

In Postgres, USING (like NATURAL JOIN) conflates the matching columns into a single column -- which is why the query above can reference an unqualified user_id -- but not all DBMSes do this.

Of course, USING only works if your columns do have the same names.  All the more reason to adopt that as a naming convention!

Thursday, August 8, 2013

Another reason to set pg_hba.conf restrictively

pg_hba.conf can be a pain in the neck.  You're expecting some burst traffic, you deploy 8 new virtual appservers, fire them up, and ... they can't connect to Postgres.  Gah!

That kind of experience makes it tempting to set everything to:

   host  all  all   0.0.0.0/0   md5

But this month I encountered another reason why you shouldn't.  While everyone thinks of malefactors as the main reason to restrict which hosts can connect via pg_hba.conf, remember that accidents are far more common than malice.  As was the case this time.
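A more restrictive pattern scopes each rule to a specific database, role, and subnet (the names and addresses below are examples only, not your network):

```
# pg_hba.conf -- illustrative entries
# TYPE  DATABASE     USER      ADDRESS        METHOD
host    appdb        appuser   10.0.1.0/24    md5
host    replication  repl      10.0.2.5/32    md5
# ...and no rule matching everything from everywhere.
```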

I set up a benchmark test for a client which involved replaying 8 hours of activity logs against a test copy of their database.  What they didn't tell me (because, thanks to poor internal documentation, they didn't know) was that their database used dblink to write data to another database several times a minute.

Thank goodness they had exercised good pg_hba.conf practices!  All I got was a bunch of dblink connection errors, instead of scribbling all over production data.   Whew!