Wednesday, November 20, 2013

PostGIS Day Tommorrow

The PostGIS project is throwing a "PostGIS Day" online tommorrow.  As part of this, I will be on a panel with Paul Ramsey, Stephen Mather and Bborie Park at 10am PST tommorrow (Thursday) on Google Hangout.  Join us there!

Thursday, November 14, 2013

First Look at PostgreSQL RDS on Amazon

By now a lot of you have heard about the new Postgres RDS on Amazon Web Services.  This is something I've been anticipating for quite a while, so I decided to log in and try it out right away.  Summary is: the service is up, it's real, they did some nice things, the press release isn't completely accurate, and they did fork PostgreSQL.

First off, looking at the feature list and what they decided to include with PostgreSQL, they seem to have decided that Heroku was a good template for what they should do.  So each instance includes PostGIS, hstore, PL/perl, PL/python, etc.  Also they automate point-in-time-recovery in some way, although I need to test that out.  I was particularly interested to see how they could support PL/python, which requires superuser access to create functions.

So I logged into pgExperts' AWS account, and spun up an instance of Postgres RDS.

This gave me warm fuzzies:

Latest version of PostgreSQL, and "auto-minor version upgrade", which is on by default.   Yay!  No nagging AWS users to apply the darned updates, already.  There's also a bunch of options with configurations, backup configurations maintenance windows, security, etc.  Altogether somewhat more "gear-head" than Heroku; you have to configure these things, AWS doesn't do it for you.  By default, for example, port 5432 is blocked from the internet until you create a new "security group" to unblock it.

Deployment of my new instance took longer than I'm used to with AWS (around 15min), but that's possibly because the announcement today has meant a rush of people trying the service.  Besides, AWS has me spoiled for deployment times.

Monday, November 11, 2013

SFPUG November: and Postgres (live video)

As usual, we plan to have live video of Wednesday's San Francisco PostgreSQL User Group meeting. This one will involve showing off their new PostgreSQL-As-A-Service offering, as well as talking about how uses PostgreSQL internally to support their services. Since uses to host many of our domains, it's a wonderful friendly full-circle group hug. Tune in! We're liable to start around 7:15 PM PST.

Sunday, November 10, 2013

Moving tablespaces

PostgreSQL has a feature called "tablespaces" which allows you to put separate data directories on different mount points.  There's a variety of uses for this, including: storing data somewhere different from $PGDATA, putting your hot tables on SSD, or expanding the amount of storage space available to your database without taking it down.  In some cases, they can even be used to help parallelize IO access.

However, it's not infrequently the case that at some point you want to move an existing tablespace to a different mount point, because you're changing storage, re-arranging the filesystem, or making backup easier.  There's two different ways to move tablespaces in current PostgreSQL: the slow online way and the fast downtime way.

The slow online way works like this:
  1. create a new tablespace in the desired new location
  2. go through all of the tables/indexes stored in the old tablespace, and do ALTER <object> TABLESPACE <new_tablespace> on each of them.
  3. drop the old tablespace
This approach, which is the only possible one for older versions of PostgreSQL, has some disadvantages.  One is that it's slow, since you need to move one object at a time unless you script some ad-hoc parallelism, and Postgres doesn't use any tricks like rsync to speed up moving files.   And while it lets you keep the database running, each table is locked for the duration of moving it.  However, if the tables you're moving aren't a critical part of your application (if they're the old partitions, for example), then it's probably the best way.

The second approach requires a database downtime and some understanding of how PostgreSQL stores tablespaces.  Note that this technique will not work on versions prior to 9.2, since those versions also store the tablespace path in a system table, which forces you to use the online method.

If you look in your $PGDATA directory, you will find a subdirectory called pg_tblspc.  This directory contains entries like this:

lrwxrwxrwx 1 josh josh 26535 -> /home/josh/tblspc

Each listing is a link to a directory location.  If you follow that link, you find something like this:

rwx------ 3 josh josh 4096 PG_9.3_201306121

... in the target directory, there is a subdirectory named after the exact version which created it, which actually contains your tablespace data.  At this point, a different way to move the tablespace should have occurred to you, and it does work.

  1. shut down the database system
  2. move or copy the PG_9* subdirectory from the old tablespace mount to the new one.
  3. change the link in pg_tblspc to point to the new mount point
  4. start the database system back up

The reason why this is a "faster" method is that step 2 allows you to use whatever filesystem copying tricks you wish to speed things up, such as a 2-stage rsync.  It also allows you to handle cases where both tablespaces won't be online at the same time.

Also note that I'm not sure this works the same way on Windows.

Hope that helps!

Friday, November 8, 2013

ALTER TABLE and downtime, part II

Now, you'll note in part I that I said that doing certain ALTER operations was "free", in quotes.  That's because these operations aren't actually worry-free; there's the major issue of locking.

Currently, any ALTER TABLE you run requires an ACCESS EXCLUSIVE lock on the altered table in order to commit.  This is a lock which blocks not only writes, but reads as well.  You don't see read locks much in PostgreSQL, so one tends to forget they exist until you do something like running an ALTER TABLE in a database which processes 15,000 queries per second.  Then, this sort of thing can happen:
  1. ALTER TABLE waits behind a long-running reporting or write query.
  2. SELECT queries start to pile-up behind ALTER TABLE statement
  3. server starts performing badly because of the pile-up, and is effectively locked up for a few minutes.
  4. the pile-up takes several more minutes to clear, even after the ALTER has gone through.
This gets even worse if the table being changed has foreign keys referencing it; then any write queries against the referencing tables are also blocked.  The result is something which looks very much like a downtime, even though we're using transactional DDL.

This means that, for deployments which involve ALTER TABLE, you'll need to plan on a brief downtime or at least watching the database closely if you run an extremely-high-transaction-rate database with many multi-statement transactions.  In many cases I get around this by polling for a manual lock on the table, and proceeding with the ALTER once I get one.   This doesn't work well with existing schema management tools, though.

I'm not just talking about ALTER TABLE ADD/DROP COLUMN here; this happens for any ALTER TABLE statement, including ALTER TABLE SET STATISTICS.  Unfortunately, while PostgreSQL DDL statements are transaction-safe, our system tables themselves are not currently MVCC, which means that the only way we can modify anything in the system tables is to take an exclusive lock on it.

Fortunately, thanks to Robert Haas, that just changed in 9.4 development.  In the future, we should be able to do the following operations without taking more than a share lock on the table:
  • DROP a foreign key from a referencing table
And these operations should require only a regular write lock on the table:
... which will be really cool, and a boon to DBAs and DevOps everywhere.

There is one special case of ALTER TABLE you need to know about in addition to this, though.  That's ALTER TABLE ALTER COLUMN ... TYPE <new_data_type>.   More on this in a succeeding post.

DDL = Data Definition Language, i.e. CREATE/ALTER/DROP statements.

MVCC = Multi-Version Concurrency Control, the way we ensure that users can see different versions of the database depending on their current transaction context.

Tuesday, November 5, 2013

ALTER TABLE and downtime, part I

One of PostgreSQL's headline features -- at least, compared to other relational databases -- is the ability to do some ALTER TABLE ALTER COLUMN commands for "free".  It's part of how we try to make things easier for DBAs and Devops staff, by not requiring a downtime for operations which require one on other databases.  But there are some limitations and caveats.

First, let me give you the good news.  The following operations affect metadata only and require neither rewriting the table nor any per-row verification of values:
DROP COLUMN is the easy one; when you DROP a column in PostgreSQL, it simply marks the column deprecated in pg_attribute and moves on.  The column data won't actually be expunged from the table until the individual data pages get VACUUMed.  Of course, that's a bug if you actually thought you were going to reclaim some space, but most users regard it as a great anti-downtime feature.

ADD COLUMN can be free too, provided that the new column allows NULLs and doesn't have a DEFAULT value set.  You might think the following two transactions are almost identical:



... but the first one will rewrite the table entirely, whereas the second one will not.  If you try both of those, and look at the results, it becomes clear why:

Version 1:

 user_id | hat_size
    1021 | L
    1022 | L
    1023 | L
    1025 | L
    1026 | L

Version 2:

 user_id | hat_size
    1021 |
    1022 |
    1023 |
    1025 |
    1026 |

As you see, if you do the ADD and the DEFAULT in a single statement, Postgres helpfully pre-populates the column for you.  Whereas if you do them in two statements, it doesn't, and therefore doesn't rewrite the whole table.  Yes, I realize that this behavior is fairly arbitrary, but changing it at this point would break a lot of backwards compatibility.  And the "auto-populate the default" feature was added in the first place (in 7.2 IIRC) because people wanted it.

Now, above I said that certain actions were "free".  The reason for the quotation marks is that these ALTERS do actually have costs.  More on that in my next post.