Sunday, November 10, 2013

Moving tablespaces

PostgreSQL has a feature called "tablespaces" which allows you to put separate data directories on different mount points.  There's a variety of uses for this, including: storing data somewhere different from $PGDATA, putting your hot tables on SSD, or expanding the amount of storage space available to your database without taking it down.  In some cases, they can even be used to help parallelize IO access.

However, it's not infrequently the case that at some point you want to move an existing tablespace to a different mount point, because you're changing storage, re-arranging the filesystem, or making backup easier.  There's two different ways to move tablespaces in current PostgreSQL: the slow online way and the fast downtime way.

The slow online way works like this:
  1. create a new tablespace in the desired new location
  2. go through all of the tables/indexes stored in the old tablespace, and do ALTER <object> TABLESPACE <new_tablespace> on each of them.
  3. drop the old tablespace
This approach, which is the only possible one for older versions of PostgreSQL, has some disadvantages.  One is that it's slow, since you need to move one object at a time unless you script some ad-hoc parallelism, and Postgres doesn't use any tricks like rsync to speed up moving files.   And while it lets you keep the database running, each table is locked for the duration of moving it.  However, if the tables you're moving aren't a critical part of your application (if they're the old partitions, for example), then it's probably the best way.

The second approach requires a database downtime and some understanding of how PostgreSQL stores tablespaces.  Note that this technique will not work on versions prior to 9.2, since those versions also store the tablespace path in a system table, which forces you to use the online method.

If you look in your $PGDATA directory, you will find a subdirectory called pg_tblspc.  This directory contains entries like this:

lrwxrwxrwx 1 josh josh 26535 -> /home/josh/tblspc

Each listing is a link to a directory location.  If you follow that link, you find something like this:

rwx------ 3 josh josh 4096 PG_9.3_201306121

... in the target directory, there is a subdirectory named after the exact version which created it, which actually contains your tablespace data.  At this point, a different way to move the tablespace should have occurred to you, and it does work.

  1. shut down the database system
  2. move or copy the PG_9* subdirectory from the old tablespace mount to the new one.
  3. change the link in pg_tblspc to point to the new mount point
  4. start the database system back up

The reason why this is a "faster" method is that step 2 allows you to use whatever filesystem copying tricks you wish to speed things up, such as a 2-stage rsync.  It also allows you to handle cases where both tablespaces won't be online at the same time.

Also note that I'm not sure this works the same way on Windows.

Hope that helps!

7 comments:

  1. Second Way: Need we update any system tables ?

    ReplyDelete
    Replies
    1. This comment has been removed by the author.

      Delete
  2. Hi Josh,

    in the old systems (e.g. 9.0) you could:
    - shut down the DB
    - move tablespace
    - fix the sym link in pg_tblspc

    once you start the DB, one additional update in pg_tablespace catalog is needed to fix the path (pg_tablespace.spclocation).

    The OS is Linux: didn't try this in Windows.

    Best,
    Milos

    ReplyDelete
    Replies
    1. This is correct, and should work equally well at Windows. In fact, you don't even "have to" update the spclocation field - PostgreSQL never actually *uses* that. It might be in use by third party tools though, and I think it's in use by pgAdmin, so it's definitely recommended.

      But the fact that it was never actually used is what made it very easy for me to get rid of it completely, replacing it with the function that inspects the symlink.

      Delete
    2. Ive followed these tips and I got my tablespaces moved to another location but pgadmin still shows the old location. How can I change this information in 9.2.6?
      Thx in adv
      Adolfho

      Delete
  3. I can confirm that WIndows uses the same symlink method as Unix (called junction points).

    ReplyDelete
  4. I can confirm this method is working on Windows Server 2k8 with postgres 9.2. And pg admin is showing the good location.

    ReplyDelete