Wednesday, May 18, 2016

Changing PostgreSQL Version Numbering

Per yesterday's developer meeting, the PostgreSQL Project is contemplating a change to how we do version numbers.  First, let me explain how we do version numbers now.  Our current version number composition is:

9 . 5 . 3 
Major1 . Major2 . Minor

That is, the second number is the "major version" number, reflecting our annual release.  The third number is the update release number, reflecting cumulative patch releases.  Therefore "9.5.3" is the third update to to version 9.5.

The problem is the first number, in that we have no clear criteria when to advance it.  Historically, we've advanced it because of major milestones in feature development: crash-proofing for 7.0, Windows port for 8.0, and in-core replication for 9.0.  However, as PostgreSQL's feature set matures, it has become less and less clear on what milestones would be considered "first digit" releases.  The result is arguments about version numbering on the mailing lists every year which waste time and irritate developers.

As a result, the PostgreSQL Project is proposing a version numbering change, to the following:

10 . 2
Major . Minor

Thus "10.2" would be the second update release for major version 10.   The version we release in 2017 would be "10" (instead of 10.0), and the version we release in 2018 will be "11".

The "sortable" version number available from the server, libpq, and elsewhere would remain the same six digits, zero-filled in the middle.  So 10.2 would be 100002.

The idea is that this will both put an end to the annual arguments, as well as ending the need to explain to users that 9.5 to 9.6 is really a major version upgrade requiring downtime.

Obviously, there is potential for breakage of a lot of tools, scripts, automation, packaging and more in this.  That's one reason we're discussing this now, almost a year before 10 beta is due to come out.

The reason for this blog post is that I'm looking for feedback on what this version number change will break for you.  Particularly, I want to hear from driver authors, automation engineers, cloud owners, application stack owners, and other folks who are "downstream" of PostgreSQL.  Please let us know what technical problems this will cause for you, and how difficult it will be to resolve them in the next nine months.

We are not, at this point, interested in comments on how you feel about the version change or alternate version naming schemes.  That discussion has already happened, at length.  You can read it here, here, and here, as well as at the developer meeting.

Places to provide feedback:

Thanks for any feedback you can provide.

Note that the next release of PostgreSQL, due later this year, will be "9.6" regardless.  We're deciding what we do after that.

Thursday, April 28, 2016

Don't delete pg_xlog

This StackOverflow question reminded me of this old blog post, which is still relevant today:

pg_log, pg_xlog and pg_clog

There are three directories in a default $PGDATA directory when you create it which are named "pg_*log".


$PGDATA/pg_log is the default location for the database activity logs, which include error messages, query logging, and startup/shutdown messages.  This is where you should first look for information when PostgreSQL won't start.  Many Linux distributions and other packaging systems relocate this log directory to somewhere like /var/log/postgresql.

You can freely delete, rename, compress, and move files in pg_log without penalty, as long as the postgres user still has rights to write to the directory. If pg_log becomes bloated with many large files, you probably need to decrease the number of things you're logging by changing the settings in postgresql.conf.

Do note that if you "delete" the current log file on a Linux or Unix system, it may remain open but not accessible, just sending any successive log messages to /dev/null until the file rotates.


$PGDATA/pg_xlog is the PostgreSQL transaction log.  This set of binary log files, with names like '00000001000000000000008E', contain images of the data from recent transactions.  These logs are also used for binary replication.

If replication, archiving, or PITR is failing, this directory can become bloated with gigabytes of logs the database server is saving for when archiving resumes. This can cause you to run out of disk space
Unlike pg_log, you may not freely delete, move, or compress files in this directory.  You may not even move the directory without symlinking it back to its original location.  Deleting pg_xlog files may result in unrecoverable database corruption.

If you find yourself in a situation where you've got 100GB of files in pg_xlog and the database won't start, and you've already disabled archiving/replication and tried clearing disk space every other way, then please take two steps:
  1. Move files from pg_xlog to a backup disk or shared network drive, don't delete them, and
  2. Move only a few of the oldest files, enough to allow PostgreSQL to start again.


$PGDATA/pg_clog contains a log of transaction metadata.   This log tells PostgreSQL which transactions completed and which did not.  The clog is small and never has any reason to become bloated, so you should never have any reason to touch it.

Should you ever delete files from pg_clog, you might as well delete the entire database directory. There is no recovery from a missing clog.

Note that this means, if you back up the files in a $PGDATA directory, you should make sure to include the pg_clog and pg_xlog as well, or you may find that your backup is not usable.

Tuesday, April 26, 2016

Join us for the 3rd pgCon User Unconference

This year, we're continuing to experiment with new formats for the pgCon unconference.  In 2013 and 2014 we had an Unconference on the Saturday of pgCon.  In 2015 we had a limited Developer Unconference on Wednesday.

This year, we will have a Developer Unconference on Wednesday, and a User Unconference on Saturday.  We're doing this because people were disappointed that we didn't do the User Unconference last year, and asked us to bring it back.  So, hopefully you planned to stay over Saturday!

The User Unconference has several purposes:

  • to give various teams and special interest groups an opportunity to schedule something
  • to let folks whose technology was released too late for the CfP another chance to present something
  • to continue discussions started around talks in the main program
So, please join us!  And if you have ideas for User Unconference sessions which you want to make sure get on the program, please list them on the wiki page using the template provided.  Note that final sessions will be chosen at 10am Saturday morning, though.

Thursday, March 31, 2016

9.5.2 update release and corrupt indexes

We've released an off-schedule update release today, because of a bug in one of 9.5's features which has forced us to partially disable the feature.  This is, obviously, not the sort of thing we do lightly.

One of the performance features in 9.5 was an optimization which speeded up sorts across the board for text and numeric values, contributed by Peter Geoghegan.  This was an awesome feature which speeded up sorts across the board by 50% to 2000%, and since databases do a lot of sorting, was an overall speed increase for PostgreSQL.  It was especially effective in speeding up index builds.

That feature depends on a built-in function in glibc, strxfrm(), which could be used to create a sortable hash of strings.  Now, the POSIX standard says that strxfrm() + strcmp() should produce sorting results identical to the strcoll() function.  And in our general tests, it did.

However, there are dozens of versions of glibc in the field, and hundreds of collations, and it's computationally unreasonable to test all combinations.  Which is how we missed the problem until a user reported it.  It turns out that for certain releases of glibc (particularly anything before 2.22 on Linux or BSD), with certain collations, strxfrm() and strcoll() return different results due to bugs.  Which can result in an index lookup failing to find rows which are actually there.  In the bug report, for example, an index on a German text column on RedHat Enterprise Linux 6.5 would fail to find many rows in a "between" search.

As a result, we've disabled the feature in 9.5.2 for all indexes which are on collations other than the simplified "C" collation.  This sucks.

Also, if you're on 9.5.0 or 9.5.1 and you have indexes on columns with real collations (i.e. not "C" collation), then you should REINDEX (or CREATE CONCURRENTLY + DROP CONCURRENTLY) each of those indexes.  Which really sucks.

Of course we're discussing ways to bring back the feature, but nobody has a solution yet. In the meantime, you can read more about the problem on the wiki page.

Wednesday, February 24, 2016

Stupid Docker Tricks: running Firefox in a container

Once you start messing around with desktop containers, you end up doing a lot of things "because you can" and not because they're a particularly good idea.  For example, I need to run multiple Firefox instances under different profiles in order to maintain various social media accounts, such as the @projectatomic twitter feed.  Now, I could do that by launching Firefox with various -P flags, but that would be no fun at all.  Instead, I'm going to launch Firefox in a container.

Mind you, if you're a webdev or a desktop hacker, this would be a good way to launch various hacked versions of Firefox without messing with the browser you need every day.  There's four basic steps here:

  1. build a Firefox image
  2. authorize X11 connections from containers
  3. enable Firefox connections in SELinux
  4. run the container
First, build the Firefox image.  This is fairly standard except that you need to tailor it to the UID and GID of your desktop user so that you don't have to jump through a lot of hoops to get SELinux to authorize connecting from the container to your desktop X server.  I use a Dockerfile like this one:

    FROM fedora
    # install firefox
    RUN dnf install -y firefox
    # install dependancies
    RUN dnf install -y libcanberra-gtk3 PackageKit-gtk3-module \
        dbus dbus-devel dbus-x11
    RUN dbus-uuidgen --ensure

    # make uid and gid match inside and outside the container
    # replace 1000 with your gid/uid, find them by running
    # the id command
    RUN export uid=1000 gid=1000 && \
        mkdir -p /home/firefox && \
        echo "firefox:x:${uid}:${gid}:Developer,,,:/home/firefox:/bin/bash" >> /etc/passwd  && \
        echo "firefox:x:${uid}:" >> /etc/group  && \
        echo "firefox ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers  && \
        chmod 0440 /etc/sudoers  && \
        chown ${uid}:${gid} -R /home/firefox

    #remove cache from the image to shrink it a bit
    RUN dnf clean all

    # set up and run firefox
    USER firefox
    ENV HOME /home/firefox
    CMD /usr/bin/firefox -no-remote

Then you can build your image by running:

    docker build -t username/firefox .

Next we need to make sure that the docker container is allowed to run X11 apps on your desktop machine, so that Firefox can run inside the container but be displayed on your desktop.  This is a simple command, allowing anyone on localhost to run X apps:

    xhost +

Thirdly we'll need to also make that work with SELinux.  The simplest way to do this is to try it, have SELinux block, and then enable it.  So try launching the Firefox container with the command in the step below.  It should fail with some kind of "could not connect to display" error.  Then run these commands, as root:

    grep firefox /var/log/audit/audit.log | audit2allow -M mypol
    semodule -i mypol.pp    

Finally, your Firefox container should be ready to go.  Except you need to add some flags, due to the need to share the X11 socket between the container and the desktop.  Here's what I use:

    docker run -it -e DISPLAY --net=host jberkus/firefox

This should bring up Firefox in a window on your desktop, under a profile and cache which exists only in the container.   If you want to always dispose of this Firefox without saving anything, add an --rm flag to the above.

If you don't want to paste all of the above from a blog (and really, who does?) I've put up some scripts on Github.

Now, if only I could figure out why fonts aren't rendering correctly in Firefox run this way.  Ideas?

Friday, February 19, 2016

JSONB, PostgreSQL and Go

Just ran across this excellent pair of blog posts on using JSONB with Go application design to simplify your data model.  Since that blog is not syndicated on Planet Postgres, reblogging it so that more people see it.

If you find other great PostgreSQL content on the web, make sure folks know about it.

Friday, January 8, 2016

Configuration changes in 9.5: transaction log size

If you downloaded 9.5 after yesterday's release, you might have noticed some changes to postgresql.conf, especially if you copied over you favorite 9.4 config and it refused to start up.  Particularly, the parameter checkpoint_segments is gone, and has been replaced by min_wal_size and max_wal_size.  What does this mean?

Well, checkpoint_segments was the old way we had for users to determine how "big" the transaction log should be.  Users with a large server or a lot of transactions per second would set it high, and users with small VMs and a slow database could set it low.  This had some problems, though:

  1. The default setting worked for pretty much nobody, so you always had to adjust it.
  2. The WAL*  always used the maximum space available, even if it wasn't needed.
  3. Figuring out the largest size your WAL could be required some math and a knowledge of version-specific PostgreSQL internals.
The last was the most confusing part for users; the calculation for maximum WAL size was:

   ( ( checkpoint_segments * 2 ) + 1 )  * 16MB ) +  ( wal_keep_segments * 16MB )

... which meant that people generally sized it by picking an arbitrary number and then adjusting up or down based on feedback.

The new parameters are way simpler:
  • min_wal_size: the minimum size the transaction log will be;
  • max_wal_size: the maximum size the transaction log will be (but see below)
This means that your transaction log on disk shouldn't ever be larger than ( max_wal_size + wal_keep_segments ).  It is a "soft" limit though; if PostgreSQL gets really behind, or if archiving is failing, it will get higher than max_wal_size.

However, that isn't the really cool part.  Heikki did an amazing thing, in that the WAL is sized dynamically based on how much was used during the previous cycles.  So you can set max_wal_size to some high value (default is 1GB),  and not worry about PostgreSQL using a bunch of extra disk space if it's not needed.  This means that we can set a default which will be "good enough" for 80% of our users, and we have.  This makes me do a little happy dance.

The other 20% may want to tune, still though, so here's some guidelines:
  • if you know your database write traffic is "bursty", with long periods of inactivity followed by furious writes, increase min_wal_size;
  • if you do bulk loads larger than 1GB, increase max_wal_size to the size of a bulk load;
  • if you write more than 1GB of data every 10 minutes, increase max_wal_size;
  • if you get "checkpoints occurring to frequently" error messages, try increasing both parameters incrementally.
Most users won't need to touch those parameters at all, though.  Which is as it should be.

* WAL == Write Ahead Log == Transaction Log == XLog