Monday, January 15, 2018

Last Week in Kubernetes: Week ending January 14th

With several hundred active contributors, it's pretty hard to keep track of Kubernetes development.   It's hard for me, and I'm paid to keep track; I can't imagine that anyone else can do it, even if they contribute to Kubernetes.

What follows is an experimental publication.  I'm thinking of doing a development summary, every week or so, of what's happened in new features, deprecations, the community meeting, and more.  Tell me if this is useful to you.  If it is, I'll look at finding an official place to publish it where maybe other community members can contribute.

Last Week in Kubernetes: Week ending January 14, 2017

Community Meeting Summary

The community meeting was dominated by a discussion around whether all repos in the kubernetes namespace should be a part of the same automation, particularly merge automation.  Aaron Crickenberger (spiffxp) has been offering this to other repos in the Kubernetes namespace, but some teams, particularly Helm, are concerned about unexpected changes this might cause.  One goal of getting all repos on the same automation is to retire mungegithub.

Jacob Pavlik demonstrated the KQueen cluster manager.  Jaice DuMars went over release 1.10, which is in week 2 of 12 and will go into Feature Freeze on January 22nd, so get your features in!  And SIG Azure and SIG Node made reports.

Feature Work

Configurable Pod Process Namespace Sharing prepared for inclusion in 1.10 this week with the addition of a feature flag for PID namespace sharing. The --docker-disable-shared-pid was also removed from kubelet.

Kubelets can now be run in containers, allowing for a completely containerized Kubernetes install. Such installs are now passing e2e tests.

Support for raw block devices as persistent volumes moved ahead with the merge of iSCSI support for block volumes.


Docker 1.10 is no longer supported. The minimum docker version is now 1.11. While docker 1.10 was officially deprecated in release 1.9, the compatibility code has now actually been completely removed.

sig-cluster-lifecycle is gradually deprecating the /cluster directory in favor of having these cluster setup tools maintained outside of kubernetes/kubernetes. In 1.10, that will include removing the windows/, photon-controller/, libvirt-coreos/, and gce/container-linux/ subdirectories, with more to be removed in future releases.

Version Updates


Other Merges

Wednesday, January 11, 2017

Retiring from the Core Team

Those of you in the PostgreSQL community will have noticed that I haven't been very active for the past year.  My new work on Linux containers and Kubernetes has been even more absorbing than I anticipated, and I just haven't had a lot of time for PostgreSQL work.

For that reason, as of today, I am stepping down from the PostgreSQL Core Team.

I joined the PostgreSQL Core Team in 2003.  I decided to take on project advocacy, with the goal of making PostgreSQL one of the top three databases in the world.  Thanks to the many contributions by both advocacy volunteers and developers -- as well as the efforts by companies like EnterpriseDB and Heroku -- we've achieved that goal.  Along the way, we proved that community ownership of an OSS project can compete with, and ultimately outlast, venture-funded startups.

Now we need new leadership who can take PostgreSQL to the next phase of world domination.  So I am joining Vadim, Jan, Thomas, and Marc in clearing the way for others.

I'll still be around and still contributing to PostgreSQL in various ways, mostly around running the database in container clouds.  It'll take a while for me to hand off all of my PR responsibilities for the project (assuming that I ever hand all of them off).

It's been a long, fun ride, and I'm proud of the PostgreSQL we have today: both the database, and the community.  Thank you for sharing it with me.

Wednesday, May 18, 2016

Changing PostgreSQL Version Numbering

Per yesterday's developer meeting, the PostgreSQL Project is contemplating a change to how we do version numbers.  First, let me explain how we do version numbers now.  Our current version number composition is:

9 . 5 . 3 
Major1 . Major2 . Minor

That is, the second number is the "major version" number, reflecting our annual release.  The third number is the update release number, reflecting cumulative patch releases.  Therefore "9.5.3" is the third update to to version 9.5.

The problem is the first number, in that we have no clear criteria when to advance it.  Historically, we've advanced it because of major milestones in feature development: crash-proofing for 7.0, Windows port for 8.0, and in-core replication for 9.0.  However, as PostgreSQL's feature set matures, it has become less and less clear on what milestones would be considered "first digit" releases.  The result is arguments about version numbering on the mailing lists every year which waste time and irritate developers.

As a result, the PostgreSQL Project is proposing a version numbering change, to the following:

10 . 2
Major . Minor

Thus "10.2" would be the second update release for major version 10.   The version we release in 2017 would be "10" (instead of 10.0), and the version we release in 2018 will be "11".

The "sortable" version number available from the server, libpq, and elsewhere would remain the same six digits, zero-filled in the middle.  So 10.2 would be 100002.

The idea is that this will both put an end to the annual arguments, as well as ending the need to explain to users that 9.5 to 9.6 is really a major version upgrade requiring downtime.

Obviously, there is potential for breakage of a lot of tools, scripts, automation, packaging and more in this.  That's one reason we're discussing this now, almost a year before 10 beta is due to come out.

The reason for this blog post is that I'm looking for feedback on what this version number change will break for you.  Particularly, I want to hear from driver authors, automation engineers, cloud owners, application stack owners, and other folks who are "downstream" of PostgreSQL.  Please let us know what technical problems this will cause for you, and how difficult it will be to resolve them in the next nine months.

We are not, at this point, interested in comments on how you feel about the version change or alternate version naming schemes.  That discussion has already happened, at length.  You can read it here, here, and here, as well as at the developer meeting.

Places to provide feedback:

Thanks for any feedback you can provide.

Note that the next release of PostgreSQL, due later this year, will be "9.6" regardless.  We're deciding what we do after that.

Thursday, April 28, 2016

Don't delete pg_xlog

This StackOverflow question reminded me of this old blog post, which is still relevant today:

pg_log, pg_xlog and pg_clog

There are three directories in a default $PGDATA directory when you create it which are named "pg_*log".


$PGDATA/pg_log is the default location for the database activity logs, which include error messages, query logging, and startup/shutdown messages.  This is where you should first look for information when PostgreSQL won't start.  Many Linux distributions and other packaging systems relocate this log directory to somewhere like /var/log/postgresql.

You can freely delete, rename, compress, and move files in pg_log without penalty, as long as the postgres user still has rights to write to the directory. If pg_log becomes bloated with many large files, you probably need to decrease the number of things you're logging by changing the settings in postgresql.conf.

Do note that if you "delete" the current log file on a Linux or Unix system, it may remain open but not accessible, just sending any successive log messages to /dev/null until the file rotates.


$PGDATA/pg_xlog is the PostgreSQL transaction log.  This set of binary log files, with names like '00000001000000000000008E', contain images of the data from recent transactions.  These logs are also used for binary replication.

If replication, archiving, or PITR is failing, this directory can become bloated with gigabytes of logs the database server is saving for when archiving resumes. This can cause you to run out of disk space
Unlike pg_log, you may not freely delete, move, or compress files in this directory.  You may not even move the directory without symlinking it back to its original location.  Deleting pg_xlog files may result in unrecoverable database corruption.

If you find yourself in a situation where you've got 100GB of files in pg_xlog and the database won't start, and you've already disabled archiving/replication and tried clearing disk space every other way, then please take two steps:
  1. Move files from pg_xlog to a backup disk or shared network drive, don't delete them, and
  2. Move only a few of the oldest files, enough to allow PostgreSQL to start again.


$PGDATA/pg_clog contains a log of transaction metadata.   This log tells PostgreSQL which transactions completed and which did not.  The clog is small and never has any reason to become bloated, so you should never have any reason to touch it.

Should you ever delete files from pg_clog, you might as well delete the entire database directory. There is no recovery from a missing clog.

Note that this means, if you back up the files in a $PGDATA directory, you should make sure to include the pg_clog and pg_xlog as well, or you may find that your backup is not usable.

Tuesday, April 26, 2016

Join us for the 3rd pgCon User Unconference

This year, we're continuing to experiment with new formats for the pgCon unconference.  In 2013 and 2014 we had an Unconference on the Saturday of pgCon.  In 2015 we had a limited Developer Unconference on Wednesday.

This year, we will have a Developer Unconference on Wednesday, and a User Unconference on Saturday.  We're doing this because people were disappointed that we didn't do the User Unconference last year, and asked us to bring it back.  So, hopefully you planned to stay over Saturday!

The User Unconference has several purposes:

  • to give various teams and special interest groups an opportunity to schedule something
  • to let folks whose technology was released too late for the CfP another chance to present something
  • to continue discussions started around talks in the main program
So, please join us!  And if you have ideas for User Unconference sessions which you want to make sure get on the program, please list them on the wiki page using the template provided.  Note that final sessions will be chosen at 10am Saturday morning, though.

Thursday, March 31, 2016

9.5.2 update release and corrupt indexes

We've released an off-schedule update release today, because of a bug in one of 9.5's features which has forced us to partially disable the feature.  This is, obviously, not the sort of thing we do lightly.

One of the performance features in 9.5 was an optimization which speeded up sorts across the board for text and numeric values, contributed by Peter Geoghegan.  This was an awesome feature which speeded up sorts across the board by 50% to 2000%, and since databases do a lot of sorting, was an overall speed increase for PostgreSQL.  It was especially effective in speeding up index builds.

That feature depends on a built-in function in glibc, strxfrm(), which could be used to create a sortable hash of strings.  Now, the POSIX standard says that strxfrm() + strcmp() should produce sorting results identical to the strcoll() function.  And in our general tests, it did.

However, there are dozens of versions of glibc in the field, and hundreds of collations, and it's computationally unreasonable to test all combinations.  Which is how we missed the problem until a user reported it.  It turns out that for certain releases of glibc (particularly anything before 2.22 on Linux or BSD), with certain collations, strxfrm() and strcoll() return different results due to bugs.  Which can result in an index lookup failing to find rows which are actually there.  In the bug report, for example, an index on a German text column on RedHat Enterprise Linux 6.5 would fail to find many rows in a "between" search.

As a result, we've disabled the feature in 9.5.2 for all indexes which are on collations other than the simplified "C" collation.  This sucks.

Also, if you're on 9.5.0 or 9.5.1 and you have indexes on columns with real collations (i.e. not "C" collation), then you should REINDEX (or CREATE CONCURRENTLY + DROP CONCURRENTLY) each of those indexes.  Which really sucks.

Of course we're discussing ways to bring back the feature, but nobody has a solution yet. In the meantime, you can read more about the problem on the wiki page.

Wednesday, February 24, 2016

Stupid Docker Tricks: running Firefox in a container

Once you start messing around with desktop containers, you end up doing a lot of things "because you can" and not because they're a particularly good idea.  For example, I need to run multiple Firefox instances under different profiles in order to maintain various social media accounts, such as the @projectatomic twitter feed.  Now, I could do that by launching Firefox with various -P flags, but that would be no fun at all.  Instead, I'm going to launch Firefox in a container.

Mind you, if you're a webdev or a desktop hacker, this would be a good way to launch various hacked versions of Firefox without messing with the browser you need every day.  There's four basic steps here:

  1. build a Firefox image
  2. authorize X11 connections from containers
  3. enable Firefox connections in SELinux
  4. run the container
First, build the Firefox image.  This is fairly standard except that you need to tailor it to the UID and GID of your desktop user so that you don't have to jump through a lot of hoops to get SELinux to authorize connecting from the container to your desktop X server.  I use a Dockerfile like this one:

    FROM fedora
    # install firefox
    RUN dnf install -y firefox
    # install dependancies
    RUN dnf install -y libcanberra-gtk3 PackageKit-gtk3-module \
        dbus dbus-devel dbus-x11
    RUN dbus-uuidgen --ensure

    # make uid and gid match inside and outside the container
    # replace 1000 with your gid/uid, find them by running
    # the id command
    RUN export uid=1000 gid=1000 && \
        mkdir -p /home/firefox && \
        echo "firefox:x:${uid}:${gid}:Developer,,,:/home/firefox:/bin/bash" >> /etc/passwd  && \
        echo "firefox:x:${uid}:" >> /etc/group  && \
        echo "firefox ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers  && \
        chmod 0440 /etc/sudoers  && \
        chown ${uid}:${gid} -R /home/firefox

    #remove cache from the image to shrink it a bit
    RUN dnf clean all

    # set up and run firefox
    USER firefox
    ENV HOME /home/firefox
    CMD /usr/bin/firefox -no-remote

Then you can build your image by running:

    docker build -t username/firefox .

Next we need to make sure that the docker container is allowed to run X11 apps on your desktop machine, so that Firefox can run inside the container but be displayed on your desktop.  This is a simple command, allowing anyone on localhost to run X apps:

    xhost +

Thirdly we'll need to also make that work with SELinux.  The simplest way to do this is to try it, have SELinux block, and then enable it.  So try launching the Firefox container with the command in the step below.  It should fail with some kind of "could not connect to display" error.  Then run these commands, as root:

    grep firefox /var/log/audit/audit.log | audit2allow -M mypol
    semodule -i mypol.pp    

Finally, your Firefox container should be ready to go.  Except you need to add some flags, due to the need to share the X11 socket between the container and the desktop.  Here's what I use:

    docker run -it -e DISPLAY --net=host jberkus/firefox

This should bring up Firefox in a window on your desktop, under a profile and cache which exists only in the container.   If you want to always dispose of this Firefox without saving anything, add an --rm flag to the above.

If you don't want to paste all of the above from a blog (and really, who does?) I've put up some scripts on Github.

Now, if only I could figure out why fonts aren't rendering correctly in Firefox run this way.  Ideas?