Friday, January 8, 2016

Configuration changes in 9.5: transaction log size

If you downloaded 9.5 after yesterday's release, you might have noticed some changes to postgresql.conf, especially if you copied over you favorite 9.4 config and it refused to start up.  Particularly, the parameter checkpoint_segments is gone, and has been replaced by min_wal_size and max_wal_size.  What does this mean?

Well, checkpoint_segments was the old way we had for users to determine how "big" the transaction log should be.  Users with a large server or a lot of transactions per second would set it high, and users with small VMs and a slow database could set it low.  This had some problems, though:

  1. The default setting worked for pretty much nobody, so you always had to adjust it.
  2. The WAL*  always used the maximum space available, even if it wasn't needed.
  3. Figuring out the largest size your WAL could be required some math and a knowledge of version-specific PostgreSQL internals.
The last was the most confusing part for users; the calculation for maximum WAL size was:

   ( ( checkpoint_segments * 2 ) + 1 )  * 16MB ) +  ( wal_keep_segments * 16MB )

... which meant that people generally sized it by picking an arbitrary number and then adjusting up or down based on feedback.

The new parameters are way simpler:
  • min_wal_size: the minimum size the transaction log will be;
  • max_wal_size: the maximum size the transaction log will be (but see below)
This means that your transaction log on disk shouldn't ever be larger than ( max_wal_size + wal_keep_segments ).  It is a "soft" limit though; if PostgreSQL gets really behind, or if archiving is failing, it will get higher than max_wal_size.

However, that isn't the really cool part.  Heikki did an amazing thing, in that the WAL is sized dynamically based on how much was used during the previous cycles.  So you can set max_wal_size to some high value (default is 1GB),  and not worry about PostgreSQL using a bunch of extra disk space if it's not needed.  This means that we can set a default which will be "good enough" for 80% of our users, and we have.  This makes me do a little happy dance.

The other 20% may want to tune, still though, so here's some guidelines:
  • if you know your database write traffic is "bursty", with long periods of inactivity followed by furious writes, increase min_wal_size;
  • if you do bulk loads larger than 1GB, increase max_wal_size to the size of a bulk load;
  • if you write more than 1GB of data every 10 minutes, increase max_wal_size;
  • if you get "checkpoints occurring to frequently" error messages, try increasing both parameters incrementally.
Most users won't need to touch those parameters at all, though.  Which is as it should be.

* WAL == Write Ahead Log == Transaction Log == XLog

Tuesday, January 5, 2016

New year, new job

If you've been following my writing online for the last year, you'll know that I've been pretty excited about Docker and other emerging Linux container technologies.  That's why I'm excited to announce my new position as Community Lead for Project Atomic at Red Hat. I'm really enthused about this, and I plan to make you as excited about Project Atomic and container tech as I am.

Does this mean I'm not working on PostgreSQL anymore?  Of course not. In fact, I'm already working on a PostgreSQL HA solution which relies on Atomic; you'll find out more about it at SCALE14 if you attend. 

It does mean I've stepped down as CEO of PostgreSQL Experts, which position is now occupied by my extremely capable former coworker and Django contributor, Christophe Pettus.  Since Christophe and Stacey Haysler have been effectively (and profitably!) running PGX for the last year, most of our customers won't even notice the change.

It also means that there will be lots more stuff about Linux containers on this blog, most of which will not be tagged with the PostgreSQL tag and thus not appear on Planet Postgres.  So if you're interesed in that, you'll want to follow my blog directly.

I'm also in the middle of moving to Portland, Oregon, so expect me to be rather unresponsive for the new two weeks.  After that ... on to new adventures!

Thursday, December 31, 2015

Speaker Practice Survey

VM Brasseur and I are doing a speaker training tutorial at SCALE14, entitled "10 Step Program for Great Tech Talks." As experienced speakers who go to a lot of conferences, we have way too much material for a three hour tutorial, and need to know what to cut.  So she had the idea of doing a survey of conference attendees, which we conducted over a couple weeks.  The survey was meant to find out what frequent conference attendees liked about good talks, and didn't like about not-so-good talks.

You can check out the original survey here, although we're no longer paying attention to responses.


First let's find out who our surveyees are. 108 people responded. Given that VM and I broadcasted the survey over Twitter, as did Josh Simmons and Gareth Greenaway, I expect that our sample will be fairly skewed; let's see.  First we asked people whether they just attend talks, or whether they both give and attend them.

As you can see, our sample skews heavily towards presenters, with a minority of pure audience members.  A few wiseacres said they only give talks, and don't attend them, which wasn't supported by their other answers. We also asked how many conferences and talks folks went to in the last year:

So, most of our respondees are frequent conference and/or talk attendees.  This colors the rest of the survey; what we're looking at is what a group of experienced people who go to a lot of talks, and present more than a few, think of other speakers' performance.  I suspect that if we did a survey of folks at their very first tech conference, we would see somewhat different data.

The Good

We asked "Thinking of the presentations you attended with topics and/or speakers you've most enjoyed, what speaker behaviors below do you feel substantially added to the experience?".  The idea of this question was to find out what skills or actions by the speaker were the most important for making a talk valuable.  Each item was rated on a 1-5 scale.

The above graph requires some explanation.  The length of the bars is the average score.  The bars are ranked from the highest to lowest rated practice.  The color of the bars indicates the median score, for cases where the average is deceptively skewed: Red 5, Blue 4, Green 3, and Black 2.

We can see a few things from this.  First, no practice was considered completely unimportant; nothing averaged below 2.1.  Good narrative structure was clearly the best liked practice, with deep subject knowledge and being an energetic speaker as our second and third ranked items.

If you look at the overall rankings, you can see that "content" items are the best loved, whereas talk "extras" and technical improvements are the least important.  Slide visual design, which many speakers spend a lot of time sweating over, just averages 3.0, and the three items involving technical interaction with the audience (demos, participation, and exercises), come in last.  I do wonder whether those three are last because they are so seldom done well, even by experienced speakers, or because they really are unimportant.  Let me know what you think in the comments.

The Bad

To balance this out, we asked surveyees, "Thinking of the presentations you attended with topics and/or speakers you liked, what speaker behaviors below do you feel substantially detracted from, or even ruined, an otherwise good presentation?" The idea was not to find out why sucky presentations sucked, but what things prevented an acceptable talk from being good.  Here's the ratings, which are graphed the same way as the "good" ratings:

The top two speaking problems were not being able to hear or understand the speaker, and the presenter being bored and/or distracted.  The first shows how important diction, projection, and AV systems are to having a good talk -- especially if you're speaking in a language which isn't your primary one.  The second is consistent with the good speaker ratings: people love energetic speakers, and if speakers are listless, so is the audience.  So make sure to shoot that triple espresso before you go on stage.

There were a few surprises in here for me.  Unreadable code on slides, one of my presonal pet peeves, was a middle-ranker.  Running over or under time, which I expected to be a major talk-ruiner, was not considered to be.  Offensive content, on the other hand, is a much more widespread problem than I would have assumed.

And "ums" and body language problems, however, ranked at the bottom.  From my own observation, I know that these are common issues, but apparently they don't bother people that much.  Or do they?

The Ugly

Finally, we asked, "If you could magically eliminate just one bad speaker behavior from all presenters everywhere, which one would it be?"  The idea was to find out what one thing was really driving audiences bananas, it was so bad.  Here's the responses:

So, first, there are a few things which are inconsistent with the ratings.  Unpracticed presentations were the #1 issue, and some other items which looked unimportant by rating, like saying "um" a lot, show up as a top pet peeve.  So apparently people hate "um" and interruptions a lot, but they don't see them ruining that many otherwise good talks.  The other top issues match the ratings we already saw.

About a seventh of the responses were ones which received only one or two votes, including an assortment of write-in responses.  Here's a few of the more interesting ones:
  • "Reading slide content to audience" (two votes, including a page-long tirade)
  • "Giving the talk using slides that are really optimized to be downloaded by someone who isn't able to attend the talk.  Presentation slides and downloadable slides are different beasts."
  • "Long personal company or employment history introductions"

On to the Tutorial

So this survey will give us enough information to know what things we can cut back on in order to make our timeslot ... and even a few things to spend more time on!  If you want to learn more about how to be a great speaker, join us at SCALE14 on the morning of Thursday, January 21.

If you just like the graphs, both the graphs and the IPython notebook used to produce them are on our Github repo.

Wednesday, December 23, 2015

A Christmas present: PostgreSQL 9.5 RC1

It's been much-delayed, but PostgreSQL 9.5 RC1 is finally available.  This release contains not only fixes for all known issues in 9.5 features, it also contains an "ultimate solution" to multixact truncation handling.  Multixacts were the cause of multiple bugfix releases in versions 9.3 and 9.4, and several hackers have taken the opportunity in 9.5 to implement a new design for this.  This has been a large part of the delay in releasing 9.5.

So: if you plan to upgrade to 9.5, download and test now.  Final will be released in early January, barring discovery of new ghastly bugs in the RC.

To make it easier for you to test, I've updated the postgres95-test Docker image, ready for you to download and try out whatever you want.

9.5 brings us a lot of cool features which are likely to justify upgrading for you:
  • Row Level Security
  • Grouping Sets/CUBE/ROLLUP
  • Faster sorts for text
  • FDW push-down and partitioning support
  • BRIN indexes
If you want to read up on the features so you know what to test:

Tuesday, December 8, 2015

Meet the newest member of the PostgreSQL community: MongoDB, Inc.

Well, this story is better told by the guy who broke it, John De Goes.

The TL;DR is that MongoDB will be shipping PostgreSQL as its "legacy BI connector" in version 3.2, using PostgreSQL+Multicorn FDW as a wrapper to connect SQL-based BI systems to Mongo.

I have two thoughts about this:
  1. Welcome to the PostgreSQL community, MongoDB staff!
  2. Multicorn?  I mean, not that Multicorn isn't great, but don't you think you should be using a C-based FDW driver for performance?
Anyway, hopefully the folks at MongoDB will put in a talk at pgNYC about this.  We're always keen to hear about creative uses of FDWs, and if this isn't one, I don't know what is.

PostgreSQL, The Center Of Your Dataverse™

(even if you're using Mongo)

Thursday, November 12, 2015

PostgreSQL link round-up

First, in case you somehow missed it, PostgreSQL 9.5 Beta 2 is now out.  Time for another round of testing!  There's fewer and fewer bugs found, so we're not far from a final release.  I don't know about anyone else, but we're going into production on Beta 2 in a couple places.  Can't wait any longer for Upsert and RLS.

Second, pgConfSV has announced its keynotes, from CitusData, Pivotal, and -- as a bit of a surprise -- from streaming data thing Kafka (before you ask, I didn't pick the keynotes). I believe registration is still open, so you can still go to the conference next week if you act quickly.

Thirdly, I have a roundup of some cool blogs covering PostgreSQL which aren't on Planet and you may have missed: did a terrific two-part article on why PostgreSQL is the best open source database.  Read it here: Part I  Part II

They also covered using JSON in Postgres with Python.

In stranger news, there's an amazingly strange story about Soylent, PostgreSQL, and auto-responding to poisoning accusations.  Yes, Soylent Inc. uses PostgreSQL, why not?  Read the whole weird tale here on Zapier's blog.

That's it for my round up ... if you have some good links, post them in the comments.

Tuesday, November 10, 2015

Ready to be an Accidental DBA at pgConfSV?

My AccidentalDBA tutorial materials have been updated for pgConfSV next week.  If you are planning to take this tutorial, please read and follow the setup instructions before you get to the conference.

I created this tutorial in 2009 because of the number of people these days who find themselves in charge of a PostgreSQL server ... or many servers ... without any DBA background, training, or desire to be ops.  The tutorial is intended to teach you the minimum you need to keep your PostgreSQL server from falling over so you can get back to whatever your main job is.

You can take it self-paced if you can't make it to pgConfSV.  The tutorial includes all materials and notes.  In this version, I've added the option of a Docker container (preferred), pg_stat_statements, and a restore-to-point-in-time exercise.

If you are going to attend this tutorial, note that it requires advance setup which will be harder to perform once you get to your hotel, and near-impossible if you wait until 10 minutes before it starts.  So read up and install now.

And ... I believe there's still a few registrations left for the tutorial day, so consider signing up.  You have nothing to lose but your fear of database admin!