Tuesday, August 26, 2014

Portland schedule, revised

Just so folks know:  due to unsurmountable issues, I will not make it to DjangoCon this year.  My apologies for anyone who was planning on attending my sessions there.  However, I will be at both the Portland pgDay and FOSS4G.  See you in Portland, just later.

Monday, August 25, 2014

The great JSONB tradeoff

One thing we're waiting on to get 9.4 out the door is making a decision on JSONB storage.  Currently under debate are patches to the JSONB format which would, for users with large numbers (as in 150+) of top-level keys, cause JSONB to take up to 60% less space on disk in exchange for an up to 80% increase (that is slowdown) in response times on key extraction.  We need your feedback as potential users of the feature to make a good decision about what's best.

At stake is the question of whether the key locations should be saved as successive offsets, or as lengths.  The advantage of the former is that it speeds up extraction by making it only two lookups to locate a top-level key regardless of the number of keys you have.  The disadvantage is that a series of increasing offsets isn't very compressible.

For JSONB fields which consist of a few top-level keys and large values, this question makes no difference at all.  However, for the worst case ... 150+ top-level keys with short (under 10 bytes) values, the difference is quite dramatic.  For example, I constructed a test with 183 keys, of which 175 were NUMERIC.  I checked both table size and time to extract key #160 from 100,000 rows:

MetricOffsetsLengths
Table Size1147 MB541 MB
Extract 100K2.8s5.2s

This is a "worst case" scenario for the difference between these two designs.  Note that the extraction slowdown affects only retrieving the value to the client; it does not affect index lookups of JSONB rows, which are speedy no matter which patch is employed.

However, we're undecided on this "fix" because we don't know a couple things:

  • How likely are users to have 150+ top-level keys in one field (or keys on any single level together) with short values?
  • Is up to 60% space savings in return for up to 80% extraction slowdown a good tradeoff?  Or a bad one?
As such, I am turning to you, the users, to help us decide what tradeoff makes sense for 9.4.  Let me know in the comments.

Thursday, August 14, 2014

More on 9.4 percentiles

Some of you may be following the discussion on JSONB compression on -hackers, which is the remaining major issue for 9.4.  As part of this, I needed to run some statistics on average column sizes for JSON vs JSONB tables. This makes a good example of how incredibly useful WITHIN GROUP will be in 9.4.

First, I created two versions of the data, one called "jsonic" which has text-JSON data, and one called "jsonbish" which has the JSONB version.  Then I ran some comparisons using pg_column_size on the JSON columns.

with colsizes_b as (
    select pg_column_size(jsonbish.jcol) as colsize
    from jsonbish
),
distrib_b as (
    select percentile_cont(array [0,0.25,0.5,0.75,1])
    within group (order by colsize) as colsize_distribution
    from colsizes_b
),
colsizes_j as (
    select pg_column_size(jsonic.jcol) as colsize
    from jsonic
),
distrib_j as (
    select percentile_cont(array [0,0.25,0.5,0.75,1])
    within group (order by colsize) as colsize_distribution
    from colsizes_j
)
select 'json' as thetype,
    colsize_distribution from distrib_j
union all
select 'jsonb',
    colsize_distribution
    from distrib_b;


So I'm taking the column size of each row, then sorting them by size, and then doing a percentile distribution using the new WITHIN GROUP.  There's no group there because the group is actually the whole table.  I've chosen the usual box plot percentages: minimum, 25%, median, 75%, and maximum.

And the results:

 thetype |    colsize_distribution   
---------+----------------------------
 json    | {1741,1767,1854,1904,2292}
 jsonb   | {3551,5866,5910,5958,6168}


... which demonstrates that we do indeed have a major issue with compressability.




Tuesday, August 12, 2014

SFPUG Video on YouTube

TL;DR: SFPUG video available on YouTube, including streaming video tonight.  I need help converting old videos and making new ones.

First, we have a few SFPUG videos available on YouTube:
While JSONB Deep Dive was also recorded, there were issues with the audio, so I don't recommend watching it, sorry.

We will have streaming video for tonight's SFPUG, which will be on Postgres-XL.   Join the Google Event to get a link to the streaming video once it starts; expected start time is 7:15PM PST.

This brings me to the third thing, which is that I could use some help with video, in two areas:
  1. I desperately need someone to help take better-quality videos of SFPUG meetings, with better focus and sound.  Right now I'm using a Logitech webcam, and it's just not cutting it.  Video will need to be compatible with Google Hangouts, unless you have another plan for broadcasting.
  2. I have 7 or 8 videos of SFPUG meetings from 2013, including Tom Lane explaining the query planner, in FLV format.  I need someone to transcode these to a YouTube format and upload them.
If you can help, email me.  Thanks!

Thursday, August 7, 2014

Call for quotes for the 9.4 release announcement

We are looking for endorsement quotes for the 9.4 release.
Specifically, we want quotes from end-users or ISVs* who have not been
featured for a previous release, and enthuse about one of the following
topics:
  • JSONB
  • Streaming Changesets/Bi-Directional Replication
  • Performance improvements in 9.4
  • Something else about Flexibility, Scalability or Performance
We need these quotes by August 15th.  Contact me if you want to offer a quote.  Thanks!

(* while we love our PostgreSQL consulting and support companies,
endorsements from them aren't news in a PR sense)

Friday, June 13, 2014

Pending Portland Postgres Presentations

This year is apparently my year to spend most of my time in Portland.  Maybe I should move there ;-)  Please read through if you are in Portland or are attending conferences in Portland for a list PostgreSQL-related events there.

O'Reilly OSCON 2014: on Monday, July 21st, I will be presenting The Accidental DBA again at OSCON.  O'Reilly has asked me to reprise this tutorial as it was sold out last year.  This tutorial covers the care and feeding of a PostgreSQL server for people whose main job is something else, or for those new to database administration.  Will include multiple hands-on exercises using Vagrant on your own laptop.  The exercises are on github, but will be updated on or before July 10th in order to cover 9.3 and feedback from the last talk.

Note that there will also be a special PDXPUG meeting or BOF at OSCON; details TBD.  I will also be doing an Office Hours session at OSCON on July 22nd at 3:20 PM; please bring your questions about PostgreSQL scalability and replication.

I am likely to be speaking at DjangoCon.us in September.  As the Call for Presentations is still open, I don't know what it will be about, or have confirmation that I will, in fact, be speaking.

Right between DjangoCon.us and FOSS4G, PDXPUG will be hosting a full PostgreSQL Day at Portland State University on September 6.  This event is free, so if you are going to either DjangoCon or FOSS4G, extend your stay in Portland in order to attend!  RSVP required.

People are still submitting presentations for that; if you're interested in speaking, contact Mark per the wiki page.  I expect to be talking about 9.4, replication, or maybe Postgres on Amazon.

Finally, at FOSS4G, I will be doing a full day of Postgres/PostGIS workshops.  In the morning I will do Accidental DBA again, this time tailored for PostGIS administrators.  In the afternoon, I will do a hands-on "learning to do PostGIS replication" tutorial.  This does mean that I will likely not include much about replication in the morning session.  Both of these workshops are "bring your own device", which means that you will need to install the tutorial materials on your laptop in advance.

Since I will be spending the first half of September in Portland.  If you or your company is interested in onsite training or consulting during September, please get in touch with PGX.

Tuesday, June 3, 2014

Upcoming SFPUG Live Streams: Crontabber and JSONB

Apparently June is the month for hackers named Peter with difficult-to-spell last names.  Join us for two live streams of two different Peters giving two different PostgreSQL presentations.

First, on June 10th, we'll have Peter Bengtsson of Mozilla presenting Crontabber, a network-wide scheduled job manager written in Python and PostgreSQL.  Tune in on Air Mozilla at around 7:15pm PDT to watch this.

Second, on June 23rd, Peter Geoghegan of Heroku will do a "JSONB Deep Dive" and explain the features and internals of the new JSONB type.   Subscribe to the Google Event to be linked into the Google Hangout when this goes live around 7:15PM PDT.

See you there!