Friday, December 26, 2014

PostgreSQL New Zealand Visit

I will be in New Zealand next month for LinuxConf AU.  In addition to a PostgreSQL replication tutorial at LCA, I will have other events to get in touch with the PostgreSQL community in Auckland and Wellington:

Auckland (Jan. 11 to 16)

Monday night (Jan. 12), at 7:30 PM, we will have a PostgreSQL 9.4 BOF at the LCA venue.  Members of the Auckland PostgreSQL community who can't otherwise make it to LCA are allowed and encouraged to attend this BOF; please contact me so I can get a headcount for pizza.

Wellington (Jan. 18 to 21) 

Tuesday, January 20th from 9am to 3pm we will have an extended PostgreSQL replication tutorial, hosted by NIWA New Zealand in downtown Wellington.  This will be the same tutorial I give at LCA, only the extended 4.5 hour version which covers things like replication slots and performance tuning replication.  Contact NIWA if you are interested in attending this; there is a fee.

On Tuesday or Monday night I would really like to have a Wellington PostgreSQL user meetup.  However, I need help finding a venue and getting in contact with PostgreSQL users in Wellington.  Please contact me if you can help, and check back here for updates on the meeting.

UPDATED: please see this wiki page for more information.

Thursday, December 18, 2014

Your Hanukkah Present: PostgreSQL 9.4

For the third night of Hanukkah, we have a nice present for you: PostgreSQL 9.4.0.  Now eat your latkes.

Let's talk about 9.4 by talking about one of my favorite features: JSONB.  A lot of people worked to bring you JSONB in this release, including my colleague Andrew Dunstan, Oleg Bartunov and Teodor Sigaev (sponsored by Engine Yard), Peter Geohegan of, and several others.

For example, imagine we have a large set of JSON documents which contain publication data about various books.  One reason we might have this is that we receive the data in JSON or HTML format from various publishers, and it is not normalized, so we keep it in its original format for data mining..  We can create store them in a JSON column like so:

    table booksdata (
        title citext not null,
        isbn isbn not null primary key,
        pubinfo jsonb not null

Some example pubinfo might be:

    {"authority" : [ "James Michaels", "Nina Totenberg" ], "cost": 16, "edition" : 3,
     "format": "Trade PB", "publisher": "HarperOne",
     "published_on": "1995-08-05", "revised" : [ "1996-06-01", "1999-01-01" ] }

You'll note that the keys are already sorted in alphabetical order.  This is a feature of JSONB; whitespace gets cleaned up and keys get storted on import.

Then we can create a general index on the JSONB like so:

    CREATE INDEX ON booksdata USING GIN (pubinfo);


    CREATE INDEX ON booksdata USING GIN (pubinfo json_path_ops);

There are two different versions depending on the operations you expect to run.  The standard GIN index supports every kind of operation we support for JSONB.  The path_ops index supports only the search path operator "@>" (see below), but produces a smaller and faster index for these kinds of searches.  However, we're going to do a bunch of different JSON operations for data mining, so we'll use the regular GIN index.

Once we have the index, we can do arbitrary path searches on JSONB, and these will be index searches, depending on what the planner picks.  Examples of JSONB queries:

Count all books published by "It Books":

    SELECT count(*) FROM booksdata
    WHERE pubinfo @> '{ "publisher" : "It Books" }';


Get the ISBN, and extract the cost of the book as a string, for all books.

    SELECT isbn, pubinfo #>> '{"cost"}' as cost
    FROM booksdata;
        isbn      │ cost
    0-06-203728-5 │ 16
    0-06-204980-1 │ 27
    0-06-206888-1 │ 180
    0-06-201744-6 │ 10
    0-06-162698-8 │ 20
    0-06-199895-8 │ 10

Give me the count of all books which have both the "publisher" and "cost" keys:

    SELECT count(*) FROM booksdata
    WHERE pubinfo ?& array['publisher','cost'];


That goofy "?&" is a special operator which says "has all of these keys".  There are other operators for "has any of these keys", and you can negate it with a NOT.

JSON operations can be combined with standard PostgreSQL aggregates and other query operations.  This allows us to provide far more functionality than non-relational databases have.  For example:

Give me the average cost of all books from "It Books":

    SELECT avg((pubinfo #>> '{"cost"}')::NUMERIC)
    FROM booksdata
    WHERE pubinfo @> '{ "publisher" : "It Books" }';


Now, you'll notice that I had to cast "cost" to numeric, even though it's stored as a number in the JSONB.  That's a current limitation.  By 9.5, expect to have more operators which use native JSON types.

Return the publisher name as a string, and the average cost for all books grouped by each publisher name:

    SELECT pubinfo #>> '{"publisher"}' as publisher,
        round(avg((pubinfo #>> '{"cost"}')::NUMERIC),2)
    FROM booksdata
    GROUP BY 1
    ORDER BY publisher;

            publisher         │ round
    Avon                      │  43.39
    Collins Reference         │  24.57
    Harper                    │  45.40
    HarperBusiness            │  26.29
    HarperOne                 │  21.96
    Harper Paperbacks         │  16.00
    Harper Voyager            │  29.18
    Igniter                   │  19.50
    It Books                  │  35.00
    William Morrow            │ 348.00
    William Morrow Paperbacks │  15.83

Wow, William Morrow is expensive!   I suspect some kind of data problem here.

Further, JSONB is sortable.  It sorts in a sensible way, first by keys and then by values, as their primitive JSON types (integer, numeric, boolean and text).  This allows JSONB values to be compared and even paginated.  For example, this query pulls the full pubinfo from all "It Books" titles, sorts them, and limits to 25:

    SELECT pubinfo
    FROM booksdata
    WHERE pubinfo @> '{ "publisher" : "It Books" }'
    ORDER BY pubinfo LIMIT 25 OFFSET 0;

    {"cost": 14, "format": "Trade PB", "publisher": "It Books", ...
    {"cost": 15, "format": "Hardcover", "publisher": "It Books", ...
    {"cost": 15, "format": "Trade PB", "publisher": "It Books", ...
    {"cost": 15, "format": "Trade PB", "publisher": "It Books", ...


This query grabs all pubinfo for "It Books" titles and then sorts them by the JSON of the publication date.  Since JSON has no internal type for dates, you'll want to be careful to use dates in Unix format:

    SELECT pubinfo
    FROM booksdata
    WHERE pubinfo @> '{ "publisher" : "It Books" }'
    ORDER BY pubinfo #> '{"published_on"}'

    {"cost": 260, "format": "Hardcover ", "publisher": "It Books",
        "published_on": "2006-03-01"}
    {"cost": 17, "format": "Trade PB", "publisher": "It Books",
        "published_on": "2006-03-01"}
    {"cost": 90, "format": "Hardcover ", "publisher": "It Books",
        "published_on": "2006-11-01"}
    {"cost": 15, "format": "Trade PB", "publisher": "It Books",
        "published_on": "2006-12-01"}
    {"cost": 22, "format": "Trade PB", "publisher": "It Books",
        "published_on": "2007-02-01"}

So, some awesome functionality to let PostgreSQL 9.4 be your document database solution.  Download and enjoy!

Wednesday, December 10, 2014

SQLnoSQL: pg_shard and JSONB, part 1

Of course, the first thing which occurred to me with pg_shard was to create a distributed "NoSQL" database using 9.4's JSONB.  I had to wait a bit because there were some issues with 9.4 compatibility which needed resolving, but this week I was able to build it.  I was hoping to show it off at the SFPUG meeting, but sadly the weather gods (in the form of the PostgreSQL-powered NWS) interfered.

So we'll go over it here because it makes a good introduction to pg_shard and to JSONB.

First step was to spin up an AWS m3.xlarge instance to be our master.  While I don't need much in other capabilities, I do want the master node to have AWS's best network option since it'll be doing a LOT of networking.  Do, however, make sure to configure it with instance storage because you'll want that space for temp tables and any tables which aren't sharded.  This also becomes our template for the shards, because it's easier that way.  I created the instance running Ubuntu 14.04.  Then I had to install all of the PostgreSQL 9.4 components and a few others, including:

  • postgresql-9.4, postgresql-contrib-9.4, postgresql-server-dev-9.4
  • python-psycopg2 (for my benchmark test)
  • git, make

After that I "git cloned" the pg_shard repo and did a make and make install in it.  Currently it emits some spurious warnings with 9.4; ignore those.

Next was configuring PostgreSQL for pg_shard.  This is the master node, so it's where we're going to use pg_shard.  In addition to the usual PostgreSQL configuration, you want to change two settings:

shared_preload_libraries = 'pg_shard' #load pg_shard
temp_buffers = 64MB #pg_shard uses temp tables internally

Then start (or restart) PostgreSQL.  After that, we create a "psocial" database to represent our dummy social networking application, and set up a .pgpass file and an authorized_hosts file for passwordless access by the postgres user.  Then we stop, and shut PostgreSQL down. and use the AWS API to create eight r3.large instances based on that master instance to be our shards.  This way they already have the right configuration and the psocial database created.

We then harvest the IP addresses for those shards and turn them into some local DNS, naming them shard#.demo.  After that, we can set up the pg_worker_list.conf file, which goes in the data directory for PostgreSQL (not the configuration directory, but the actual data directory).

    #hostname       port
    shard1.demo     5432
    shard2.demo     5432
    shard3.demo     5432
    shard4.demo     5432
    shard5.demo     5432
    shard6.demo     5432
    shard7.demo     5432
    shard8.demo     5432

Unfortunately, we do have to configure the shards a bit manually.  First, we'll need to mount the instance storage on each shard, and move the PGDATA directory to it.  Second, we'll need to start Postgres on each shard.  If you use Puppet, Salt, Chef, CFEngine, or similar, then it will be easy do to this; otherwise, you're in for a little SSH time.

Now we are ready to create the sharded "members" table for our app.  Let's create the pg_shard extension (this only needs to be done on the master):

    CREATE EXTENSION pg_shard;

Then we can create the master table:

    create table members (
        id text not null unique,
        email text,
        profile jsonb
    create index on members(email);
    create index on members using gin(profile);

You can see that we've created this table as a highly denormalized collection of JSON documents, using indexed JSONB.  We're doing this mainly to test the capabilities of 9.4's JSONB in a similation of a large, sharded, NoSQL application.  If this were a real application, I'd normalize it a bit.  The id column is TEXT because we're going to put a fake GUID into it.

This empty table is our template for the shards, and like a master partition acts as a target for queries which will be redirected by pg_shard to the shards.  It's not sharded yet; we have to do two things to make that happen.  One is "register" it as a master, which creates entries in pg_shard's metadata for it:

    SELECT master_create_distributed_table('members','id');

This tells pg_shard that "members" is a sharded table, and that it will be sharded on a hash of the column "id".  Now to create the actual shards:

    SELECT master_create_worker_shards('members',32,2);

The first number is the number of shards to create.  While not required, it helps for this to be evenly divisible by the number of hosts you have.  You generally want to create more shards than hosts, first to take advantage of some parallelism on each host, and second to have room for growth by "splitting" hosts.  That is, in our psocial example, we could double to 16 hosts and give each half of the shards.

If you look at the pg_shard metadata, you can see that things are now sharded.  The "partition" table holds our list of master tables:

    psocial=# select * from pgs_distribution_metadata.partition ;
     relation_id | partition_method | key
           16450 | h                | id

The "shard" table shows us the list of shards:

     id   | relation_id | storage |  min_value  |  max_value
    10000 |       16450 | t       | -2147483648 | -2013265922
    10001 |       16450 | t       | -2013265921 | -1879048195
    10002 |       16450 | t       | -1879048194 | -1744830468
    10003 |       16450 | t       | -1744830467 | -1610612741

You can see that each shard is defined as a range of hash values, hashed using PostgreSQL's internal hashing function.  Finally, the actual locations of shards are listed in the shard_placement table:

   id | shard_id | shard_state |  node_name  | node_port
    1 |    10000 |           1 | shard1.demo |      5432
    2 |    10000 |           1 | shard2.demo |      5432
    3 |    10001 |           1 | shard2.demo |      5432
    4 |    10001 |           1 | shard3.demo |      5432

You'll notice that each shard exists on two hosts.  That's that "redundancy" number we gave pg_shard when we created the shards; it says how many hosts should have a copy of each shard.  As you can see, these shards are assigned on a round-robin basis.

If we actually look on one of the shards, we see that each shard is a numbered table:

                List of relations
    Schema |     Name      | Type  |  Owner
    public | members_10000 | table | postgres
    public | members_10007 | table | postgres
    public | members_10008 | table | postgres
    public | members_10015 | table | postgres
    public | members_10016 | table | postgres
    public | members_10023 | table | postgres
    public | members_10024 | table | postgres
    public | members_10031 | table | postgres

Again, you can see that there are eight shards per host, because of the 2X redundancy we asked for.

Next post, we'll populate the shards with a bunch of dummy data.

Monday, December 8, 2014

Loading pg_partman on RDS or Heroku

One of the limitations with both RDS and Heroku is that you can only install the extensions they've approved and made available to you.  Except ... you can install "SQL-only" extensions which don't have a binary component.  Like pg_partman.

pg_partman is a set of functions and tables which help you manage your partitioned tables, including creating new partitions and triggers on the partition master.  I love it because it means I never have to write ad-hoc code for a client to do that again.  So of course I want to install it on our clients who are running on Heroku and RDS and need partitioning.  However, I have no access to "create extension partman" on the RDS command line, so how do I make it work?  Mostly by loading the extension the old way.

1) create a new schema on the target database called "partman".

2) Install PostgreSQL 9.3 on my laptop, including the development files.

3) git clone pg_partman.

4) make & make install pg_partman on my laptop.  Inside the pg_partman source directory, that will create a file in the "sql" directory called (currently): pg_partman--1.7.2.sql.

5) Since that file is designed to be installed as an extension, I need to alter it.  First, I replace all occurances of "@extschema@" with "partman".  Second, I delete the line "SELECT pg_catalog.pg_extension_config_dump('part_config', '');"

6) Load partman into your database by using "psql -f  pg_partman--1.7.2.sql YOUR_DB_NAME -h YOUR_HOST".  In the RDS case, this needs to be done as the instance superuser.

pg_partman can now be used in the target database by calling all of the functions as "partman.function", or by putting the partman schema in your search_path.

Now, this has the major drawback that upgrading pg_partman once 1.7.3 comes out will be a very manual process.  But that's better than not using it at all.

Thursday, December 4, 2014

What's this pg_shard thing?

I've been working with CitusData for a while, and I'm excited that they've finally released pg_shard, a generic "sharding" extension for PostgreSQL databases.  The press releases have all of the noise about this, so this blog post is for the nuts-and-bolts of pg_shard.


What does pg_shard do?

pg_shard takes one or more tables in your database and distributes them over multiple databases on multiple servers, or "shards".  Rows are distributed based on a hash function on the primary key which you define.  The hash function used is Postgres' own hash function for hash indexes.

A single node is the "master node" which contains templates for the distributed tables, as and transparently handles making sure that inserts, updates, and selects go to the right shards.  To ensure redundancy, sharded tables can be defined as having multiple copies across multiple shards.

SELECTs are distributed based on comparing constants in your WHERE clause and other filter conditions against the hashed key, allowing you to query against only the shards which have compliant data.  This should speed up queries for really large tables a great deal.

Where there are multiple copies of shards, pg_shard notices timeouts in contacting the shards and marks certain shards as offline. 

All of this is done inside PostgreSQL, most of it using our defined APIs and hooks.  The only thing the Citus team had to fork was ruleutils.c, and they're hoping to change that in 9.5 so forking that isn't required either.


What does pg_shard not do?

First, since the rows are distributed using a hash function, only equality comparisons are currently supported.  This means that if you query your table on ranges (such as BETWEEN two timestamps) it will scan all shards for that data.  Adding range partitioning to pg_shard is planned.

Second, unlike Postgres-XC/Postgres-XL, transactions between shards and distributed JOINs are not supported.  Those features are among the things which distinguish CitusData's proprietary product.

Third, currently only one master node is supported.  That means that pg_shard isn't yet a solution for supporting applications which need massive numbers of connections.   Supporting multiple query nodes is planned for the future.

Complex distribution rules, like requiring each shard to be copied to a specific subset of hosts in a different availability zone or data center, is also not supported.  It's not clear when/if it will be.


So what is pg_shard useful for right now?

A lot of web applications consist of "One Big Table" (OBT) which by size is 90% or more of the database. This might be a users table, or a messages table, a log table, or a pictures table.  Years ago, this was exactly the problem we faced with Instagram.

pg_shard will allow users to distribute the OBT across many nodes in a public cloud, letting users scale to much larger sizes.  Particularly, it makes it much easier to redistribute the OBT so that it fits in RAM on each cloud node, both decreasing overall cloud hosting cost and improving throughput and response times.  And since there's shard redundancy, they are protected against certain kinds of cloud failures.

As Instagram and Wanelo have done, you can implement this kind of sharding on the application layer ... and when you get to their scale, you'll probably have to.  But most PostgreSQL users aren't at that scale yet, and pg_shard can help them get over the "hump" of going from one server to several with a lot less pain.


Can you give some examples of pg_sharding a database?

First, over the next week I'll be building a "Postgre-NoSQL" database using PostgreSQL 9.4, pg_shard and JSONB, to demonstrate that you can replace some non-relational databases with PostgreSQL.  Expect more on this blog.

If you're in the Bay Area, then join us for an SFPUG meeting next week, which will be all about pg_shard.  If possible, we'll have live video, but no promises.