Thursday, December 18, 2014

Your Hanukkah Present: PostgreSQL 9.4

For the third night of Hanukkah, we have a nice present for you: PostgreSQL 9.4.0.  Now eat your latkes.



Let's talk about 9.4 by talking about one of my favorite features: JSONB.  A lot of people worked to bring you JSONB in this release, including my colleague Andrew Dunstan, Oleg Bartunov and Teodor Sigaev (sponsored by Engine Yard), Peter Geohegan of Heroku.com, and several others.

For example, imagine we have a large set of JSON documents which contain publication data about various books.  One reason we might have this is that we receive the data in JSON or HTML format from various publishers, and it is not normalized, so we keep it in its original format for data mining..  We can create store them in a JSON column like so:

    table booksdata (
        title citext not null,
        isbn isbn not null primary key,
        pubinfo jsonb not null
    )

Some example pubinfo might be:

    {"authority" : [ "James Michaels", "Nina Totenberg" ], "cost": 16, "edition" : 3,
     "format": "Trade PB", "publisher": "HarperOne",
     "published_on": "1995-08-05", "revised" : [ "1996-06-01", "1999-01-01" ] }

You'll note that the keys are already sorted in alphabetical order.  This is a feature of JSONB; whitespace gets cleaned up and keys get storted on import.

Then we can create a general index on the JSONB like so:

    CREATE INDEX ON booksdata USING GIN (pubinfo);

or:

    CREATE INDEX ON booksdata USING GIN (pubinfo json_path_ops);

There are two different versions depending on the operations you expect to run.  The standard GIN index supports every kind of operation we support for JSONB.  The path_ops index supports only the search path operator "@>" (see below), but produces a smaller and faster index for these kinds of searches.  However, we're going to do a bunch of different JSON operations for data mining, so we'll use the regular GIN index.

Once we have the index, we can do arbitrary path searches on JSONB, and these will be index searches, depending on what the planner picks.  Examples of JSONB queries:

Count all books published by "It Books":

    SELECT count(*) FROM booksdata
    WHERE pubinfo @> '{ "publisher" : "It Books" }';

     count
    ───────
     41

Get the ISBN, and extract the cost of the book as a string, for all books.

    SELECT isbn, pubinfo #>> '{"cost"}' as cost
    FROM booksdata;
    
        isbn      │ cost
    ──────────────┼──────
    0-06-203728-5 │ 16
    0-06-204980-1 │ 27
    0-06-206888-1 │ 180
    0-06-201744-6 │ 10
    0-06-162698-8 │ 20
    0-06-199895-8 │ 10
 

Give me the count of all books which have both the "publisher" and "cost" keys:

    SELECT count(*) FROM booksdata
    WHERE pubinfo ?& array['publisher','cost'];

     count
    ───────
     391

That goofy "?&" is a special operator which says "has all of these keys".  There are other operators for "has any of these keys", and you can negate it with a NOT.

JSON operations can be combined with standard PostgreSQL aggregates and other query operations.  This allows us to provide far more functionality than non-relational databases have.  For example:

Give me the average cost of all books from "It Books":

    SELECT avg((pubinfo #>> '{"cost"}')::NUMERIC)
    FROM booksdata
    WHERE pubinfo @> '{ "publisher" : "It Books" }';

              avg        
      ─────────────────────
       35.0000000000000000


Now, you'll notice that I had to cast "cost" to numeric, even though it's stored as a number in the JSONB.  That's a current limitation.  By 9.5, expect to have more operators which use native JSON types.

Return the publisher name as a string, and the average cost for all books grouped by each publisher name:

    SELECT pubinfo #>> '{"publisher"}' as publisher,
        round(avg((pubinfo #>> '{"cost"}')::NUMERIC),2)
    FROM booksdata
    GROUP BY 1
    ORDER BY publisher;

            publisher         │ round
    ──────────────────────────┼────────
    Avon                      │  43.39
    Collins Reference         │  24.57
    Harper                    │  45.40
    HarperBusiness            │  26.29
    HarperOne                 │  21.96
    Harper Paperbacks         │  16.00
    Harper Voyager            │  29.18
    Igniter                   │  19.50
    It Books                  │  35.00
    William Morrow            │ 348.00
    William Morrow Paperbacks │  15.83


Wow, William Morrow is expensive!   I suspect some kind of data problem here.

Further, JSONB is sortable.  It sorts in a sensible way, first by keys and then by values, as their primitive JSON types (integer, numeric, boolean and text).  This allows JSONB values to be compared and even paginated.  For example, this query pulls the full pubinfo from all "It Books" titles, sorts them, and limits to 25:

    SELECT pubinfo
    FROM booksdata
    WHERE pubinfo @> '{ "publisher" : "It Books" }'
    ORDER BY pubinfo LIMIT 25 OFFSET 0;

                        pubinfo
    ─────────────────────────────────────────────────
    {"cost": 14, "format": "Trade PB", "publisher": "It Books", ...
    {"cost": 15, "format": "Hardcover", "publisher": "It Books", ...
    {"cost": 15, "format": "Trade PB", "publisher": "It Books", ...
    {"cost": 15, "format": "Trade PB", "publisher": "It Books", ...

    ...

This query grabs all pubinfo for "It Books" titles and then sorts them by the JSON of the publication date.  Since JSON has no internal type for dates, you'll want to be careful to use dates in Unix format:

    SELECT pubinfo
    FROM booksdata
    WHERE pubinfo @> '{ "publisher" : "It Books" }'
    ORDER BY pubinfo #> '{"published_on"}'
    LIMIT 5 OFFSET 0

                            pubinfo
    ──────────────────────────────────────────────────────────────
    {"cost": 260, "format": "Hardcover ", "publisher": "It Books",
        "published_on": "2006-03-01"}
    {"cost": 17, "format": "Trade PB", "publisher": "It Books",
        "published_on": "2006-03-01"}
    {"cost": 90, "format": "Hardcover ", "publisher": "It Books",
        "published_on": "2006-11-01"}
    {"cost": 15, "format": "Trade PB", "publisher": "It Books",
        "published_on": "2006-12-01"}
    {"cost": 22, "format": "Trade PB", "publisher": "It Books",
        "published_on": "2007-02-01"}


So, some awesome functionality to let PostgreSQL 9.4 be your document database solution.  Download and enjoy!

Wednesday, December 10, 2014

SQLnoSQL: pg_shard and JSONB, part 1

Of course, the first thing which occurred to me with pg_shard was to create a distributed "NoSQL" database using 9.4's JSONB.  I had to wait a bit because there were some issues with 9.4 compatibility which needed resolving, but this week I was able to build it.  I was hoping to show it off at the SFPUG meeting, but sadly the weather gods (in the form of the PostgreSQL-powered NWS) interfered.

So we'll go over it here because it makes a good introduction to pg_shard and to JSONB.

First step was to spin up an AWS m3.xlarge instance to be our master.  While I don't need much in other capabilities, I do want the master node to have AWS's best network option since it'll be doing a LOT of networking.  Do, however, make sure to configure it with instance storage because you'll want that space for temp tables and any tables which aren't sharded.  This also becomes our template for the shards, because it's easier that way.  I created the instance running Ubuntu 14.04.  Then I had to install all of the PostgreSQL 9.4 components and a few others, including:

  • postgresql-9.4, postgresql-contrib-9.4, postgresql-server-dev-9.4
  • python-psycopg2 (for my benchmark test)
  • git, make

After that I "git cloned" the pg_shard repo and did a make and make install in it.  Currently it emits some spurious warnings with 9.4; ignore those.

Next was configuring PostgreSQL for pg_shard.  This is the master node, so it's where we're going to use pg_shard.  In addition to the usual PostgreSQL configuration, you want to change two settings:

shared_preload_libraries = 'pg_shard' #load pg_shard
temp_buffers = 64MB #pg_shard uses temp tables internally


Then start (or restart) PostgreSQL.  After that, we create a "psocial" database to represent our dummy social networking application, and set up a .pgpass file and an authorized_hosts file for passwordless access by the postgres user.  Then we stop, and shut PostgreSQL down. and use the AWS API to create eight r3.large instances based on that master instance to be our shards.  This way they already have the right configuration and the psocial database created.

We then harvest the IP addresses for those shards and turn them into some local DNS, naming them shard#.demo.  After that, we can set up the pg_worker_list.conf file, which goes in the data directory for PostgreSQL (not the configuration directory, but the actual data directory).

    #hostname       port
    shard1.demo     5432
    shard2.demo     5432
    shard3.demo     5432
    shard4.demo     5432
    shard5.demo     5432
    shard6.demo     5432
    shard7.demo     5432
    shard8.demo     5432


Unfortunately, we do have to configure the shards a bit manually.  First, we'll need to mount the instance storage on each shard, and move the PGDATA directory to it.  Second, we'll need to start Postgres on each shard.  If you use Puppet, Salt, Chef, CFEngine, or similar, then it will be easy do to this; otherwise, you're in for a little SSH time.

Now we are ready to create the sharded "members" table for our app.  Let's create the pg_shard extension (this only needs to be done on the master):

    CREATE EXTENSION pg_shard;

Then we can create the master table:

    create table members (
        id text not null unique,
        email text,
        profile jsonb
    );
    create index on members(email);
    create index on members using gin(profile);


You can see that we've created this table as a highly denormalized collection of JSON documents, using indexed JSONB.  We're doing this mainly to test the capabilities of 9.4's JSONB in a similation of a large, sharded, NoSQL application.  If this were a real application, I'd normalize it a bit.  The id column is TEXT because we're going to put a fake GUID into it.

This empty table is our template for the shards, and like a master partition acts as a target for queries which will be redirected by pg_shard to the shards.  It's not sharded yet; we have to do two things to make that happen.  One is "register" it as a master, which creates entries in pg_shard's metadata for it:

    SELECT master_create_distributed_table('members','id');

This tells pg_shard that "members" is a sharded table, and that it will be sharded on a hash of the column "id".  Now to create the actual shards:

    SELECT master_create_worker_shards('members',32,2);

The first number is the number of shards to create.  While not required, it helps for this to be evenly divisible by the number of hosts you have.  You generally want to create more shards than hosts, first to take advantage of some parallelism on each host, and second to have room for growth by "splitting" hosts.  That is, in our psocial example, we could double to 16 hosts and give each half of the shards.

If you look at the pg_shard metadata, you can see that things are now sharded.  The "partition" table holds our list of master tables:

    psocial=# select * from pgs_distribution_metadata.partition ;
     relation_id | partition_method | key
    -------------+------------------+-----
           16450 | h                | id


The "shard" table shows us the list of shards:

     id   | relation_id | storage |  min_value  |  max_value
    ------+-------------+---------+-------------+-------------
    10000 |       16450 | t       | -2147483648 | -2013265922
    10001 |       16450 | t       | -2013265921 | -1879048195
    10002 |       16450 | t       | -1879048194 | -1744830468
    10003 |       16450 | t       | -1744830467 | -1610612741
    ...


You can see that each shard is defined as a range of hash values, hashed using PostgreSQL's internal hashing function.  Finally, the actual locations of shards are listed in the shard_placement table:

   id | shard_id | shard_state |  node_name  | node_port
    --+----------+-------------+-------------+-----------
    1 |    10000 |           1 | shard1.demo |      5432
    2 |    10000 |           1 | shard2.demo |      5432
    3 |    10001 |           1 | shard2.demo |      5432
    4 |    10001 |           1 | shard3.demo |      5432


You'll notice that each shard exists on two hosts.  That's that "redundancy" number we gave pg_shard when we created the shards; it says how many hosts should have a copy of each shard.  As you can see, these shards are assigned on a round-robin basis.

If we actually look on one of the shards, we see that each shard is a numbered table:

                List of relations
    Schema |     Name      | Type  |  Owner
    -------+---------------+-------+----------
    public | members_10000 | table | postgres
    public | members_10007 | table | postgres
    public | members_10008 | table | postgres
    public | members_10015 | table | postgres
    public | members_10016 | table | postgres
    public | members_10023 | table | postgres
    public | members_10024 | table | postgres
    public | members_10031 | table | postgres


Again, you can see that there are eight shards per host, because of the 2X redundancy we asked for.

Next post, we'll populate the shards with a bunch of dummy data.

Monday, December 8, 2014

Loading pg_partman on RDS or Heroku

One of the limitations with both RDS and Heroku is that you can only install the extensions they've approved and made available to you.  Except ... you can install "SQL-only" extensions which don't have a binary component.  Like pg_partman.

pg_partman is a set of functions and tables which help you manage your partitioned tables, including creating new partitions and triggers on the partition master.  I love it because it means I never have to write ad-hoc code for a client to do that again.  So of course I want to install it on our clients who are running on Heroku and RDS and need partitioning.  However, I have no access to "create extension partman" on the RDS command line, so how do I make it work?  Mostly by loading the extension the old way.

1) create a new schema on the target database called "partman".

2) Install PostgreSQL 9.3 on my laptop, including the development files.

3) git clone pg_partman.

4) make & make install pg_partman on my laptop.  Inside the pg_partman source directory, that will create a file in the "sql" directory called (currently): pg_partman--1.7.2.sql.

5) Since that file is designed to be installed as an extension, I need to alter it.  First, I replace all occurances of "@extschema@" with "partman".  Second, I delete the line "SELECT pg_catalog.pg_extension_config_dump('part_config', '');"

6) Load partman into your database by using "psql -f  pg_partman--1.7.2.sql YOUR_DB_NAME -h YOUR_HOST".  In the RDS case, this needs to be done as the instance superuser.

pg_partman can now be used in the target database by calling all of the functions as "partman.function", or by putting the partman schema in your search_path.

Now, this has the major drawback that upgrading pg_partman once 1.7.3 comes out will be a very manual process.  But that's better than not using it at all.



Thursday, December 4, 2014

What's this pg_shard thing?

I've been working with CitusData for a while, and I'm excited that they've finally released pg_shard, a generic "sharding" extension for PostgreSQL databases.  The press releases have all of the noise about this, so this blog post is for the nuts-and-bolts of pg_shard.

 

What does pg_shard do?


pg_shard takes one or more tables in your database and distributes them over multiple databases on multiple servers, or "shards".  Rows are distributed based on a hash function on the primary key which you define.  The hash function used is Postgres' own hash function for hash indexes.

A single node is the "master node" which contains templates for the distributed tables, as and transparently handles making sure that inserts, updates, and selects go to the right shards.  To ensure redundancy, sharded tables can be defined as having multiple copies across multiple shards.

SELECTs are distributed based on comparing constants in your WHERE clause and other filter conditions against the hashed key, allowing you to query against only the shards which have compliant data.  This should speed up queries for really large tables a great deal.

Where there are multiple copies of shards, pg_shard notices timeouts in contacting the shards and marks certain shards as offline. 

All of this is done inside PostgreSQL, most of it using our defined APIs and hooks.  The only thing the Citus team had to fork was ruleutils.c, and they're hoping to change that in 9.5 so forking that isn't required either.

 

What does pg_shard not do?


First, since the rows are distributed using a hash function, only equality comparisons are currently supported.  This means that if you query your table on ranges (such as BETWEEN two timestamps) it will scan all shards for that data.  Adding range partitioning to pg_shard is planned.

Second, unlike Postgres-XC/Postgres-XL, transactions between shards and distributed JOINs are not supported.  Those features are among the things which distinguish CitusData's proprietary product.

Third, currently only one master node is supported.  That means that pg_shard isn't yet a solution for supporting applications which need massive numbers of connections.   Supporting multiple query nodes is planned for the future.

Complex distribution rules, like requiring each shard to be copied to a specific subset of hosts in a different availability zone or data center, is also not supported.  It's not clear when/if it will be.

 

So what is pg_shard useful for right now?


A lot of web applications consist of "One Big Table" (OBT) which by size is 90% or more of the database. This might be a users table, or a messages table, a log table, or a pictures table.  Years ago, this was exactly the problem we faced with Instagram.

pg_shard will allow users to distribute the OBT across many nodes in a public cloud, letting users scale to much larger sizes.  Particularly, it makes it much easier to redistribute the OBT so that it fits in RAM on each cloud node, both decreasing overall cloud hosting cost and improving throughput and response times.  And since there's shard redundancy, they are protected against certain kinds of cloud failures.

As Instagram and Wanelo have done, you can implement this kind of sharding on the application layer ... and when you get to their scale, you'll probably have to.  But most PostgreSQL users aren't at that scale yet, and pg_shard can help them get over the "hump" of going from one server to several with a lot less pain.

 

Can you give some examples of pg_sharding a database?


First, over the next week I'll be building a "Postgre-NoSQL" database using PostgreSQL 9.4, pg_shard and JSONB, to demonstrate that you can replace some non-relational databases with PostgreSQL.  Expect more on this blog.

If you're in the Bay Area, then join us for an SFPUG meeting next week, which will be all about pg_shard.  If possible, we'll have live video, but no promises.







Friday, November 28, 2014

Setting up Ghost with PostgreSQL and Apache vhosts

I'm currently in love with Ghost, a Node.js blogging platform.  It's simple and small, uses Markdown for blog editing, and has scads of themes available, most of which are responsive and typography-centric.  If there's a project which might be a Wordpress-killer -- especially for PostgreSQL users -- Ghost is probably it.  I've set up both my food and pottery blog and the new SFPUG page using Ghost.

Ghost comes configured by default to use SQLite, though, and of course I had to run it on Postgres.  Also, I needed to set up Apache vhosts to redirect to the various Ghost instances, since one Node instance only runs one Ghost instance, and needed supervisord for autostart.  At some point later, I'll repackage all this to use docker containers for each Ghost instance, but these instructions don't cover that.

First, install Node.js.  OS packages are fine for Ubuntu 14.04 or Fedora; for other OSes you may need to get funky with source installs.  I'm not going to cover that here.  You'll also need to install the Node PostgreSQL driver, and the postgresql-server-dev package.

Second, "git clone" the stable branch of the Ghost code into an appropriate directory.  This can be directly your web directory for Ghost, or you can download, build, and then copy.  If you do the latter, be aware that Ghost has several "dot" files it needs which a regular "cp -r" won't pick up.

Now, before you build Ghost, you're going to need to make some changes to support PostgreSQL.  Edit "package.json" in the main directory, and add PostgreSQL into the "optional dependencies" section like so:

    "optionalDependencies": {
        "mysql": "2.1.1",
        "pg" : "latest"
    },


Now, build Ghost by running the build commands from the home directory:

    npm install -g grunt-cli
    npm install
    grunt init
    grunt prod


Make sure you scroll through the output of the above commands and look for errors; they don't always fail on error.

As a note, Ghost, like Rails, has "dev", "stage", and "production" modes.  Each of these modes can use a different database, even a different DBMS.  Or you can have Ghost connect to the same database for all three modes.  By default, Ghost starts in dev mode.

You'll need to configure your database connection.  Before editing config.js, make sure that you've set up PostgreSQL to accept local connections using an md5 password.  I additionally route Ghost through pgBouncer in case of getting slammed by connections; this requires you to configure pgBouncer as well:
  1. add Ghost user to the database
  2. create a database for the Ghost
  3. configure pg_hba.conf so the new user can connect with an md5 password
  4. if using pgBouncer, configure pgbouncer.ini and users.txt with the database and the user/password you're using for Ghost.
Once that's all set up, you can configure Ghost to use Postgres.  Edit config.js, and in each section (dev, staging and prod) where you want to use a Postgres database, replace the SQLite connection information with something like this:

        database: {
            client: 'pg',
            connection: {
                host     : '127.0.0.1',
                port     : '6432',
                user     : 'ghostblog',
                password : 'supercalifragalistic',
                database : 'ghostblog',
                charset  : 'utf8'
            }
        },


Then save.  As you can see, that's a configuration for pgBouncer.  If you're not using pgBouncer, then you can omit the host and port configuration.  If you're using Heroku, you'll need to point the Host at the public address for your Heroku Postgres instance. Restart Ghost at this point, or start it if you haven't already:

   npm start
   or:
   npm start --production

After this, you should be able to connect to Ghost on port 2368 of the defined interface (127.0.0.1 if you haven't configured anything).  However, that doesn't actually get a site up.  For one, Ghost may have to share the server with non-Ghost sites and other Ghost instances.  Second, users aren't generally accustomed to looking for sites on port 2638.  Also, I'm not all that confident about Node.js security, given how new the platform is.

Enter Apache vhosts with proxying.

First, configure Ghost to listen only an alternate port and the internal IP address:

        server: {
            // Host to be passed to node's `net.Server#listen()`
            host: '127.0.0.1',
            // Port to be passed to node's `net.Server#listen()`
            port: '4444'
        }


Note that if you have several Ghost instances, you'll need a different port for each.  If you're running Ghost under Docker, then redirect the Ghost port to a unique high port on the host.

Next, create a new vhost file (in /etc/apache2/servers-available on Ubuntu) pointing to Ghost:

    <VirtualHost *:80>
        ServerName sfpostgres.org
        ServerAlias www.sfpostgres.org

        ProxyRequests off
        ProxyPass / http://127.0.0.1:4444/
        ProxyPassReverse / http:/127.0.0.1:4444/

        CustomLog /var/log/apache2/www.sfpostgres.org-access.log combined
        ErrorLog /var/log/apache2/www.sfpostgres.org-error.log
    </VirtualHost>


Those ProxyPass lines allow bi-directional pass-through from Apache to Ghost.  Note that this prevents you from putting any other Apache directives on the Vhost other than logging, such as an authentication config; they will get ignored, because the request gets passed to Ghost first.

Now, link that to servers-enabled, and restart Ghost and Apache.  You should now be able to connect to Ghost using the URL of the vhost.

But wait ... what's keeping Ghost running?  npm will quit on error, and has no ability to restart with a server restart.   You could craft your own init file, but and alternative is to use supervisord.

First, install supervisor from packages.  Then add a config file to /etc/supervisor/conf.d:

    [program:sfpostgres]
    command = node /var/www/sfpostgres.org/index.js
    directory = /var/www/sfpostgres.org
    user = ghost
    autostart = true
    autorestart = true
    stdout_logfile = /var/log/supervisor/sfpostgres.log
    stderr_logfile = /var/log/supervisor/sfpostgres_err.log
    environment = NODE_ENV="production"


Then add the service to supervisor and enable it:

    supervisorctl add sfpostgres
    supervisorctl start sfpostgres

.... and that's it!  You should now have a Ghost instance on a custom domain with autorestart able to share a server with other websites.  And most importantly, running on PostgreSQL.

Wednesday, November 19, 2014

Good kernel, bad kernel

A month ago I got into an argument on IRC with Sergey about telling people to avoid kernel 3.2.  This turned out to be a very productive argument, because Sergey then went and did a battery of performance tests against various Linux kernels on Ubuntu. Go read it now, I'll wait.

My takeaways from this:

  • Kernel 3.2 is in fact lethally bad.
  • Kernel 3.13 is the best out of kernel 3.X so far.  I hope that this can be credited to the PostgreSQL team's work with the LFS/MM group.
  • No 3.X kernel yet has quite the throughput of 2.6.32, at least at moderate memory sizes and core counts.
  • However, kernel 3.13 has substantially lower write volumes at almost the same throughput.  This means that if you are write-bound on IO, 3.13 will improve your performance considerably.
  • If your database is mostly-reads and highly concurrent, consider enabling
    kernel.sched_autogroup_enabled.
Thanks a lot to Sergey for doing this testing, and thanks even more to the LFS/MM group for improving IO performance so much in 3.13.

Tuesday, November 18, 2014

pgDay SF 2015 Call For Speakers/Sponsors now open

pgDay SF 2015, to be held March 10th in Burlingame, CA, now has its Call for Speakers open.  The event is also looking for sponsors.  Come join us and meet even more Postgres folks!