So we'll go over it here because it makes a good introduction to pg_shard and to JSONB.
First step was to spin up an AWS m3.xlarge instance to be our master. While I don't need much in other capabilities, I do want the master node to have AWS's best network option since it'll be doing a LOT of networking. Do, however, make sure to configure it with instance storage because you'll want that space for temp tables and any tables which aren't sharded. This also becomes our template for the shards, because it's easier that way. I created the instance running Ubuntu 14.04. Then I had to install all of the PostgreSQL 9.4 components and a few others, including:
- postgresql-9.4, postgresql-contrib-9.4, postgresql-server-dev-9.4
- python-psycopg2 (for my benchmark test)
- git, make
After that I "git cloned" the pg_shard repo and did a make and make install in it. Currently it emits some spurious warnings with 9.4; ignore those.
Next was configuring PostgreSQL for pg_shard. This is the master node, so it's where we're going to use pg_shard. In addition to the usual PostgreSQL configuration, you want to change two settings:
shared_preload_libraries = 'pg_shard' #load pg_shard
temp_buffers = 64MB #pg_shard uses temp tables internally
Then start (or restart) PostgreSQL. After that, we create a "psocial" database to represent our dummy social networking application, and set up a .pgpass file and an authorized_hosts file for passwordless access by the postgres user. Then we stop, and shut PostgreSQL down. and use the AWS API to create eight r3.large instances based on that master instance to be our shards. This way they already have the right configuration and the psocial database created.
We then harvest the IP addresses for those shards and turn them into some local DNS, naming them shard#.demo. After that, we can set up the pg_worker_list.conf file, which goes in the data directory for PostgreSQL (not the configuration directory, but the actual data directory).
Unfortunately, we do have to configure the shards a bit manually. First, we'll need to mount the instance storage on each shard, and move the PGDATA directory to it. Second, we'll need to start Postgres on each shard. If you use Puppet, Salt, Chef, CFEngine, or similar, then it will be easy do to this; otherwise, you're in for a little SSH time.
Now we are ready to create the sharded "members" table for our app. Let's create the pg_shard extension (this only needs to be done on the master):
CREATE EXTENSION pg_shard;
Then we can create the master table:
create table members (
id text not null unique,
create index on members(email);
create index on members using gin(profile);
You can see that we've created this table as a highly denormalized collection of JSON documents, using indexed JSONB. We're doing this mainly to test the capabilities of 9.4's JSONB in a similation of a large, sharded, NoSQL application. If this were a real application, I'd normalize it a bit. The id column is TEXT because we're going to put a fake GUID into it.
This empty table is our template for the shards, and like a master partition acts as a target for queries which will be redirected by pg_shard to the shards. It's not sharded yet; we have to do two things to make that happen. One is "register" it as a master, which creates entries in pg_shard's metadata for it:
This tells pg_shard that "members" is a sharded table, and that it will be sharded on a hash of the column "id". Now to create the actual shards:
The first number is the number of shards to create. While not required, it helps for this to be evenly divisible by the number of hosts you have. You generally want to create more shards than hosts, first to take advantage of some parallelism on each host, and second to have room for growth by "splitting" hosts. That is, in our psocial example, we could double to 16 hosts and give each half of the shards.
If you look at the pg_shard metadata, you can see that things are now sharded. The "partition" table holds our list of master tables:
psocial=# select * from pgs_distribution_metadata.partition ;
relation_id | partition_method | key
16450 | h | id
The "shard" table shows us the list of shards:
id | relation_id | storage | min_value | max_value
10000 | 16450 | t | -2147483648 | -2013265922
10001 | 16450 | t | -2013265921 | -1879048195
10002 | 16450 | t | -1879048194 | -1744830468
10003 | 16450 | t | -1744830467 | -1610612741
You can see that each shard is defined as a range of hash values, hashed using PostgreSQL's internal hashing function. Finally, the actual locations of shards are listed in the shard_placement table:
id | shard_id | shard_state | node_name | node_port
1 | 10000 | 1 | shard1.demo | 5432
2 | 10000 | 1 | shard2.demo | 5432
3 | 10001 | 1 | shard2.demo | 5432
4 | 10001 | 1 | shard3.demo | 5432
You'll notice that each shard exists on two hosts. That's that "redundancy" number we gave pg_shard when we created the shards; it says how many hosts should have a copy of each shard. As you can see, these shards are assigned on a round-robin basis.
If we actually look on one of the shards, we see that each shard is a numbered table:
List of relations
Schema | Name | Type | Owner
public | members_10000 | table | postgres
public | members_10007 | table | postgres
public | members_10008 | table | postgres
public | members_10015 | table | postgres
public | members_10016 | table | postgres
public | members_10023 | table | postgres
public | members_10024 | table | postgres
public | members_10031 | table | postgres
Again, you can see that there are eight shards per host, because of the 2X redundancy we asked for.
Next post, we'll populate the shards with a bunch of dummy data.
The follow-up to this blog post will be delayed by technical and work-conflict issues.ReplyDelete
This blog post has been very useful in getting started with pg_shard. Will the next part be available anytime soon? Thank you
Sorry, I've been backlogged. I went down the rabbit hole of engineering a JSON benchmark; I should have an update to this soon.Delete
i've been using pg 9.5 json recently. tis my first in depth exposure. all i can say is wow ... almost perfect. i'm still doing simple queries, basically the extract stage from json logging events into tables, so i haven't tested the optimizer much. however, the merge queries against +1million json tuples are quite performant. my only grumble would be null handling and lack of range queries on arrays.ReplyDelete