Monday, March 16, 2015

Benchmarking Postgres in the Cloud, part 1

In 2008, when Heroku started, there was really only one option for hosting PostgreSQL in the cloud: roll-your-own on EC2 (or a couple of other, not-very-competitive platforms).  Since then, the number of cloud hosting providers has exploded, and several "PostgreSQL-as-a-Service" providers have appeared as well: first Heroku, then Gandi, CloudFoundry, RDS, OpenShift, and more.  This has led many of pgExperts' clients to ask: "Where should I be hosting my PostgreSQL?"

So, to provide a definitive answer to that question, for the past several weeks I've been doing head-to-head testing of different cloud hosting options for PostgreSQL.  Even more work has been done by my collaborator, Ruben Rudio Rey of ManageACloud.com.  I will be presenting the results of this testing in a series of blog posts, along with a series of presentations starting at SCALE and continuing through pgConf NYC and LinuxFest Northwest, culminating at pgCon.  Each presentation will add new tests and new data.

Here are my slides from SCALE, comparing AWS, RDS, and Heroku, if you want some immediate data.

What We're Testing

The idea is to run benchmarks against ephemeral instances of PostgreSQL 9.3 on each cloud or service.  Our main goal is to collect performance figures: features and pricing are publicly available, but performance information is not, and even when the specifications are the same, the actual throughput is not.  From each cloud or service, we are testing two different instance sizes:

Small: 1-2 cores, 3 to 4GB RAM, low-throughput storage (compare EC2's m3.medium).  This is the "economy" instance for running PostgreSQL; it's intended to represent what people buy for non-critical PostgreSQL instances, and to answer the question, "How much performance can I get for cheap?"

Large: 8-16 cores, 48 to 70GB RAM, high-throughput storage (compare EC2's r3.2xlarge).  This is the largest "high-end" instance we could afford to test across all the platforms.

The clouds we're testing or plan to test include:
  • AWS EC2 "roll-your-own".
  • Amazon RDS PostgreSQL
  • Heroku
  • Google Compute Engine
  • DigitalOcean
  • Rackspace Cloud
  • OpenShift PostgreSQL Cartridge
  • (maybe Joyent, not sure)
Note that in many cases we're working with the cloud vendor to achieve maximum performance results.  Our goal here isn't to "blind test" the various clouds, but rather to try to realistically deliver the best performance we can get on that platform.  In at least one case, our findings have resulted in the vendor making improvements to their cloud platform, which then allowed us to retest with better results.

The tests we're running include three pgbench runs:

  • In-Memory, Read-Write (IMRW): pgbench database at 30% to 60% of RAM, full transactional workload
  • In-Memory, Read-Only (IMRO): pgbench database at 30% to 60% of RAM, read-only queries
  • On-Disk, Read-Write (ODRW): pgbench database at 150% to 250% of RAM, full transactional workload
The idea here is to see the different behavior profiles with WAL-bound, CPU-bound, and storage-bound workloads.  We're also recording the load time for each database, since bulk loading behavior is useful information for each platform. 
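For illustration, here's roughly the shape of those pgbench invocations.  The scale factors, client counts, and durations below are hypothetical, sized for a 4GB-RAM "small" instance (pgbench's scale factor works out to roughly 15MB of data per unit), and are not the exact parameters from our runs:

    # load an "in-memory" sized database (~1.5GB on a 4GB instance),
    # timing it to capture bulk load behavior
    time pgbench -i -s 100 bench

    # IMRW: full read-write transaction workload
    pgbench -c 8 -j 4 -T 600 bench

    # IMRO: the same database, read-only (SELECT-only) queries
    pgbench -S -c 8 -j 4 -T 600 bench

    # ODRW: reload at ~2x RAM, then run the read-write workload again
    time pgbench -i -s 500 bench
    pgbench -c 8 -j 4 -T 600 bench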

Each combination of cloud, size, and test then needs to be run at least 5 times in order to get a statistically useful sample.  As I will document later, the difference between runs on the same cloud was often greater than the difference between clouds.
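In practice that's just a loop; a minimal sketch (the database name and run parameters are again placeholders):

    # repeat one test 5 times, keeping each pgbench report
    for run in 1 2 3 4 5; do
        pgbench -c 8 -j 4 -T 600 bench > imrw_run_${run}.log
    done

    # then compare the throughput figures across runs
    grep "tps =" imrw_run_*.log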

Issues with pgbench as a Test Tool

One of the things I discovered early on was the limits of what pgbench can tell us.  Its workload is 100% random access and consists of homogeneous one-liner queries.  It's also used extensively and automatically to test PostgreSQL performance, which means PostgreSQL's defaults already handle that workload about as well as they can.  As a result, we found that postgresql.conf tuning made little or no difference at all, so our original plan to test "tuned" vs. "untuned" instances went by the wayside.

We also found that on the public clouds, because of the rapid-fire nature of pgbench queries, performance on most workloads was dominated by network response times more than by anything else.  We did not use pgbench-tools, because it is designed for automating many test runs against one host, rather than a few test runs against many hosts.
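A quick way to see how much of a measurement is just the network: run a do-nothing query through pgbench as a custom script.  If a bare SELECT 1 clocks in near the same speed as the standard queries, you're benchmarking round-trips, not the database.  A hypothetical check (the host name is a placeholder):

    # measure the network round-trip floor with a trivial query
    echo "SELECT 1;" > select1.sql
    pgbench -n -f select1.sql -c 1 -T 30 -h db.example.com bench

With a single client, the reported tps here is roughly 1 divided by the round-trip time, which puts a hard ceiling on what any one connection can do.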

For this reason, we also want to run a different, more "serious" benchmark which exercises other performance areas.  To support this, I'm working on deploying Jignesh's build of DVDStore so that I can run that benchmark against the various platforms.  This will require significant work to make a reality, though; I will need to create images or deployment tools on all of the platforms I want to test before I can do it.

To be continued ...

6 comments:

  1. Looking forward to it! I'd be curious to see how Linode fares, since they advertise their CPU power and SSDs.

    1. Oh, I hadn't thought about testing Linode. Not sure I will; they don't let me sign up for instances by the hour.

    2. Really? I don't have an account with them currently, but they advertise hourly billing. Not sure what the details are, though.

      One thing that intrigued me about them was their use of SSDs as a write-through cache for their (more durable?) spinning disks. It's the kind of setup that sounds good in theory, but I'm a bit skeptical of how well it'd hold up under a write-heavy load.

  2. Did the results for this ever get published?

    1. I second this comment. I saw the slide deck for your presentation on AWS options, and that was fantastic. I'd love to see how those other cloud options turned out. I scoured your blog and found no other mention of this.

      Thanks!
