Photo courtesy K Lars Lohn. Used with permission.
Yesterday my colleague Robert Hodges made a post disputing the CAP theorem. While I have some issues with his logic -- and after all, it was a beery post put up on April 1 -- his post does bring up an issue I've meant to blog about for a while, so here goes.
CAP is an exclusionary triad: it consists of three elements, any pair of which excludes the other. Another exclusionary triad all my readers will be familiar with is PQT: Price, Quality, Time. Thing is, CAP is only one of several possible triads in distributed system design, and is not the most interesting triad for clustered database designers.
Most of the time you're ignoring Partition Tolerance for a clustered database design since it's limited to a single data center and partial network failure is not considered that great of a risk. This isn't universally true; for example, CAP rears its ugly headgear whenever you contemplate having automated failover for a failover/pooling proxy. However, there are more substantial tradeoffs to be made even in a single-network system.
There are several other possible combinations which fit the exclusionary triad model and relate more closely to database design. For example, CRT (Complexity, Response Time, Throughput) is a triad for read requests in a distributed database. I may write more about CRT later.
Today, though, I'm going to write about one of the triads which govern writes to a clustered database system. CAR: Consistency, Availability, or Response Time. You can choose any two of the three but not all three. To explain:
Consistency refers to having writes succeed or fail, and produce the same results, regardless of to which node you connect. Ideal Consistency would be the consistency of a single node.
Availability is the clustered database's ability to survive the failure of one or more nodes while still servicing client requests. Ideal Availability would support any number of failed nodes as long as one node was still running.
Response Time is the speed of service of a single write request. Ideal Response Time is the response time of a single node.
While real systems make some compromises on all elements of the triad, we can show this as a valid triad by defining the three extreme cases:
CR: a traditional, single-node transactional database is Consistent and has good Response Times. Obviously, it cannot survive node failure.
AR: a non-transactional multi-node key-value database with fully replicated data on each node, which accepts writes to any node and asynchronously attempts to copy the writes to other nodes would have perfect Availability and single-node Response Times, while completely abandoning consistency.
CA: on the other hand, consider a multi-node database in which each node has duplicate copies of the data and all transactions are written to all nodes using 2-phase commit (2PC). This database would have perfect Consistency and maximum Availability, at the cost of Response Times which get progressively slower as the cluster grows.
If you look at today's open source clustered databases, you'll find that they have all chosen one side or another of the above triangles, with some compromising in one direction or another. For example:
CR: mainstream PostgreSQL and MySQL.
AR: MongoDB, CouchDB, Bucardo.
CA: Continuent, VoltDB, Synchronous Replication, PostgresXC.
Obviously there are more design differences between the databases above than where they fall on the CAR triangle. But it's a good way to look at the tradeoffs you need to make. Probably more than CAP.