Wednesday, July 11, 2018

Should I Run Postgres on Kubernetes? Part I

In preparation for my workshop on running PostgreSQL on Kubernetes on Monday, I wanted to talk a bit about why you'd want to run your database there in the first place -- and why you wouldn't.

The container world, starting with Docker and then moving to Kubernetes, has focused on stateless services like web applications.  This has been largely because stateless services are simply easier to manage in new environments, and can be handled generically as well, allowing Kubernetes to be relatively platform-agnostic.  Once you get into services that require storage and other persistent resources, the idea of "run your containerized application on any platform" becomes much more challenging.

Somewhere along the way "stateful services are hard" morphed into "you shouldn't run stateful services" as a kind of mantra.  Considering how much work contributors have put into making stateful apps possible, that's simply wrong.  And, for that matter, when has running databases on any stack ever been easy?

Let's start with some of the reasons you would want to run Postgres (or other databases) on Kubernetes.  In tomorrow's post, I'll go into some of the reasons why you would not want to.

One Environment to Rule Them All

The biggest reason is to simplify your development and deployment picture by putting all application components on Kubernetes.  It supplies a whole set of scaffolding to make deploying and integrating applications and databases easier, including shared secrets, universal discovery, load balancing, service monitoring, and scaling.  While there are integration points, like the Service Catalog, that support treating external databases as Kubernetes services, it's always going to be harder to manage a database that has to be deployed by a completely different process from the application it supports.

As a small example, let's take database connection credentials. If both the database and the application are on Kubernetes, rotating/updating credentials is simple: you just update a Secrets object (or a plugin provider like KeyCloak), and both the application and the database will pick up the new logins.  If the database is external, this becomes two separate update processes, each of which uses different tools.
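As an illustrative sketch (the object and key names here are made up for the example), such a shared Secret might look like this:

```yaml
# Hypothetical Secret holding the shared database credentials.
# Values under "data" are base64-encoded.
apiVersion: v1
kind: Secret
metadata:
  name: pgsql-credentials
type: Opaque
data:
  username: YXBwdXNlcg==      # "appuser"
  password: czNjcjN0cGFzcw==  # "s3cr3tpass"
```

Both the application pods and the database pods can mount this Secret as environment variables or files; rotating the password means updating one object instead of running two separate update procedures.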

With everything located on Kubernetes, setting up other parts of your development infrastructure gets much easier.  Deploying a test application to dev or staging can work the same way as it does in prod, minimizing mistakes.  CI/CD pipelines can more easily include database updates and upgrades.

Do the DevOps

With this level of integration, and the right coworkers, Kubernetes then enables a "real DevOps workflow" where each development team owns their own database.  In dev and staging environments, and maybe even in production, developers can be given the ability to self-service database support, deploying small replication clusters using predefined templates.

This isn't theoretical; it's what online fashion retailer Zalando is actually doing, allowing them to manage a large company with hundreds of online applications and a handful of database staff.
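As a sketch of what that self-service looks like: Zalando's open-source Postgres Operator lets a dev team request a small replication cluster with a short declarative manifest, roughly like the following (the values here are hypothetical; check the operator's docs for the current schema):

```yaml
# Hypothetical cluster request for the Zalando Postgres Operator.
# The operator watches for these objects and provisions the
# StatefulSet, volumes, and replication configuration itself.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-example-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 5Gi
  postgresql:
    version: "10"
```

The team applies the manifest and gets a working two-node cluster, without filing a ticket with the database staff.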

In traditional -- or even VM cloud -- environments, this is much, much harder to do.  First, the cost and resource consumption of database servers (or virtual servers) requires ops gating their deployment.  Second, the challenge of setting up databases in a way that protects against data loss is a blocker for motivated dev teams.

Much Easier HA

I spent a long time working on fully automated HA systems for PostgreSQL.  I found that, for HA to work, I needed multiple services outside PostgreSQL itself:

  • a "source of truth" to prevent split-brain;
  • something to ensure that the minimum number of replicas were running;
  • routing to the current master that can be quickly changed;
  • a way to enforce shutdown of rogue nodes;
  • a UI to view all of the members of my cluster.

Kubernetes supplies all of the above for you using various objects and tools.  This means that the automatic HA code that needs to run in the Postgres containers can be very simple, and substantially more reliable.  Today, I can't imagine trying to implement database high availability without it.
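To make one of those pieces concrete: the "routing to the current master" requirement maps onto a plain Service whose selector matches a label that the HA code sets on the current master, in the style of Patroni-like tools (the label and port values here are illustrative):

```yaml
# Clients always connect to this Service.  Only the pod currently
# labeled role=master receives the traffic; during failover the HA
# agent moves the label to the new master, and routing follows.
apiVersion: v1
kind: Service
metadata:
  name: pgsql-master
spec:
  selector:
    app: pgsql
    role: master
  ports:
  - port: 5432
    targetPort: 5432
```

Similarly, a StatefulSet covers the replica count, the Kubernetes API (backed by etcd) acts as the source of truth, and pod deletion handles fencing of rogue nodes.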

Zero-Brain-Cell Admin for Small Databases

In a presentation, Peter van Hardenberg pointed out that the median size of Heroku's hundreds of thousands of PostgreSQL databases was less than 1GB.  Databases this size also usually support a single application.  This is the size of database you must automate; no human being should be spending time and effort administering an individual database that fits on a 2005 USB key.

In an orchestrated environment, it becomes much easier to treat the large number of tiny databases needed in your ecosystem as the replaceable cogs they are.  Yes, you can automate your database management entirely with configuration management systems, but it's both easier and more reliable to do it in a container cloud.  Kubernetes helps you to not think about your less critical databases, so that you can spend your time on the databases that really need mollycoddling.

Not for Everyone

Of course, as with any change of infrastructure, there are downsides to moving databases to Kubernetes.  These are serious and will be the reasons why some users and admins will stick to the techniques they know.  I'll explore those in my next post.
