Thursday, July 12, 2018

Should I Run Postgres on Kubernetes? Part II

Having covered some of the reasons why you would want to run a PostgreSQL on Kubernetes, I'm not going to cover some of the reasons why you wouldn't want to.  This should help you make a balanced judgment about whether or not it's the right next move for you.

Not a Kubernetes Shop


Of course, everything I said about an integrated environment applies in reverse; if you're not already running other services on Kubernetes, or planning to migrate them, then moving your databases to Kubernetes is a lot of effort with minimal reward.  I'd even go so far as to say that your databases should probably be the last thing you move. Stateful services are harder, and require a greater mastery of Kubernetes. Start with the easy stuff.

Also, and perhaps more importantly, nothing in Kubernetes will remove from you (as admin) the necessity of having knowledge of the database platform you're deploying.  While it does make somethings (like HA) easier to automate, it doesn't change the basics of database management.  You still need to have someone on staff who knows PostgreSQL, it's just that that person can manage a lot more databases than they could before.

Everything is Alpha


Like everything in the new stacks, Kubernetes is under heavy development and many of the tools and features you're going to want to use are alpha, beta, or pre-1.0 software.  For folks used to PostgreSQL, where even our daily builds are runnable in production, you're going to find the world of containerish stuff very alpha indeed.

This means that moving your databases (or anything else) to Kubernetes at this stage means embracing the requirement to develop both some Kubernetes expertise, and a tolerance for risk.  Realistically, this means that you're only going to do it today if you have pressing needs to scale your database support and admin team that you can't meet otherwise. Folks in regulated environments with very low tolerance for mistakes should probably be the last to move.

In a year or so, this will change; there is more work today on making things stable in Kubernetes than there is on new features.  But right now you need to balance risk and reward.

Performance


Linux containers are just processes, and as a result perform largely like applications on the host OS would perform.  However, there are some areas where running on Kubernetes can add overhead.  Like any other abstraction layer, it's going to be problematic if your database is already your primary performance bottleneck.  These overhead sources include:

  • Extra network hops from connection redirection, leading to increased latency;
  • Performance bottlenecks causes by various network overlays used for virtual networking;
  • Storage latency caused by storage abstractions, depending on your storage system.

That said, this overhead is not any worse (and sometimes considerably better) than running a database on AWS, where the majority of PostgreSQL servers today are hosted.  So it's not a blocker for most users, just the ones who are already performance-constrained and maxxing out their hardware.

Large Databases


However, large database may face more performance and management issues than smaller databases.  Kubernetes scheduling, Linux cgroups, namespace memory management, and other components of the stack are built with the idea that each deployed microservice will be using a minority of the resources on the physical nodes.  When you move over to a database that needs to use the majority of a node's physical resources, you'll find yourself working around a lot of built-in functionality to make it work.

One particular issue is that, if you are using Linux containers, you cannot turn off the Out-Of-Memory Killer.  Veteran PostgreSQL admins will recognize the problem this is for large databases.

Like other limitations, this won't be true forever.  Among other things, every component in Kubernetes is swappable, so you can expect more specialized workload managers in the future.  But right now, if you're doing data warehousing you might do it off Kubernetes or using a dedicated cluster with special settings.

Conclusion and Checklists


To wrap up, you can compare your database use-case with these pluses and minuses to decide whether you should scope out moving your PostgreSQL infrastructure to Kubernetes.  To the degree that automation, self-service, and zero-admin small databases are valuable to you, consider it.  But if stability, marginal performance, and big data analytics are your main requriements, it's maybe not time for your databases yet.

No comments:

Post a Comment