Thursday, July 12, 2018

Should I Run Postgres on Kubernetes? Part II

Having covered some of the reasons why you would want to run PostgreSQL on Kubernetes, I'm now going to cover some of the reasons why you wouldn't want to.  This should help you make a balanced judgment about whether or not it's the right next move for you.

Not a Kubernetes Shop

Of course, everything I said about an integrated environment applies in reverse; if you're not already running other services on Kubernetes, or planning to migrate them, then moving your databases to Kubernetes is a lot of effort with minimal reward.  I'd even go so far as to say that your databases should probably be the last thing you move. Stateful services are harder, and require a greater mastery of Kubernetes. Start with the easy stuff.

Also, and perhaps more importantly, nothing in Kubernetes will remove from you (as admin) the necessity of knowing the database platform you're deploying.  While it does make some things (like HA) easier to automate, it doesn't change the basics of database management.  You still need to have someone on staff who knows PostgreSQL; it's just that that person can manage a lot more databases than they could before.

Everything is Alpha

Like everything in the new stacks, Kubernetes is under heavy development and many of the tools and features you're going to want to use are alpha, beta, or pre-1.0 software.  For folks used to PostgreSQL, where even our daily builds are runnable in production, you're going to find the world of containerish stuff very alpha indeed.

This means that moving your databases (or anything else) to Kubernetes at this stage means embracing the requirement to develop both some Kubernetes expertise, and a tolerance for risk.  Realistically, this means that you're only going to do it today if you have pressing needs to scale your database support and admin team that you can't meet otherwise. Folks in regulated environments with very low tolerance for mistakes should probably be the last to move.

In a year or so, this will change; there is more work today on making things stable in Kubernetes than there is on new features.  But right now you need to balance risk and reward.

Performance


Linux containers are just processes, and as a result perform largely like applications on the host OS would perform.  However, there are some areas where running on Kubernetes can add overhead.  Like any other abstraction layer, it's going to be problematic if your database is already your primary performance bottleneck.  These overhead sources include:

  • Extra network hops from connection redirection, leading to increased latency;
  • Performance bottlenecks caused by various network overlays used for virtual networking;
  • Storage latency caused by storage abstractions, depending on your storage system.

That said, this overhead is not any worse (and sometimes considerably better) than running a database on AWS, where the majority of PostgreSQL servers today are hosted.  So it's not a blocker for most users, just the ones who are already performance-constrained and maxing out their hardware.
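
If you want a rough idea of whether the extra network hops matter in your environment, one option is to time TCP connection setup to the database along two routes (for example, directly and through whatever redirection layer you use) and compare. The sketch below is only an illustration: the function names are made up for this example, and it measures connection setup alone, not query time or storage latency.

```python
import socket
import statistics
import time


def connect_latency_ms(host, port, samples=20, timeout=2.0):
    """Time TCP connection setup to host:port; return per-sample milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a connection; only the setup is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(timings):
    """Return the median and worst-case latency in milliseconds."""
    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}
```

Running `summarize(connect_latency_ms(host, 5432))` against each route, with your own host names, gives two sets of numbers to compare. A small difference in the median suggests the redirection is unlikely to be your bottleneck.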

Large Databases

However, large databases may face more performance and management issues than smaller databases.  Kubernetes scheduling, Linux cgroups, namespace memory management, and other components of the stack are built with the idea that each deployed microservice will be using a minority of the resources on the physical nodes.  When you move over to a database that needs to use the majority of a node's physical resources, you'll find yourself working around a lot of built-in functionality to make it work.

One particular issue is that, if you are using Linux containers, you cannot turn off the Out-Of-Memory Killer.  Veteran PostgreSQL admins will recognize the problem this is for large databases.

Like other limitations, this won't be true forever.  Among other things, every component in Kubernetes is swappable, so you can expect more specialized workload managers in the future.  But right now, if you're doing data warehousing you might do it off Kubernetes or using a dedicated cluster with special settings.

Conclusion and Checklists

To wrap up, you can compare your database use-case with these pluses and minuses to decide whether you should scope out moving your PostgreSQL infrastructure to Kubernetes.  To the degree that automation, self-service, and zero-admin small databases are valuable to you, consider it.  But if stability, marginal performance, and big data analytics are your main requirements, it's maybe not time for your databases yet.

Wednesday, July 11, 2018

Should I Run Postgres on Kubernetes? Part I

In preparation for my workshop on running PostgreSQL on Kubernetes on Monday, I wanted to talk a bit about why you'd want to run your database there in the first place -- and why you wouldn't.

The container world, starting with Docker and then moving to Kubernetes, has focused on stateless services like web applications.  This has been largely because stateless services are simply easier to manage in new environments, and can be handled generically as well, allowing Kubernetes to be relatively platform-agnostic.  Once you get into services that require storage and other persistent resources, the idea of "run your containerized application on any platform" becomes much more challenging.

Somewhere along the way "stateful services are hard" morphed into "you shouldn't run stateful services" as a kind of mantra.  Considering how much work contributors have put into making stateful apps possible, that's simply wrong.  And, for that matter, when has running databases on any stack ever been easy?

Let's start with some of the reasons you would want to run Postgres (or other databases) on Kubernetes.  In tomorrow's post, I'll go into some of the reasons why you would not want to.

One Environment to Rule Them All

The biggest reason is to simplify your development and deployment picture by putting all application components on Kubernetes.  It supplies a whole set of scaffolding to make deploying and integrating applications and databases easier, including shared secrets, universal discovery, load balancing, service monitoring, and scaling.  While there are integration points, like the Service Catalog, that support treating external databases as Kubernetes services, it's always going to be harder to manage a database that has to be deployed by a completely different process from the application it supports.

As a small example, let's take database connection credentials. If both the database and the application are on Kubernetes, rotating/updating credentials is simple: you just update a Secrets object (or a plugin provider like KeyCloak), and both the application and the database will pick up the new logins.  If the database is external, this becomes two separate update processes, each of which uses different tools.
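
As a rough illustration of the application side of this, here is a minimal sketch that assumes the Secret is mounted into the pod as a volume. Mounted Secret files are refreshed after the Secret object changes, whereas values passed as environment variables stay fixed until the container restarts. The directory and the key names "username" and "password" are assumptions made for this example, not anything Kubernetes requires.

```python
from pathlib import Path


def load_db_credentials(secret_dir):
    """Read username and password from files in a mounted Secret volume.

    secret_dir and the key names "username" and "password" are
    assumptions for this sketch; use whatever your Secret defines.
    """
    base = Path(secret_dir)
    return {
        "user": (base / "username").read_text().strip(),
        "password": (base / "password").read_text().strip(),
    }


def credentials_changed(old, secret_dir):
    """Return the fresh credentials if they differ from old, else None."""
    new = load_db_credentials(secret_dir)
    return new if new != old else None
```

An application could call `credentials_changed` before opening a new connection, or on a timer, and rebuild its connection pool when it returns a new set of logins.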

With the integration from locating everything on Kubernetes, setting up other parts of your development infrastructure gets much easier.  Deploying a test application to dev or staging can work the same way as it does in prod, minimizing mistakes.  CI/CD can more easily include database updates and upgrades.

Do the DevOps

With this level of integration, and the right coworkers, Kubernetes then enables a "real DevOps workflow" where each development team owns their own database.  In dev and staging environments, and maybe even in production, developers can be given the ability to self-service database support, deploying small replication clusters using predefined templates.

This isn't theoretical; it's what online fashion retailer Zalando is actually doing, allowing them to manage a large company with hundreds of online applications and a handful of database staff.

In traditional -- or even VM cloud -- environments, this is much, much harder to do.  First, the cost and resource consumption of database servers (or virtual servers) requires ops gating their deployment.  Second, the challenge of setting up databases in a way that protects against data loss is a blocker for motivated dev teams.

Much Easier HA

I spent a long time working on fully automated HA systems for PostgreSQL.  I found that, for HA to work, I needed multiple services outside PostgreSQL itself:

  • a "source of truth" to prevent split-brain;
  • something to ensure that the minimum number of replicas were running;
  • routing to the current master that can be quickly changed;
  • a way to enforce shutdown of rogue nodes;
  • a UI to view all of the members of my cluster.

Kubernetes supplies all of the above for you using various objects and tools.  This means that the automatic HA code that needs to run in the Postgres containers can be very simple, and substantially more reliable.  Today, I can't imagine trying to implement database high availability without it.
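
To make the "source of truth" item concrete, here is a toy sketch of the lease rule such a component enforces: only one node may hold the leader record at a time, and another node can take over only after the holder stops renewing it. This is an in-memory illustration with hypothetical class and node names. It is not how Kubernetes or any particular HA tool implements leader election, and a real system would keep the record in a consistent shared store.

```python
class LeaderLease:
    """Toy in-memory stand-in for a shared "source of truth".

    A real deployment would keep this record in a consistent store;
    this class only illustrates the compare-and-set rule.
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node, now):
        """Grant or renew the lease; return True if node is the leader."""
        # The lease is free, already held by this node, or has expired.
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        # Another node holds an unexpired lease, so this node must not promote.
        return False
```

A replica that gets `False` back stays a replica, which is what prevents two nodes from both acting as master.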

Zero-Brain-Cell Admin for Small Databases

In a presentation, Peter van Hardenberg pointed out that the median size of Heroku's hundreds of thousands of PostgreSQL databases was less than 1GB.  Databases this size also usually support a single application.  This is the size of database you must automate; no human being should be spending time and effort administrating an individual database that fits on a 2005 USB key.

In an orchestrated environment, it becomes much easier to treat the large number of tiny databases needed in your ecosystem as the replaceable cogs they are.  Yes, you can automate your database management entirely with configuration management systems, but it's both easier and more reliable to do it in a container cloud.  Kubernetes helps you to not think about your less critical databases, so that you can spend your time on the databases that really need mollycoddling.

Not for Everyone

Of course, as with any change of infrastructure, there are downsides to moving databases to Kubernetes.  These are serious and will be the reasons why some users and admins will stick to the techniques they know.  I'll explore those in my next post.

Tuesday, April 3, 2018

New Annotated Config Files for PostgreSQL 10

Teal Deer: The Annotated.conf has been updated for PostgreSQL 10, and it's a GitHub repo now.

13 years ago, for PostgreSQL 8.0, I released the first Annotated.conf because we had (at that time) over 140 configuration settings and most people had no idea how to set them.  Today Postgres has nearly twice as many (269), and the problem is twice as bad. More about that later.

The repository is intended to be a quick reference for figuring out how to set a lot of these parameters.  It includes my tutorial slides in both LibreOffice and PDF format, and a giant table of settings and recommendations in either CSV format or as a PostgreSQL 10 database dump.  Accompanying each setting are my notes on why you'd change the setting and, if so, to what.

Probably more usefully, I've included two sample .conf files.  postgresql.10.simple.conf contains the 20 most commonly changed settings with detailed advice on how to set them.  extra.10.conf has the next 20 most-likely-to-be-changed settings, with somewhat more limited advice.  The idea is for you to follow the instructions, and then drop one or both of these files into your include_dir configuration directory and have a more sensible configuration without messing with the preposterously long default configuration file that ships with most PostgreSQL packages.  Those 20 to 40 settings should cover 90% of the needs of 90% of users.
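
When more than one file sits in the include_dir directory, it helps to know which value wins: PostgreSQL reads the .conf files there in file name order, and a later setting overrides an earlier one. The sketch below approximates that merge so you can preview the result before restarting anything. The parser is deliberately simplified and the function names are invented for this example, so it is not a substitute for what the server actually does.

```python
from pathlib import Path


def parse_conf(text):
    """Parse 'name = value' lines, ignoring blanks and # comments.

    Simplified on purpose: it does not handle '#' inside quoted values,
    include directives, or settings written without '='.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip().lower()] = value.strip().strip("'")
    return settings


def merge_conf_dir(conf_dir):
    """Merge every *.conf file in conf_dir, in file name order."""
    merged = {}
    for path in sorted(Path(conf_dir).glob("*.conf")):
        if path.name.startswith("."):
            continue  # hidden files are skipped
        # Later files overwrite earlier ones, so the last setting wins.
        merged.update(parse_conf(path.read_text()))
    return merged
```

Calling `merge_conf_dir` on your configuration directory returns a dictionary of the effective values from those files, which makes it easy to spot a setting that one file silently overrides in another.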

If you feel that my advice is inaccurate, incomplete, or out-of-date, then GitHub has both Issues and Pull Requests available for you to suggest replacements.

Now, I can't help but feel that the configuration situation with PostgreSQL is getting worse.  While the database has improved with some more sensible defaults, and some good auto-configuration code (for example, for transaction log size), overall both the number of settings and the number of settings most users have to care about keep going up.

The worst part of this is settings added with absolutely no information on how to determine a reasonable level for the setting.  For example, these four data flushing settings were added in version 10:


... but there is no concrete advice on how to set these. Yes, there are docs, but even as someone who has a solid grounding in database, memory, and filesystem performance, I'd hesitate to recommend any specific level for these for a specific problem.  I'd like to be optimistic and assume that an explanation is coming in Postgres 11 or so, but after four years there's still nothing on how to set vacuum_multixact_freeze_min_age.

In the meantime, hopefully annotated.conf gives you a good start on tinkering with your own PostgreSQL settings.

Monday, January 29, 2018

LWKD Has a New Home!

Since this isn't a particularly good place for it, I've moved Last Week in Kubernetes Development to its new home.  That includes this week's edition of LWKD.

Part of the idea of the move is to be able to accept contributions to the publication.  Since it's on GitHub Pages, I can accept them through the LWKD git repo.  I particularly could use some help setting up an RSS feed in some way that doesn't require restructuring the site around a static site generator.

Also, a guest writer for next week would be welcome; otherwise the next issue is likely to be light due to personal travel.

Monday, January 22, 2018

Last Week in Kubernetes Development: Week ending January 21

Community Meeting Summary

The demo for this week's meeting was Kubernetes running on Docker for Mac. The folks have been hard at work enabling this, and the demo now looks pretty polished. Developers with Mac desktops should be able to easily use Kubernetes with their existing Docker for Mac workflows.

Jaice DuMars explained the delay of the alpha release and gave a 1.10 stats update (see below). Dan Williams updated folks on SIG-Network, who have been making a lot of changes, including adding IPv6 support. Phillip Wittrock updated everyone on what the Steering Committee is currently working on, especially creating template SIG charters so that all SIGs can create their own charters. If you have opinions about the organization and leadership of your SIG, please take the survey on SIG governance.

Kubernetes will be participating in Google Summer of Code with the CNCF this year. Please contact Ihor Dvoretskyi if you are interested in mentoring or know a student. SIG Intros and Deep Dives at KubeCon Europe will be announced soon. The project will have another "Meet Our Contributors" on February 7th, this one focused on helping out new contributors (contact Paris Pittman to participate).

The format for the Community meeting will also be changing slightly in the future. SIGs will be scheduled for updates per release cycle instead of ad hoc, and demo speakers will be asked to rehearse before the meeting.

Release Schedule

This was week 3 of version 1.10 development. This week should have included an early alpha release, mainly as a dry run for release packaging. However, it's been delayed because Branch Manager Caleb Miles had a painful bike accident and has been offline. An alpha release is expected this week.

Feature Freeze, which was supposed to be January 22nd, has also been delayed by one week because the Features Lead is still waiting for status clarification on some features from several SIGs. Final Feature Freeze deadline will now be on the 29th. Many SIGs have updated their features, though, and Ihor has created the Feature Tracking Board for version 1.10.

Feature Work

While 148 patches were merged last week, most of them were minor bug fixes (including at least ten for GCE support), cherry-picks for copying fixes across releases, typo fixes, and some doc and release note corrections. Among the interesting feature work was:




Version Updates


Other Merges


Graph of the Week


This week's graph, brought to you by Jorge Castro, is Approvers and the Approvers Histogram. This graph shows you the number of pull request approvals in each repository, and the accompanying histogram shows you who did those approvals. Looks like @cblecker is our leading approver for last month.

Monday, January 15, 2018

Last Week in Kubernetes: Week ending January 14th

With several hundred active contributors, it's pretty hard to keep track of Kubernetes development.   It's hard for me, and I'm paid to keep track; I can't imagine that anyone else can do it, even if they contribute to Kubernetes.

What follows is an experimental publication.  I'm thinking of doing a development summary, every week or so, of what's happened in new features, deprecations, the community meeting, and more.  Tell me if this is useful to you.  If it is, I'll look at finding an official place to publish it where maybe other community members can contribute.

Last Week in Kubernetes: Week ending January 14, 2018

Community Meeting Summary

The community meeting was dominated by a discussion around whether all repos in the kubernetes namespace should be a part of the same automation, particularly merge automation.  Aaron Crickenberger (spiffxp) has been offering this to other repos in the Kubernetes namespace, but some teams, particularly Helm, are concerned about unexpected changes this might cause.  One goal of getting all repos on the same automation is to retire mungegithub.

Jacob Pavlik demonstrated the KQueen cluster manager.  Jaice DuMars went over release 1.10, which is in week 2 of 12 and will go into Feature Freeze on January 22nd, so get your features in!  And SIG Azure and SIG Node made reports.

Feature Work

Configurable Pod Process Namespace Sharing prepared for inclusion in 1.10 this week with the addition of a feature flag for PID namespace sharing. The --docker-disable-shared-pid flag was also removed from kubelet.

Kubelets can now be run in containers, allowing for a completely containerized Kubernetes install. Such installs are now passing e2e tests.

Support for raw block devices as persistent volumes moved ahead with the merge of iSCSI support for block volumes.

Deprecations


Docker 1.10 is no longer supported. The minimum Docker version is now 1.11. While Docker 1.10 was officially deprecated in release 1.9, the compatibility code has now actually been completely removed.

sig-cluster-lifecycle is gradually deprecating the /cluster directory in favor of having these cluster setup tools maintained outside of kubernetes/kubernetes. In 1.10, that will include removing the windows/, photon-controller/, libvirt-coreos/, and gce/container-linux/ subdirectories, with more to be removed in future releases.

Version Updates


Other Merges