Monday, March 11, 2013

PostgreSQL's New Development Priorities: Part I

Six years ago, I came to the first Developer Meeting for PostgreSQL in Ottawa -- it wasn't even called that at the time, we just sort of pulled a bunch of people together -- with a list of the five big features which PostgreSQL was missing.  We'd just eliminated a big obstacle to Postgres adoption by creating the native Windows port, and these were the remaining five technical issues which most frequently caused people to evaluate PostgreSQL and decide not to use it.  These were:
  1. Difficulty of install
  2. Lack of built-in simple replication
  3. Performance for web applications
  4. Driver quality
  5. Difficult configuration
As you can tell, a lot of the above was taken from interviews with MySQL users, but they were all legitimate issues.  Thing is, the PostgreSQL community in versions 8.4 through 9.2 pretty much eliminated the first three issues, the fourth was addressed by Rails, PHP and Python without any help from us, and the last item is well on its way to being addressed in 9.3.    There's still stuff left to be done; better autoconfiguration utilities, better OSX install, LinQ driver for .NET, etc., but as top priorities the above issues are largely under control or in the process of being fixed.

So what's next for PostgreSQL development?  What do we need to do, technically, to continue as the #1 open source database?  I plan to address this in a series of upcoming blog posts.


(note: I'm aware that a large chunk of database adoption is driven by non-techical issues.  However, that's not what I'm addressing at this time.)

17 comments:

  1. You could argue, it has come full circle. It's the same five points, but set against today's standards.

    ReplyDelete
    Replies
    1. I'll be arguing that our chief priorities should be different.

      Delete
  2. point 1. isn't issue at all.

    Point 2. Very important for all use cases. So it should be very clear, how to proceed for new users (previous mysql users LOL). For now it looks like there are to many options available. Good thing, but one don't really know what to choose. Built-in, slony, pg-pool, etc. Let's provide simple use cases in documentation. For example:

    1. Website with several web-servers + DB with master + slave
    2. Website with several web-servers + DB with master + master
    3. Website with several web-servers + multiple masters and slaves or one master and multiple slaves. How to spread reads and writes?

    How to setup? How to ensure website is still working, if master is down and how to recover in this scenario. That's what typical website want. Sounds simple of course...




    ReplyDelete
  3. Could you please clarify whether "Difficulty of install" referred only to Windows or was it on all platforms and if the latter, what was done between 8.4 and 9.2 on that front?

    ReplyDelete
    Replies
    1. Well, when I stated the problem, it was actually 8.2. At that time, installing the latest version of Postgres was a problem on a variety of platforms: Windows, OSX desktop, Solaris, SuSE, and Debian. Source installs on many platforms were fairly common, because current packages were hard to come by or weren't updated regularly.

      Windows and OSX were taken care of by the One-Click Installer -- Dave's team at EnterpriseDB. We added Solaris packages while I was at Sun and those are still maintained. SuSE created the Build Service without our help, and Ubuntu/Debian was first addressed with Martin's backports and is now being tackled by apt.postgresql.org.

      Delete
  4. I think partitioning should be the focus of the next round. Oracle has a huge head-start there and MySQL 5.6 also improved that part.

    Looking at the ever increasing size of the databases this is going to be a crucial feature. Not only the easy of definition but also getting the optimizer to make the best use of it.

    I also see parallel execution of queries to be a feature that will be necessary to keep up with the "big boys". If you have the hardware parallel execution really gives large queries a substantial boost in Oracle

    ReplyDelete
  5. I agree with Tom, also automatically clustered indexes (a.k.a. index organized tables) help with reporting type queries on larger tables.
    Columnwise storage, which is done by a few Postgres derivatives, is good for that workload too.
    And for non-trivial topologies (which I think are still quite common, for example, an operational database with a data warehouse) logical log-based replication (like MS SQL has) with options for replicating just a part of the tables of the database, or replicating only inserts and updates, but no deletes could be useful.

    ReplyDelete
  6. mmh interresting :) ..
    i was thinking about :
    * Easier setup of full text search (including accents, fuzzy match, ...)
    * More love in Desktop clients : need more desktop integration , a design and deploy app, ...
    * more json integration (convert from and to any type, allow manipulations and query, ... )
    * A more plug and play replication : Client or server has a list of server and can switch from one to another w-o changing anything.
    * Ability to split a query across replicate or cpus

    ReplyDelete
  7. 1.Better Partitioning support
    2.Parallel Query Execution
    3.Better connection pooling

    1 & 2 are fairly self explanatory, 3 is problematic as it forces you to keep the number of actual DB users fairly low in order to keep connection count down. This is a key blocker in a lot of mulch-tenancy systems.

    ReplyDelete
  8. I'll agree with the others about partitioning and parallelism, which could be grouped as the buzzword-filled phrase, "Scalable Data Warehousing."

    ReplyDelete
  9. Definitely, checksum checks are desperately needed by enterprise users. Patches like the "Checksums (initdb-time)" patch should be prioritized [ https://commitfest.postgresql.org/action/patch_view?id=993 ] during new version development.

    User friendly partitioning support would be great too as would parallel query execution.

    ReplyDelete
    Replies
    1. We could use some help on performance testing/tuning the current checksum patch.

      Delete
  10. Better Partitioning is the most important thing

    ReplyDelete
  11. zero downtime upgrades?

    Perhaps using replication like Slony supports them?

    ReplyDelete

  12. Home Wellbeing has a wide range of One Stop Home Essentials products that care for the wellbeing of You and Your Loved Ones.

    ReplyDelete
  13. Hi there, Josh!
    This post is interesting. I was actually caught by the first problem that you have mentioned, when it is difficult to install. I have a not so big background about database and stuff and sp I believe that these information will be very useful.

    ReplyDelete