Tuesday, May 21, 2013

PostgreSQL New Development Priorities 5: New User Experience

So, I started this looking for our five major goals for future PostgreSQL develoment.  The last goal is more nebulous, but I think equally important with the other goals.  It's this: improve the "new user experience".

This is not a new goal, in some ways.  Improving installation, one of our previous 5 goals, was really about improving the experience for new users.  But the new user experience goes beyond installation now, and competition has "raised the bar".  That is, we matched MySQL, but now that's not good enough; we need to match the new databases.   It should be as easy to get started on a dev database with PostgreSQL as it is with, for example, Redis.  Let me give you a summary of the steps to get up, running, and developing an application in the two platforms:

  1. install Redis, either from packages or multiplatform binaries.  No root access is required for the binaries.
  2. read a 1-page tutorial
  3. run redis-server
  4. run redis-cli or install drivers for your programming language
  5. start developing
  6. when your app works, deploy to production
  7. in production, tune how much RAM Redis gets.
  1. install PostgreSQL from packages or the one-click installer.  Root/Admin access is usually required.
  2. search the documentation to figure out how to get started. 
  3. figure out whether or not your packages automatically start Postgres.  If not, figure out how to start it.  This may require root access.
  4. Install drivers for your programming language.
  5. Figure out how to connect to PostgreSQL.  This may require making changes to configuration files.
  6. Read more pages of documentation to learn the basics of PostgreSQL's variety of SQL, or how to program an ORM which works with PostgreSQL.
  7. Start developing.
  8. Deploy to production.
  9. Read 20 pages of documentation, plus numerous blogs, wiki pages and online presentations in order to figure out how to tune PostgreSQL.
  10. Tune PostgreSQL for production workload.  Be unsure if you've done it right.
The unfortunate reality is that a new user will hit a lot of points in the "getting to know PostgreSQL" where they can be stuck, confused, and at a loss.  At those points, they may decide to try something else, and never come back.  I've seen it happen; just last SFPUG I was talking to a guy who started on Postgres, ran into a shared memory issue, switched to Mongo, and didn't come back to Postgres for 2 years.

So, what can we do about it?  Well, a few things:
  • better new user tutorials, such as the ones on postgresguide.org
  • better autotuning, made a lot easier to implement as of version 9.3.
  • a "developer mode PostgreSQL"
The last would be a version of PostgreSQL which starts when the developer opens a psql prompt, shuts down when they exit, starts with minimal processes and crash safety turned off, and above all with a security configuration which allows that user to immediately connect to PostgreSQL without figuring anything else out.  With some of the work on recovery mode supplying a single-user Postgres, this should become easier, but it needs a lot more work.

Those are the five things I can see which would greatly expand the market for PostgreSQL and keep us competitive against the new databases.  Yes, I'm talking really big features, but any two out of the five would still make a big difference for us.  There may be others; now that you've seen the kind of big feature I'm talking about, put your suggestions below.


  1. Would it help the user experience if error messages contained http links to the Postgres documentation? For example, if a user tries to connect to a database, but gets the "no entry in pg_hba.conf" error, would it help to include a link to http://www.postgresql.org/docs/9.2/static/auth-pg-hba-conf.html? You could do this for SQL syntax errors as well, providing links to the documentation for DML.

    I understand that you're seeking more than just better error messages, but this might make things better.

  2. I'd like to see what fruits are lower-hanging. SQL has an undeserved reputation for being difficult, a driving force behind many of the ORMs. Maybe something that helps explain it in clearer terms...

  3. User experience is a tough one. But I also do think a low barrier to entry is related to adoption.

    When you are just starting you go with what is easy and works out of the box. Then as you grow you tend to stay with it even if its not the "best" tool for the job. The MySQL scale-out stories are often along these lines, especially historically when MySQL was not as robust.

    Without getting into what is right the user account configuration stories are generally simpler in new databases.

    The lowest barrier to startup I can think of easily is a quickstart option that would run under any account (including root) without complaint, bind to a non-privileged port, allow all local connections and have a blank postgres password.

    The blank password I've noticed speeds up quickstart tooling integration. If you run phpPgAdmin / pgAdmin or anything else, they can auto-check the port / postgres / blank password combo and also "just work" out of the box, no mucking around in configuration files.

    It seems like a trivial difference to folks with experience getting things going, but ages ago in another life I tried postgresql and mysql, and ended up going with mysql because it "just worked" when we built our "test project" over a weekend that then grew of course into a multi-year project with a scale out story. If I'm not mistaken MySQL at the time didn't just have blank passwords for their root user but may also have had anonymous user super user access? I just remember the default as being essentially no security.

    I also have to agree with David, SQL isn't that bad. What's odd is that it can take more work to understand how an ORM will join something up then to code the joins and relationships directly. With the exception of projects with very broad adoption (django etc) I wonder if for many projects being database "agnostic" actually is a value add, it's still hard to move databases and you generally can't exploit database features fully and remain agnostic.

    Very much enjoying these posts

  4. Here are two of the priorities that I would love to see be done in the future PG releases.

    I'm an Oracle DBA looking for a way to migrate some of our databases to PostgreSQL. I truly love PG, but, and this is a big but for us, our databases store personal data, hence auditing is a must have feature that PG is completely missing. Built-in AUDIT statement that can track insert,update,delete *and* selects would be a good starting point, but having something like Oracle Fine Grain Auditing in PG would take our socks off...a killer feature.
    I believe that a complete lack of AUDIT capabilities in PG is certainly a show stopper for many enterprises that are obligated to keep auditing trail. (Btw. coding our own triggers is out of the question for several reasons, not to mention that I don't see an easy (easy for a DBA) solution to track select statements with values passed as bind variables in PG?)

    Second show stopper is "weak" (elementary?) backup & recovery tools available in PG. We can't really treat pgdump as a backup tool that we would want to use for any db larger than a few gigabytes; at least not on a daily basis. PG needs true full/incremental *binary* backups and recovery (comparable to Oracle RMAN); if we changed today five blocks out of fifty, we want to backup just those five blocks at night, nothing else. We're generating a half a terabyte of redo logs each day, still nightly incremental backup produces a couple of gigabytes incremental backup file at best.We could not afford to apply 2.5TB WAL logs generated during weekdays on top of a base backup - if we know that all that is really needed is applying 10GB from incremental backup. Am I alone on this? Hope not! :)
    Keep up with the good work!!


    1. Huh. How is it you're generating 2.5TB of WAL logs, but that only represents 10GB of actual changes? Are you rewriting the same rows over and over again?

    2. Hi AlesK,
      Regarding auditing feature, you can have a look at E-Maj. It's a pg extension available on pgfoundry, github or PGXN. It allows to keep a trace of all updates (INSERT/UPDATE/DELETE) on set of tables, with some rollback capabilities. It is already used by some large organizations on test or production databases. A plugin for phpPgAdmin also provides a nice GUI for E-Maj administration.
      If you give it a try, I would love to get your feedback on what improvements would be helpful for your needs, if any. (email address at the end of the extension README file).
      Philippe Beaudoin.

    3. Thanks for a reply Josh!

      Yes, "Huh" indeed :). When I first start working for a client I was overwhelmed by their ETL process (if I can name the process as such!?) - 2.5TB (on three production servers, not one!) of logs are generated on ~20 schemas. A lot of data matching and data imputation is going on each day with SAS, then based on "what-if" analysis statisticians decide when the data is good enough, so yes many updates (often repeated) are running against relatively small data sets until our "statisticians" are satisfied withe the results. Of course, not all indexes could be disabled before each update runs, so Oracle is generating undo + redo for index entries as well...the net results is that at the peak time I get 1GB redo generated / per minute.
      Not a typical RDBMS workload, I guess. We're not dealing with OLTP at all. I'm sure we could cut redo in half if we could rebuild (rethink) the system from scratch -- I hope Oracle will follow PG some day and introduce unlogged tables (even better if we could get off the Oracle bandwagon someday).
      Nevertheless, I can't imagine doing recovery with PostregreSQL on Friday, restoring Sunday backup and then rolling forward with 2+TB (or even half of that) of logs!?

      Hi, beaud76

      thanks, I'll take a look. However, keeping audit trail for all SELECT's on tables with personal data (national wide id number, name, tax id, address) of is a must have feature for us. When we get auditor the first question is: "Can you tell me who was looking at my data and when?". Having audit trail about I/U/D is unfortunately not enough. We have to provide the trail who browsed what, when. Having full select statement with bind variable value in a trace is essential for us. That's why I'm stuck with Oracle for now. :-(

  5. Mongo and Postgres are very different beasts.
    If you use Postgres it's because you need a relational database otherwise there are other maybe simpler choices.
    Compared to Oracle, Postgres is far easier to install and use and was one of the reasons we switched from Oracle to Postgres and never come back.
    If you are a serious developer you owe to yourself and your customers to know your database very well. It's part of what makes you a professional.
    I understand many people don't write ERP programs nowadays but I think the other four priorities cited are by far more important.

  6. you mean www.postgresguide.com - not .org