Thursday, July 18, 2013

Dimitri talks about Skytools tonight at SFPUG

Tonight's SFPUG will feature visiting contributor Dimitri Fontaine talking about Skytools3.  Assuming we can make it work, live video will be available on the SFPUG video channel starting at 7:20PM PDT.

Tuesday, July 16, 2013

Accidental DBA and More at OSCON next week

O'Reilly OSCON is next week, and I'll be attending for the first time since 2009.  I'll be teaching the Accidental DBA tutorial, for which I've just uploaded the preparation materials.  There's also lots of other PostgreSQL and community-building content, much of which I'm involved in.

As an experiment, I'm trying to make this tutorial hands-on by using Vagrant, which will provide each attendee with a nice VM they can use to try out various PostgreSQL administrative commands.  Of course, this means that attendees need to download and install Vagrant and run the setup before the tutorial, preferably while they are still at home with good internet access.  I've a feeling that 20 people will try to install it the morning of the tutorial, which won't work.  My tutorial is also scheduled opposite "predictive R analytics", dammit.

If you're attending the tutorial, preparation materials are here.

As usual, there will be a significant amount of PostgreSQL content aside from my own tutorial, which includes:
Of course, my main reason for going to OSCON is for the community-building stuff.  I'll be involved in the Community Leadership Summit, where I will be giving a mini-talk called "Fundraising 101" on Saturday afternoon.  Then I'll be involved in both of these two sessions:
So, if you're going to be at OSCON, I'll no doubt see you there!

Tuesday, July 9, 2013

Kyoto Tycoon FDW for Postgres

The cool folks at Cloudflare, a next-generation CDN for web applications, have created a Foreign Data Wrapper (FDW) for Kyoto Tycoon.  This lets them distribute cached content across their network while still having transactional control from within PostgreSQL.  Read more in their blog post on the Tycoon FDW.

Connection Pooling vs. Persistent Connections

The distinction between connection pooling and persistent connections recently came up again with a client, and I realized that there's a significant amount of confusion in the industry due to deceptive feature naming by driver and ORM authors.  To set the record straight:

Persistent Connections are a feature of many drivers, driver frameworks and ORM tools, such as psycopg2, PDO, Hibernate and SQLObject, in which the driver keeps connection handles open for reuse by the same process or thread the next time it needs database access.  This reduces latency from database authentication and connection startup time.

Connection Pooling is a feature of network proxies and application servers, such as pgBouncer and various J2EE servers, in which a "pool" of persistent connections to the database is maintained and shared among multiple application servers, often allowing more than one application process to share the same database connection.  This not only decreases connection startup time, but also allows for better management of a reduced number of database server connections, lowering the load on the database server.
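To make the distinction concrete, a real pooler like pgBouncer exposes exactly these kinds of knobs in its configuration file.  Here's a minimal sketch (the database name and all the values are illustrative, and auth settings are omitted), not a tuned configuration:

```ini
; pgbouncer.ini -- illustrative sketch only
[databases]
; clients connecting to "appdb" get proxied to the real server
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; server connections shared per-transaction
max_client_conn = 500        ; client connections pgbouncer will accept
default_pool_size = 20       ; actual connections to PostgreSQL
server_lifetime = 3600       ; recycle server connections after an hour
```

Note how 500 application connections get funneled into just 20 real database connections; that ratio is the whole point of a pooler.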

So, why the confusion?  Well, in an instance of inflationary marketing, many driver/ORM authors have been calling their persistent-connection feature "connection pooling".  No doubt this comes from a desire to compete with J2EE and other platforms (certainly it does in the case of the many Tomcat "poolers"), but the result is that users think they have connection pooling and are surprised to get this error message:

   FATAL:  connection limit exceeded for non-superusers
 
Here's a tip: if the tool you're using doesn't come with configuration parameters for total number of database connections and database connection lifetime, or if it doesn't run a separate service, it's probably not a real connection pool.

Thursday, June 27, 2013

Salesforce, PostgreSQL and Bruce Momjian

Given Bruce's blog post today, I wanted to make a few things clear which might not be at all clear to folks outside our project:
  1. Bruce is indulging in personal speculation, not speaking for the core team.
  2. Bruce is not one of the core team members who deals with Salesforce, so he has no knowledge of their plans, strategy, or expected contributions to PostgreSQL.
In other words, Bruce's entire post is pure speculation, without any inside knowledge of any kind.

Unfortunately, the members of our team who do deal with Salesforce regularly are prohibited by NDA from discussing what their plans might be.

Tuesday, June 18, 2013

Maximum Importance Aggregation

You already know that I'm a big fan of custom aggregates.  What's even more fun is when you combine them with the ordered aggregate feature from PostgreSQL 9.0.  For example, let's take a fairly common task: selecting the item with the maximum importance.

This is a challenge for standard aggregation because you want to display a value which relates to an importance score contained in another table.  For example, the table could be a lookup list of error types, and you want to display the most severe error.  This is simple to do if you only want one row, but what if you want to group by other columns?  Well, here's a relatively neat way using ordered aggregates and custom aggregates.

First, we want to create a generic aggregate called "first", which simply returns the first non-null value in the column:

    CREATE FUNCTION first_state (
        anyelement, anyelement )
    RETURNS anyelement
    LANGUAGE SQL
    IMMUTABLE STRICT
    AS $f$
        -- always keep the existing state; since the state starts as NULL
        -- and the function is STRICT, the first non-null input becomes
        -- the state and is never replaced afterwards
        SELECT $1;
    $f$;


    CREATE AGGREGATE first ( anyelement ) (
            SFUNC = first_state,
            STYPE = anyelement
    );


This is a generally useful aggregate to have around, btw, so I suggest adding it to your general library.
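As a quick sanity check, you can exercise the aggregate against an inline VALUES list (this assumes the "first" aggregate above has been installed):

```sql
-- returns the first value in the requested order; here, the smallest
SELECT first(val ORDER BY val) AS first_val
FROM (VALUES (3), (1), (2)) AS t(val);
-- first_val = 1
```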

Now, let's look at our tables:

    CREATE TABLE errors (
        error_code text PRIMARY KEY,
        severity int NOT NULL DEFAULT 0,
        description text
    );

    CREATE TABLE error_log (
        log_time timestamptz,
        error_code text
    );
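To follow along at home, you can load a few sample rows first.  The severities and descriptions below are invented for illustration (only the error codes appear in the original output):

```sql
INSERT INTO errors (error_code, severity, description) VALUES
    ('CX56B',  9, 'backend crash'),
    ('PPPMQ5', 5, 'query timeout'),
    ('FM229A', 2, 'malformed input');

INSERT INTO error_log (log_time, error_code) VALUES
    ('2013-06-17 00:05:00+00', 'CX56B'),
    ('2013-06-17 00:12:00+00', 'FM229A'),
    ('2013-06-17 01:03:00+00', 'PPPMQ5');
```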

So, say I want to see the count of errors for each hour in the log, along with the most severe error during that hour.  What I need to do is select the first error code which shows up when I order by severity.  For good measure, I order by error_code as well, in case there are two codes with the same severity:

    SELECT date_trunc('hour', log_time) AS log_hour,
        count(*) AS error_count,
        first(error_code ORDER BY severity DESC, error_code)
            AS most_severe
    FROM error_log
        JOIN errors USING (error_code)
    GROUP BY date_trunc('hour', log_time)
    ORDER BY date_trunc('hour', log_time);

And that works a treat:

           log_hour        | error_count | most_severe
    -----------------------+-------------+-------------
    2013-06-17 00:00:00+00 |           6 | CX56B
    2013-06-17 01:00:00+00 |          16 | CX56B
    2013-06-17 02:00:00+00 |           4 | PPPMQ5
    2013-06-17 03:00:00+00 |           2 | PPPMQ5
    2013-06-17 04:00:00+00 |          24 | CX56B
    2013-06-17 05:00:00+00 |          15 | CX56B
    2013-06-17 06:00:00+00 |           5 | FM229A


I'm sure you can figure out ways to adapt this to your needs.  Enjoy!



Sunday, June 16, 2013

9.4 Commitfest 1 Has Begun!

The first CommitFest for version 9.4 of PostgreSQL has started, and we have an inspiring list of new features, in 98 patches, ready to be reviewed.  Which means we need you to help review them!  Among the pending patches are:
Given that that's nearly 100 patches, and the PostgreSQL project has only 20 committers, we need you to help us review patches.  Yes, you!  You don't have to be able to hack PostgreSQL code to review a patch; if you can apply a patch to source code and build PostgreSQL using "make", you can help.  More information is available in How To Review A Patch.

If you want to help, but aren't sure what to review, please join the pgsql-rrreviewers mailing list and announce yourself there.  I will assign you a patch to review.

For those of you who are experienced reviewers or who have submitted a patch to this CommitFest, please note the following:
  1. If you submitted a patch to this CommitFest, you are expected to review at least one other person's patch.  Should you choose not to do so, the community will not be obligated to guarantee you a review of your patch.
  2. If you put your name down as a reviewer on a patch, and you don't post anything for 5 days, we will send you a reminder and take your name off the patch, to prevent you from blocking other reviewers.
  3. If your patch goes to "waiting on author" and you don't respond for 5 days, the patch will be Returned with Feedback.
Rules 2 and 3 are new as of this year's developer meeting.  Thanks, and let's have a fast and productive CommitFest!