Database Soup: June 2013

Thursday, June 27, 2013

Salesforce, PostgreSQL and Bruce Momjian

Given Bruce's blog post today, I wanted to make a few things clear which might not be at all clear to folks outside our project:

Bruce is indulging in personal speculation, not speaking for the core team.
Bruce is not one of the core team members who deals with Salesforce, so he has no knowledge of their plans, strategy, or expected contributions to PostgreSQL.

In other words, Bruce's entire post is pure speculation, without any inside knowledge of any kind.

Unfortunately, the members of our team who do deal with Salesforce regularly are prohibited by NDA from discussing what their plans might be.

Tuesday, June 18, 2013

Maximum Importance Aggregation

You already know that I'm a big fan of custom aggregates. What's even more fun is when you combine them with the ordered aggregate feature from PostgreSQL 9.0. For example, let's take a fairly common task: selecting the item with the maximum importance.

This is a challenge for standard aggregation because you want to display a value which relates to an importance score contained in another table. For example, the table could be a lookup list of error types, and you want to display the most severe error. This is simple to do if you only want one row, but what if you want to group by other columns? Well, here's a relatively neat way using ordered aggregates and custom aggregates.

First, we want to create a generic aggregate called "first", which simply returns the first non-null value in the column:

    CREATE FUNCTION first_state (
        anyelement, anyelement )
    RETURNS anyelement
    LANGUAGE SQL
    IMMUTABLE STRICT
    AS $f$
        SELECT $1;
    $f$;

    CREATE AGGREGATE first ( anyelement ) (
            SFUNC = first_state,
            STYPE = anyelement
    );

This is a generally useful aggregate to have around, btw, so I suggest adding it to your general library.

Now, let's look at our tables:

    table errors (
        error_code text primary key,
        severity int not null default 0,
        description text
    )

    table error_log (
        lo_time timestamptz,
        error_code text
    )

So, say I wanted to see the count of errors for each hour in the log and the most severe error during that hour? Well, what I need to do is to select the first error code which shows up when I order by severity. For good measure, I order by error_code as well, in case there's two codes with the same severity:

    SELECT date_trunc('hour',log_time) as log_hour,
        count(*) as error_count,
        first(error_code order by severity desc, error_code)
            as most_severe
    FROM error_log
        JOIN errors USING (error_code)
    GROUP BY date_trunc('hour',log_time)
    ORDER BY date_trunc('hour',log_time);

And that works a treat:

           log_hour        | error_count | most_severe
    -----------------------+-------------+-------------
    2013-06-17 00:00:00+00 |           6 | CX56B
    2013-06-17 01:00:00+00 |          16 | CX56B
    2013-06-17 02:00:00+00 |           4 | PPPMQ5
    2013-06-17 03:00:00+00 |           2 | PPPMQ5
    2013-06-17 04:00:00+00 |          24 | CX56B
    2013-06-17 05:00:00+00 |          15 | CX56B
    2013-06-17 06:00:00+00 |           5 | FM229A

I'm sure you can figure out ways to adapt this to your needs. Enjoy!

Sunday, June 16, 2013

9.4 Commitfest 1 Has Begun!

The first CommitFest for version 9.4 of PostgreSQL has started, and we have an inspiring list of new features, in 98 patches, ready to be reviewed. Which means we need you to help review them! Among the pending patches are:

Logical Changeset Generation, the basis for the new replication;
Setting postgresql.conf variables from the SQL command line;
Transaction Log size autotuning;
Transforms between custom PostgreSQL datatypes and PL/Python, Perl and Javascript datatypes;
Row-Level Security;
GIN Fast Scan, and other GIN index improvements;
Block-based range indexes for very large tables;
Concurrent update of materialized views;
Multiple pgbench enhancements;
Many performance improvements;
A bunch of additional regression tests.

Given that that's nearly 100 patches, and the PostgreSQL project has only 20 committers, it means we need you to help us review patches. Yes, you! You don't have to be able to hack PostgreSQL code to review a patch; if you can apply a patch to source code and build PostgreSQL using "make", you can help. More information here in How To Review A Patch.

If you want to help, but aren't sure what to review, please join the pgsql-rrreviewers mailing list and announce yourself there. I will assign you a patch to review.

For those of you who are experienced reviewers or who have submitted a patch to this CommitFest, please note the following:

If you submitted a patch to this CommitFest, you are expected to review at least one other person's patch. Should you choose not to do so, the community will not be obligated to guarantee you a review of your patch.
If you put your name down as a reviewer on a patch, and you don't post anything for 5 days, we will send you a reminder and take your name off the patch, to prevent you from blocking other reviewers.
If your patch goes to "waiting on author" and you don't respond for 5 days, the patch will be Returned with Feedback.

Rules 2 and 3 are new per this year's developer meeting. Thanks, and lets have a fast and productive CommitFest!

Wednesday, June 5, 2013

First-ever pgCon Unconference: Success

Our first-ever unconference at pgCon was a smashing success this year, greatly enhancing collaboration and development for the upcoming versions of PostgreSQL. Among other things, I think we got a specification for pluggable storage out of it, which is far beyond anything I'd expected. Were you there? How did you like it? Give your feedback in the comments.

About 75 people attended, and hashed various topics out in 12 discussion sessions. These included expanding PostgreSQL testing, creating demo databases, enhancing full text search, upcoming JSON features, pg_upgrade, and of course the two-part Pluggable Storage/Foreign Data Wrapper discussion. Anyway, it's all on the wiki if you missed it.

We're definitely doing one next year, so plan to stay an extra day at pgCon.

I thank the following people for making the first unconference a success: Salesforce.com, our unconference sponsor; Stacey Haysler, Nikhil Sontakke, Susanne Ebrecht, Hartmut and Ian Barwick for helping to run the event; and Dan Langille, conference organizer. Thank you folks!

Yes, I know the Unconference was almost two weeks ago, but I lost my phone in Ottawa and this is the first I could get pictures up.