
Thursday, April 16, 2015

Expressions VS advanced aggregates

So ... you're using some of 9.4's new advanced aggregates, including FILTER and WITHIN GROUP.  You want to take some statistical samples of your data, including median, mode, and a count of validated rows.  However, your incoming data is floats and you want to store the samples as INTs, since the source data is actually whole numbers.  Also, COUNT(*) returns BIGINT by default, and you want to cast it to INT as well.  So you do this:

    SELECT
        device_id,
        count(*)::INT as present,
        count(*)::INT FILTER (WHERE valid) as valid_count,
        mode()::INT WITHIN GROUP (order by val) as mode,
        percentile_disc(0.5)::INT WITHIN GROUP (order by val)
          as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


And you get this unhelpful error message:

    ERROR:  syntax error at or near "FILTER"
    LINE 4:         count(*)::INT FILTER (WHERE valid)
            as valid_count,


And your first thought is that you're not using 9.4, or you got the filter clause wrong.  But that's not the problem.  The problem is that "aggregate() FILTER (where clause)" is a syntactical unit, and cannot be broken up by other expressions.  Hence the syntax error.  The correct expression is this one, with parens around the whole expression and then a cast to INT:

    SELECT
        device_id,
        count(*)::INT as present,
        (count(*) FILTER (WHERE valid))::INT as valid_count,
        (mode() WITHIN GROUP (order by val))::INT as mode,
        (percentile_disc(0.5) WITHIN GROUP (order by val))::INT
           as median
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


If you don't understand this, and you use calculated expressions, you can get a worse result: one which does not produce an error but is nevertheless wrong.  For example, imagine that we were, for some dumb reason, calculating our own average over validated rows.  We might do this:

    SELECT
        device_id,
        sum(val)/count(*) FILTER (WHERE valid) as avg
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;


... which would execute successfully, but would give us the total of all rows divided by the count of validated rows. That's because the FILTER clause applies only to the COUNT, and not to the SUM.  If we actually wanted to calculate our own average, we'd have to do this:

    SELECT
        device_id,
        sum(val) FILTER (WHERE valid)
            / count(*) FILTER (WHERE valid) as avg
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;
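
Of course, unless you have some specific reason to roll your own, the simpler route is the built-in avg() aggregate, which accepts a FILTER clause just like any other aggregate.  A minimal sketch against the same example table:

    SELECT
        device_id,
        avg(val) FILTER (WHERE valid) as avg
    FROM dataflow_0913
    GROUP BY device_id
    ORDER BY device_id;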


Hopefully that helps everyone who is using the new aggregates to use them correctly and not get mysterious errors.  In the meantime, we can see about making the error messages more helpful.

Friday, June 15, 2012

postgresql.conf: max_locks_per_transaction

PostgreSQL, and the postgresql.conf file, has around 150 parameters which you don't need to care about 98% of the time (as opposed to the 50-or-so settings you do need to care about more often), except that every once in a while you run into a situation which requires you to learn about some obscure parameter.  That is, after all, why it's a changeable setting and not just hard-coded.  max_locks_per_transaction is one such setting.

The purpose of max_locks_per_transaction is to determine the size of the virtual locks "table" in memory.  By default, it's set to 64, which means that Postgres is prepared to track up to (64 X number of open transactions) locks.  For example, if you have it set at the default, and you currently have 10 concurrent sessions with transactions open, you can have up to 640 total locks held between all sessions.  The reason to have a limit at all is that the lock table lives in dedicated shared memory, so Postgres avoids reserving more of it than you're likely to need.
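
If you're curious how close you are to that ceiling, a rough sanity check (just a sketch; pg_locks and SHOW are standard, interpreting the numbers is up to you) is to count the rows in the pg_locks view and compare against your setting:

    -- total locks currently held across all sessions
    SELECT count(*) AS locks_held FROM pg_locks;

    -- the configured per-transaction allowance
    SHOW max_locks_per_transaction;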

Most of the time, for most users, the default is fine.  But every once in a while, it's not:

    2012-06-11 14:20:05.703 PDT,"processor","breakpad",17155,"[local]",4fd660cd.4303,2,"SELECT",2012-06-11 14:19:09 PDT,86/199551,0,ERROR,53200,"out of shared memory",,"You might need to increase max_locks_per_transaction.",,,,"select j.id, pj.uuid, 1, j.starteddatetime from jobs j right join priority_jobs_2849 pj on j.uuid = pj.uuid",,,""

The above helpful message is from the activity log.  Unfortunately, the error which the client gets is just "out of shared memory", which is not that helpful ("what do you mean 'out of shared memory'?  I have 4GB!").

The reason why the database above ran out of locks was that a few sessions were holding up to 1800 locks, most of them RowExclusiveLock. Given that a lock in Postgres is usually a lock on an object (like a table or part of a table) and not on a row, holding 1800 locks in one transaction is somewhat unusual.  Why so many locks?
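
Incidentally, if you want to see for yourself which sessions are holding piles of locks and of what kind, a rough sketch against the pg_locks view (the view and its pid and mode columns are standard; the LIMIT is arbitrary) looks something like this:

    -- which backends hold the most locks, and in which mode
    SELECT pid, mode, count(*) AS lock_count
    FROM pg_locks
    GROUP BY pid, mode
    ORDER BY lock_count DESC
    LIMIT 10;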

Well, the database in question has three tables, each of which has over a hundred partitions.  One frequent application activity was running an UPDATE against each of these partitioned tables with no partition condition in it, causing the UPDATE to check all partitions of each table.  This resulted in RowExclusiveLocks on each partition and each index on each partition ... 1800 locks in total.  Note that the UPDATE needs these locks even though only one partition had rows which were actually updated; despite the name, RowExclusiveLock is a lock on the table or index, not necessarily on a specific row.  This lock prevents certain things, like dropping the table or index while the locking transaction is still running.
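
To make that concrete, here's a hypothetical before-and-after (the table, columns, and values below are invented for illustration, not the actual application schema).  With inheritance partitioning, an UPDATE with no condition on the partitioning column has to be planned against every child table, taking RowExclusiveLock on each partition and each of its indexes; constraining the partitioning column lets constraint exclusion prune the other children, so only the matching partition gets locked:

    -- no partition condition: plans against, and locks, every child table
    UPDATE measurements SET processed = true
    WHERE uuid = 'aaaa-0001';

    -- partitioning column (logdate) constrained: constraint exclusion prunes
    -- the other children, so only one partition and its indexes are locked
    UPDATE measurements SET processed = true
    WHERE uuid = 'aaaa-0001'
      AND logdate >= DATE '2012-06-01' AND logdate < DATE '2012-07-01';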

So that's one case where you probably want to increase max_locks_per_transaction above its out-of-the-box value: when your database has several tables with many partitions, or when you use table inheritance a lot for other purposes.  It's worth doing ahead of time, especially since increasing it requires a restart.