You already know that I'm a big fan of custom aggregates. What's even more fun is when you combine them with the ordered aggregate feature from PostgreSQL 9.0. For example, let's take a fairly common task: selecting the item with the maximum importance.
This is a challenge for standard aggregation because you want to display a value which relates to an importance score contained in another table. For example, the table could be a lookup list of error types, and you want to display the most severe error. This is simple to do if you only want one row, but what if you want to group by other columns? Well, here's a relatively neat way using ordered aggregates and custom aggregates.
First, we want to create a generic aggregate called "first", which simply returns the first non-null value in the column:
CREATE FUNCTION first_state (
anyelement, anyelement )
CREATE AGGREGATE first ( anyelement ) (
SFUNC = first_state,
STYPE = anyelement
This is a generally useful aggregate to have around, btw, so I suggest adding it to your general library.
Now, let's look at our tables:
table errors (
error_code text primary key,
severity int not null default 0,
table error_log (
So, say I wanted to see the count of errors for each hour in the log and the most severe error during that hour? Well, what I need to do is to select the first error code which shows up when I order by severity. For good measure, I order by error_code as well, in case there's two codes with the same severity:
SELECT date_trunc('hour',log_time) as log_hour,
count(*) as error_count,
first(error_code order by severity desc, error_code)
JOIN errors USING (error_code)
GROUP BY date_trunc('hour',log_time)
ORDER BY date_trunc('hour',log_time);
And that works a treat:
log_hour | error_count | most_severe
2013-06-17 00:00:00+00 | 6 | CX56B
2013-06-17 01:00:00+00 | 16 | CX56B
2013-06-17 02:00:00+00 | 4 | PPPMQ5
2013-06-17 03:00:00+00 | 2 | PPPMQ5
2013-06-17 04:00:00+00 | 24 | CX56B
2013-06-17 05:00:00+00 | 15 | CX56B
2013-06-17 06:00:00+00 | 5 | FM229A
I'm sure you can figure out ways to adapt this to your needs. Enjoy!
We really should have first() (and possibly last()) bundled in core PostgreSQL.ReplyDelete
count_true() is also a surprisingly handy general custom aggregate.Delete
What's the difference between first() above and first_value() already in PG core?ReplyDelete
first_value() """returns value evaluated at the row that is the first row of the window frame"""
(sorry if this is posted multiple times - no feedback at all on whether it worked or not)
first_value() requires a window declaration; it doesn't work as an aggregate.Delete
Technically, we should support first_value(x order by y), which is effectively the same thing. But we don't.