One way to get this kind of mode is to do a windowing function, but as mentioned this works poorly with other aggregates in the same result set. So let's take the same custom aggregate approach and see if we can do better.

Now, for types other than boolean, mode() is a "two-pass aggregate". That means it's impossible to calculate in one pass; you need two passes, one to sort the set, and one to count the items and pick the MCV. Since we know we'll need two passes, we'll construct our aggregate to assume that it's receiving sorted data going in, and make sure it gets sorted data when we use it. Given 9.2's new ORDER BY clause for aggregates, that's easy to ensure.

For a state type, we'll need a 4-part register. This register will include:

- the last seen value
- the count of the last seen value
- the most common value so far
- the count of the most common value so far

create or replace function sorted_mode_state(

tally bigint[], nextval bigint

)

returns bigint[]

language sql

immutable

as $f$

-- have we seen this value before?

SELECT CASE WHEN nextval = tally[1] THEN

-- if weve seen it before, does it out-count the prior MCV?

CASE WHEN tally[2] = tally[4] THEN

-- if so, swap the MCV

ARRAY[ nextval, tally[2] + 1, nextval, tally[2] + 1

ELSE

-- if not, keep counting

ARRAY[ nextval, tally[2] + 1, tally[3], tally[4] ]

END

-- is the register uninitialized? then take the first value

WHEN tally[1] IS NULL THEN

ARRAY [ nextval, 1, nextval, 1 ]

-- skip nulls

WHEN nextval IS NULL THEN

tally

ELSE

-- if it's a new value, count 1

ARRAY [ nextval, 1, tally[3], tally[4] ]

END;

$f$;

The final function just picks out tally[3], returning NULL if no MCV has been selected:

create function sorted_mode_final(

tally bigint[]

)

returns bigint

language sql

immutable

as $f$

SELECT CASE

-- do we have an MCV?

WHEN tally[4] > 1 THEN tally[3]

-- otherwise, do we have a single-value set?

WHEN tally[1] = tally[3] THEN tally[3]

-- is there no clear mcv? return null

ELSE NULL END;

$f$;

Then we can build our aggregate:

CREATE AGGREGATE sorted_mode ( bigint ) (

SFUNC = sorted_mode_state,

STYPE = BIGINT[],

FINALFUNC = sorted_mode_final,

INITCOND = '{}'

);

Usage:

SELECT server, date_trunc('hour', poll_time) as thehour,

sorted_mode(status ORDER BY status) as status_mode

FROM server_status

GROUP BY server, date_trunc('hour', poll_time)

ORDER BY date_trunc('hour', poll_time), server;

This approach has some inherent limitations, of course:

- if no value appears more than once, it returns NULL, unless there's only one value in the set;
- if several values appear the same number of times, it returns the first one in the sort order;
- it can't do estimated mode or other mode constructions.

## No comments:

## Post a Comment