Thursday, May 7, 2015

Fancy SQL Thursday: row-to-column transformation

So, this question came up on IRC today:

"How can I take a one-week date range, and have each date in a separate column?"

This is a good question for an example of row-to-column transformation using Postgres' built-in functions.  While you could use the Tablefunc Extension to do crosstabs, you can also do this on your own in SQL.

First, let's generate a list of formatted dates using Postgres' built-in iterator, generate_series():

    with days as (
        select d, to_char(d, 'Mon DD') as label
        from generate_series($1,$2,interval '1 day') as gs(d)
    )

That generates days in the format "Apr 05" between the dates $1 and $2.  Now comes the tricky part, which is we're going to roll those dates up into an array so that we can transform them horizontally:

    dagg as (
        select array_agg(label order by d) as ld
        from days
    )


So we're using array_agg to make the day label into an array of labels.  We also add the "order by d" to the aggregate to make sure that those days stay in date order.

Once we've got that, then we can just select each array element as a column:

    select ld[1] as d1,
        ld[2] as d2,
        ld[3] as d3,
        ld[4] as d4,
        ld[5] as d5,
        ld[6] as d6,
        ld[7] as d7
    from dagg


Now, this has the limitation that we need to know how many days we're selecting before running the query, but pretty much any method we use requires that if we want columns as output.  So, putting it all together with some sample dates:

    with days as (
        select d, to_char(d, 'Mon DD') as label
        from generate_series('2015-04-01','2015-04-07',interval '1 day') as gs(d)
    ), dagg as (
        select array_agg(label order by d) as ld
        from days
    )
    select ld[1] as d1,
        ld[2] as d2,
        ld[3] as d3,
        ld[4] as d4,
        ld[5] as d5,
        ld[6] as d6,
        ld[7] as d7
    from dagg;


And the result:

      d1   |   d2   |   d3   |   d4   |   d5   |   d6   |   d7  
   --------+--------+--------+--------+--------+--------+--------
    Apr 01 | Apr 02 | Apr 03 | Apr 04 | Apr 05 | Apr 06 | Apr 07


That should help you figure out how to do row-to-column transformations for your own queries.  Enjoy!


5 comments:

  1. Have you ever tried the crosstab feature of the tablefunc extension?

    http://www.postgresql.org/docs/9.4/interactive/tablefunc.html

    ReplyDelete
    Replies
    1. It's mentioned in the blog post above.

      Delete
  2. This comment has been removed by the author.

    ReplyDelete
  3. was there did that, its ok for small datasets. but when you trying to work with large datasets, its way too slow. with more columns, query gets complicated and unreadable.

    crosstab is hundreds times faster,

    yet, for complicated queries i have written a handful functions in plpgsql which do generate dynamically query or function depending on various criteria generating columns and results required

    ReplyDelete
    Replies
    1. I think that's missing the point, and the post is just intended to be an example of doing the transformation in sql when it suits best.

      A few years back I wanted to output some figures in a similar manner, what I did could be fudged back into a slightly more complex but messier example:

      with days as (
      select row_number() over (order by d), d, to_char(d, 'Mon DD') as label,
      extract(week from d)-first_value(extract(week from d)) over (order by d) as week,
      first_value(case extract(dow from d) when 0 then 6 else extract(dow from d)-1 end) over (partition by extract(week from d), extract(year from d) order by d) as label_start
      from generate_series('2015-04-01','2015-05-07',interval '1 day') as gs(d) order by d
      ), dagg as (
      select extract(year from d) as year, array_agg(label order by d) as ld, week as lw, label_start as ls
      from days group by week, label_start, year order by year, min(row_number)
      )
      select year || ' w' || lw AS week, ld[1-ls] as mon,
      ld[2-ls] as tue,
      ld[3-ls] as wed,
      ld[4-ls] as thu,
      ld[5-ls] as fri,
      ld[6-ls] as sat,
      ld[7-ls] as sun
      from dagg;

      Delete