Wednesday, May 27, 2015

Determining your danger of multixact wraparound corruption

The PostgreSQL Project released an update on May 22, and the only really important fix in that update was a patch to prevent data corruption due to "Multixact member wraparound".  However,  it turns out that there are some issues with that update release, including one with file permissions, and potentially another one with Multixact truncation.  As a result, users are feeling justifiably reluctant to apply our last update release.

If there's a lesson for the project in this, it's "don't wrap an update release the same week as Feature Freeze".

Anyway, given this it would be good for users to determine more definitively if they are in actual danger of Multixact member wraparound (MMW hereafter), so that they know if they need to apply the update right away despite issues.  Here's how to do that:
  1. navigate to the PostgreSQL data directory on your server(s), e.g. "/var/lib/postgresql/9.3/main"
  2. switch to the subdirectory "pg_multixact/members" 
  3. count the number of files in this directory, e.g. "ls -l | wc -l"
If you have a few dozen, hundred, or even a few thousand multixact member files, you're fine.  If you have over 10,000, and that number increases rapidly, then you are potentially in danger of MMW, and should apply the update sooner rather than later. 

Of course, there are other fixes in this latest update, and if one of them specifically affects you, you may have updated already.

Thank you to Thomas Munro for pointing out the simple way to determine this and giving me background material on the fixes.

Note: the above is my personal advice, and has not been approved by the PostgreSQL project, core team, or PostgreSQL Experts Inc. The PostgreSQL project generally advises applying all updates promptly.

Monday, May 11, 2015

Recursive cycle detection is recursive

In prior posts, I've gone over some methods to prevent cycles from being added to your database.  However, imagine that someone has handed you an existing adjacency list tree -- perhaps migrated from another DBMS -- and you need to find all of the cycles as part of data cleaning?  How do you do that?

One way, obviously, would be just explore all paths and flag the ones where any ID appeared twice:

    WITH RECURSIVE prev AS (
        SELECT folders.id, 1 AS depth, array[id] as seen, false as cycle
        FROM folders
        UNION ALL
        SELECT folders.id, prev.depth + 1, path || folders.id as seen,
            folders.id = any(seen) as cycle
        FROM prev
        INNER JOIN folders on prev.id = parent_id
    )
    SELECT *
    FROM prev;



However, the above has a serious issue: the query itself will cycle and never complete (in fact, it will error out). So we need to terminate each cycle when the first repeat happens.  Fortunately, that's easy to do, and we'll filter for only the cycles while we're at it:

    WITH RECURSIVE prev AS (
        SELECT folders.id, 1 AS depth, array[id] as seen, false as cycle
        FROM folders
        UNION ALL
        SELECT folders.id, prev.depth + 1, seen || folders.id as seen,
            folders.id = any(seen) as cycle
        FROM prev
        INNER JOIN folders on prev.id = parent_id
        AND prev.cycle = false
    )
    SELECT *
    FROM prev
    WHERE cycle = true;


The results of the above query look like this:

    id | depth |       seen       | cycle
   ----+-------+------------------+-------
    21 |     2 | {21,21}          | t
    13 |     5 | {13,14,15,11,13} | t
    14 |     5 | {14,15,11,13,14} | t
    15 |     5 | {15,11,13,14,15} | t
    11 |     5 | {11,13,14,15,11} | t
    (5 rows)


One thing to notice is that you'll get a row for every node in a cycle loop.  That's because with cycles, the choice of starting point is arbitrary.  So the above query isn't the best choice for a deep tree where you have a lot of existing cycles.  There's a 2nd way to detect cycles in a tree; lets see if any of the commentors can post that query.



Thursday, May 7, 2015

Fancy SQL Thursday: row-to-column transformation

So, this question came up on IRC today:

"How can I take a one-week date range, and have each date in a separate column?"

This is a good question for an example of row-to-column transformation using Postgres' built-in functions.  While you could use the Tablefunc Extension to do crosstabs, you can also do this on your own in SQL.

First, let's generate a list of formatted dates using Postgres' built-in iterator, generate_series():

    with days as (
        select d, to_char(d, 'Mon DD') as label
        from generate_series($1,$2,interval '1 day') as gs(d)
    )

That generates days in the format "Apr 05" between the dates $1 and $2.  Now comes the tricky part, which is we're going to roll those dates up into an array so that we can transform them horizontally:

    dagg as (
        select array_agg(label order by d) as ld
        from days
    )


So we're using array_agg to make the day label into an array of labels.  We also add the "order by d" to the aggregate to make sure that those days stay in date order.

Once we've got that, then we can just select each array element as a column:

    select ld[1] as d1,
        ld[2] as d2,
        ld[3] as d3,
        ld[4] as d4,
        ld[5] as d5,
        ld[6] as d6,
        ld[7] as d7
    from dagg


Now, this has the limitation that we need to know how many days we're selecting before running the query, but pretty much any method we use requires that if we want columns as output.  So, putting it all together with some sample dates:

    with days as (
        select d, to_char(d, 'Mon DD') as label
        from generate_series('2015-04-01','2015-04-07',interval '1 day') as gs(d)
    ), dagg as (
        select array_agg(label order by d) as ld
        from days
    )
    select ld[1] as d1,
        ld[2] as d2,
        ld[3] as d3,
        ld[4] as d4,
        ld[5] as d5,
        ld[6] as d6,
        ld[7] as d7
    from dagg;


And the result:

      d1   |   d2   |   d3   |   d4   |   d5   |   d6   |   d7  
   --------+--------+--------+--------+--------+--------+--------
    Apr 01 | Apr 02 | Apr 03 | Apr 04 | Apr 05 | Apr 06 | Apr 07


That should help you figure out how to do row-to-column transformations for your own queries.  Enjoy!