Thursday, July 23, 2015

unsafe UPSERT using WITH clauses

By now you've read about PostgreSQL 9.5 and our shiny new UPSERT feature.  And you're thinking, "hey, 9.5 is still in alpha, how do I get me some UPSERT action now?"  While there are some suggestions for workarounds on the PostgreSQL wiki, I'm going to show you one more method for approximating UPSERT while you help us test 9.5 so that it will come out sooner.

The general suggested way for handling UPSERT situations is one of the following:
  1. Try to do an INSERT.  On error, have the application retry with an UPDATE.
  2. Write a PL/pgSQL procedure which does insert-then-update or update-then-insert.
Both of these approaches have the drawback of being very high overhead: the first involves multiple round-trips to the database and lots of errors in the log, and the second involves major subtransaction overhead.  Neither is concurrency-safe, but then the method I'm about to show you isn't either.  At least this method avoids a lot of the overhead, though.

What's the method?  Using writeable WITH clauses.  This feature, introduced in 9.1, allows you to do a multi-step write transaction as a single query.  For an example, let's construct a dummy table with a unique key on ID and a value column, then populate it:

     create table test_upsert ( id int not null primary key, 
        val text );
     insert into test_upsert select i, 'aaa'
       from generate_series (1, 100) as gs(i);


Now, let's say we wanted to update ID 50, or insert it if it doesn't exist.  We can do that like so:

    WITH
    newrow ( id, val ) as (
        VALUES ( 50::INT, 'bbb'::TEXT ) ),
    tryupdate as (
        UPDATE test_upsert SET val = newrow.val
        FROM newrow
        WHERE test_upsert.id = newrow.id
        RETURNING test_upsert.id
    )
    INSERT INTO test_upsert
    SELECT id, val
        FROM newrow
    WHERE id NOT IN ( SELECT id FROM tryupdate );


The above tries to update  ID=50.  If no rows are updated, it inserts them.  This also works for multiple rows:

    WITH
    newrow ( id, val ) as (
        VALUES ( 75::INT, 'ccc'::TEXT ),

                      ( 103::INT, 'ccc'::TEXT )
    ),
    tryupdate as (
        UPDATE test_upsert SET val = newrow.val
        FROM newrow
        WHERE test_upsert.id = newrow.id
        RETURNING test_upsert.id
    )
    INSERT INTO test_upsert
    SELECT id, val
        FROM newrow
    WHERE id NOT IN ( SELECT id FROM tryupdate );


... and will update or insert each row as called for.

Given that we can do the above, why do we need real UPSERT?  Well, there's some problems with this approximate method:
  • It's not concurrency-safe, and can produce unexpected results given really bad timing of multiple connections wanting to update the same rows.
  • It will still produce key violation errors given bad concurrency timing, just fewer of them than method 1 above.
  • It's still higher overhead than 9.5's UPSERT feature, which is optimized.
  • It will return INSERT 0 0  from calls that do only updates, possibly making the app think the upsert failed.
  • It's not safe to use with INSERT/UPDATE triggers on the same table.
But ... if you can't wait for 9.5, then at least it's a temporary workaround for you.  In the meantime, download 9.5 alpha and get testing the new UPSERT.

No comments:

Post a Comment