Searching Google, I was surprised to find few published solutions to a common problem: preventing users from creating a cycle in a self-join table. So here's one solution, which will be "good enough" for most people, but has some caveats (see below).
First, the setup: we have a table of items. Items can be in one or more collections. Each item can itself be a collection, allowing users to create collections of collections. So the first thing we need is a self-join table on the "adjacency list" model:
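(For reference, the items table that collections references below might look something like this. This is just an illustrative sketch; the name column is a placeholder for whatever attributes your items actually have:)

```sql
create table items (
    id serial primary key,
    name text not null  -- placeholder; your items will have their own columns
);
```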
```sql
create table collections (
    collection_id int not null references items(id) on delete cascade,
    item_id int not null references items(id) on delete cascade,
    constraint collections_pk primary key ( collection_id, item_id )
);

create index on collections(item_id);
```
So the first part of preventing cycles is to prevent the simplest cycle, where a collection collects itself. That can be done with a constraint:
```sql
alter table collections add constraint
    no_self_join check ( collection_id <> item_id );
```
Now comes the tough part: preventing longer cycles, chains of two, three, or N collections that loop back on themselves. This requires us to look down the chain of collections reachable from the new row and make sure that the inserted tuple doesn't complete a loop. Fortunately, WITH RECURSIVE works for this, provided we do it in a BEFORE trigger. If we did it in an AFTER trigger, the trigger itself would cycle, which would be no good.
```sql
CREATE OR REPLACE FUNCTION collections_prevent_cycle ()
RETURNS trigger
LANGUAGE plpgsql
AS $f$
BEGIN
    -- select recursively, looking for all child items of the new collection
    -- and making sure that they don't include the new collection
    IF EXISTS ( WITH RECURSIVE colitem AS (
            SELECT collection_id, item_id
            FROM collections
            WHERE collection_id = NEW.item_id
        UNION ALL
            SELECT colitem.collection_id, collections.item_id
            FROM collections
            JOIN colitem ON colitem.item_id = collections.collection_id
        )
        SELECT collection_id FROM colitem
        WHERE item_id = NEW.collection_id
        LIMIT 1 )
    THEN
        RAISE EXCEPTION 'You may not create a cycle of collections.';
    END IF;
    RETURN NEW;
END; $f$;
```
```sql
CREATE TRIGGER collections_prevent_cycle
BEFORE INSERT OR UPDATE ON collections
FOR EACH ROW EXECUTE PROCEDURE collections_prevent_cycle();
```
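To see the trigger in action, here's a quick example (assuming items with ids 1, 2, and 3 already exist in the items table):

```sql
insert into collections values (1, 2);  -- collection 1 collects item 2: OK
insert into collections values (2, 3);  -- collection 2 collects item 3: OK
insert into collections values (3, 1);  -- would complete the loop 1 → 2 → 3 → 1
-- ERROR:  You may not create a cycle of collections.
```

The third insert fails because the recursive query, walking down from item 1, finds collection 3 among its descendants.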
As I said, this solution will be "good enough" for a variety of uses. However, it has some defects:
Concurrency: it is vulnerable to a race condition. That is, if two users simultaneously insert "A collects B" and "B collects A", neither trigger sees the other's uncommitted row, so both inserts succeed and a cycle is created. The alternative is locking the entire table on each write, which is also problematic.
Cost: we're running a fairly expensive recursive query on every insert and update. For applications where the tree table is write-heavy, this will decrease throughput significantly.
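For completeness, the table-locking alternative mentioned above could be sketched as a one-line addition at the top of the trigger function. This is a sketch, not a recommendation; SHARE ROW EXCLUSIVE is one plausible mode, since it conflicts with itself (so concurrent writers serialize) while still allowing plain reads:

```sql
-- at the top of collections_prevent_cycle(), before the EXISTS check:
LOCK TABLE collections IN SHARE ROW EXCLUSIVE MODE;
```

The lock is held until the transaction commits, which closes the race window at the price of serializing all writes to the table.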
So my challenge to you is this: come up with a better solution, one which solves either the concurrency or the cost problem without making the other problem worse.
P.S.: this blog has reached half a million views. Thanks, readers!