I need to calculate the depth of a descendant from it's ancestor. When a record has object_id = parent_id = ancestor_id
, it is considered a root node (the ancestor). I have been trying to get a WITH RECURSIVE
query running with PostgreSQL 9.4.
I do not control the data or the columns. The data and table schema comes from an external source. The table is growing continuously. Right now by about 30k records per day. Any node in the tree can be missing and they will be pulled from an external source at some point. They are usually pulled in created_at DESC
order but the data is pulled with asynchronous background jobs.
We initially had a code solution to this problem, but now having 5M+ rows, it takes almost 30 minutes to complete.
Example table definition and test data:
CREATE TABLE objects (
id serial NOT NULL PRIMARY KEY,
customer_id integer NOT NULL,
object_id integer NOT NULL,
parent_id integer,
ancestor_id integer,
generation integer NOT NULL DEFAULT 0
);
INSERT INTO objects(id, customer_id , object_id, parent_id, ancestor_id, generation)
VALUES (2, 1, 2, 1, 1, -1), --no parent yet
(3, 2, 3, 3, 3, -1), --root node
(4, 2, 4, 3, 3, -1), --depth 1
(5, 2, 5, 4, 3, -1), --depth 2
(6, 2, 6, 5, 3, -1), --depth 3
(7, 1, 7, 7, 7, -1), --root node
(8, 1, 8, 7, 7, -1), --depth 1
(9, 1, 9, 8, 7, -1); --depth 2
Note that object_id
is not unique, but the combination (customer_id, object_id)
is unique.
Running a query like this:
WITH RECURSIVE descendants(id, customer_id, object_id, parent_id, ancestor_id, depth) AS (
SELECT id, customer_id, object_id, parent_id, ancestor_id, 0
FROM objects
WHERE object_id = parent_id
UNION
SELECT o.id, o.customer_id, o.object_id, o.parent_id, o.ancestor_id, d.depth + 1
FROM objects o
INNER JOIN descendants d ON d.parent_id = o.object_id
WHERE
d.id <> o.id
AND
d.customer_id = o.customer_id
) SELECT * FROM descendants d;
I would like the generation
column to be set as the depth that was calculated. When a new record is added, the generation column is set as -1. There are some cases where a parent_id
may not have been pulled yet. If the parent_id
does not exist, it should leave the generation column set to -1.
The final data should look like:
id | customer_id | object_id | parent_id | ancestor_id | generation
2 1 2 1 1 -1
3 2 3 3 3 0
4 2 4 3 3 1
5 2 5 4 3 2
6 2 6 5 3 3
7 1 7 7 7 0
8 1 8 7 7 1
9 1 9 8 7 2
The result of the query should be to update the generation column to the correct depth.
I started working from the answers to this related question on SO.
update
the table with the result of your recursive CTE? – a_horse_with_no_name Jan 12 at 7:38ancestor_id
is already set, so you only need to assign the generation from the CTE.depth ? – joop Jan 12 at 16:22