I have several tables with partially overlapping TSTZRANGE
values and I need to JOIN
them such that "breakpoints" are created for each UPPER
and LOWER
boundary for each range.
Some sample data:
CREATE TABLE foo (
foo_id SERIAL PRIMARY KEY,
foo TEXT,
active_timespan TSTZRANGE
);
INSERT INTO
foo (foo, active_timespan)
VALUES
('One',TSTZRANGE('2015-01-01 00:00:00-00','2015-01-01 06:00:00-00')),
('Two',TSTZRANGE('2015-01-01 12:00:00-00','2015-01-01 18:00:00-00'));
CREATE TABLE bar (
bar_id SERIAL PRIMARY KEY,
bar TEXT,
active_timespan TSTZRANGE
);
INSERT INTO
bar (bar, active_timespan)
VALUES
('Alpha',TSTZRANGE('2015-01-01 03:00:00-00','2015-01-01 09:00:00-00')),
('Bravo',TSTZRANGE('2015-01-01 15:00:00-00','2015-01-01 21:00:00-00'));
Desired Result:
active_timespan | foo | bar
------------------------------------------------------|------|------
'["2015-01-01 00:00:00-00","2015-01-01 03:00:00-00"]' | One | Null
'["2015-01-01 03:00:00-00","2015-01-01 06:00:00-00"]' | One | Alpha
'["2015-01-01 06:00:00-00","2015-01-01 09:00:00-00"]' | Null | Alpha
'["2015-01-01 12:00:00-00","2015-01-01 15:00:00-00"]' | Two | Null
'["2015-01-01 15:00:00-00","2015-01-01 18:00:00-00"]' | Two | Bravo
'["2015-01-01 18:00:00-00","2015-01-01 21:00:00-00"]' | Null | Bravo
Current approach:
I currently use an elaborate system of cte
's where I select the data I want from each table into it's own cte
and then UNION
all the the distinct UPPER
and LOWER
range values from each query, leaving me with a list of TIMESTAMPTZ
values. I then use a LEAD
window function to make new TSTZRANGE
values. Once I have all that complete I join the original cte
to the new list of TSTZRANGE
values.
Update
Current query pattern below:
WITH
cte_foo AS (
SELECT
foo_id,
foo,
active_timespan
FROM
foo
WHERE
active_timespan && $1
)
, cte_bar AS (
SELECT
bar_id,
bar,
active_timespan
FROM
bar
WHERE
active_timespan && $1
)
-- continue for each table to be joined...
-- , cte_baz AS (
-- SELECT
-- baz_id,
-- baz,
-- active_timespan
-- FROM
-- baz
-- WHERE
-- active_timespan && $1
-- )
, cte_times AS (
SELECT DISTINCT
UNNEST(
ARRAY[
LOWER(cte_foo.active_timespan),
UPPER(cte_foo.active_timespan)
]
) AS breakpoint
FROM
cte_foo
UNION
SELECT DISTINCT
UNNEST(
ARRAY[
LOWER(cte_bar.active_timespan),
UPPER(cte_bar.active_timespan)
]
) AS breakpoint
FROM
cte_bar
-- continue for each table to be joined...
-- UNION
-- SELECT DISTINCT
-- UNNEST(
-- ARRAY[
-- LOWER(cte_baz.active_timespan),
-- UPPER(cte_baz.active_timespan)
-- ]
-- ) AS breakpoint
-- FROM
-- cte_baz
)
, cte_timespans AS (
SELECT
TSTZRANGE(
sub_times.breakpoint,
LEAD(sub_times.breakpoint, 1, 'infinity'::TIMESTAMPTZ) OVER (ORDER BY sub_times.breakpoint)
) AS timespan
FROM
(
SELECT DISTINCT
cte_times.breakpoint
FROM
cte_times
) AS sub_times
ORDER BY
timespan
)
SELECT
cte_timespans.timespan,
cte_foo.foo,
cte_bar.bar
FROM
cte_timespans
LEFT OUTER JOIN cte_foo ON cte_timespans.timespan && cte_foo.active_timespan
LEFT OUTER JOIN cte_bar ON cte_timespans.timespan && cte_bar.active_timespan
-- continue for each table to be joined...
-- LEFT OUTER JOIN cte_baz ON cte_timespans.timespan && cte_baz.active_timespan
ORDER BY
cte_timelines.timespan
Performance is obviously poor, especially when I am joining several tables (over ten in some cases). Is this the best way to approach this situation?
Wishful thinking:
It would be great if PostgreSQL had some kind of RANGE JOIN
- something along the lines of:
SELECT
*
FROM
foo
FULL OUTER RANGE JOIN bar ON foo.active_timespan && bar.active_timespan
explain analyze
) on your real world data. The range data type can also be indexed: postgresql.org/docs/current/static/… Did you try that?cte
(described as "optimization fences" in the documentation). They also remove all indices after the initialSELECT
.full outer join
will return all rows from both tables - that can't really be optimized. The database has to read all rows from foo and all rows from bar.