2

I have several tables with partially overlapping TSTZRANGE values and I need to JOIN them such that "breakpoints" are created for each UPPER and LOWER boundary for each range.

Some sample data:

CREATE TABLE foo (
    foo_id SERIAL PRIMARY KEY,
    foo TEXT,
    active_timespan TSTZRANGE
);
INSERT INTO 
    foo (foo, active_timespan)
VALUES
    ('One',TSTZRANGE('2015-01-01 00:00:00-00','2015-01-01 06:00:00-00')),
    ('Two',TSTZRANGE('2015-01-01 12:00:00-00','2015-01-01 18:00:00-00'));

CREATE TABLE bar (
    bar_id SERIAL PRIMARY KEY,
    bar TEXT,
    active_timespan TSTZRANGE
);
INSERT INTO 
    bar (bar, active_timespan)
VALUES
    ('Alpha',TSTZRANGE('2015-01-01 03:00:00-00','2015-01-01 09:00:00-00')),
    ('Bravo',TSTZRANGE('2015-01-01 15:00:00-00','2015-01-01 21:00:00-00'));

Desired Result:

active_timespan                                       | foo  | bar
------------------------------------------------------|------|------
'["2015-01-01 00:00:00-00","2015-01-01 03:00:00-00"]' | One  | Null
'["2015-01-01 03:00:00-00","2015-01-01 06:00:00-00"]' | One  | Alpha
'["2015-01-01 06:00:00-00","2015-01-01 09:00:00-00"]' | Null | Alpha
'["2015-01-01 12:00:00-00","2015-01-01 15:00:00-00"]' | Two  | Null
'["2015-01-01 15:00:00-00","2015-01-01 18:00:00-00"]' | Two  | Bravo
'["2015-01-01 18:00:00-00","2015-01-01 21:00:00-00"]' | Null | Bravo

Current approach:

I currently use an elaborate system of cte's where I select the data I want from each table into it's own cte and then UNION all the the distinct UPPER and LOWER range values from each query, leaving me with a list of TIMESTAMPTZ values. I then use a LEAD window function to make new TSTZRANGE values. Once I have all that complete I join the original cte to the new list of TSTZRANGE values.

Update

Current query pattern below:

WITH
    cte_foo AS (
        SELECT
            foo_id,
            foo,
            active_timespan
        FROM
            foo
        WHERE
            active_timespan && $1
    )
    , cte_bar AS (
        SELECT
            bar_id,
            bar,
            active_timespan
        FROM
            bar
        WHERE
            active_timespan && $1
    )
--  continue for each table to be joined... 
--  , cte_baz AS (
--      SELECT
--          baz_id,
--          baz,
--          active_timespan
--      FROM
--          baz
--      WHERE
--          active_timespan && $1
--  )
    , cte_times AS (
        SELECT DISTINCT
            UNNEST(
                ARRAY[
                    LOWER(cte_foo.active_timespan),
                    UPPER(cte_foo.active_timespan)
                ]
            ) AS breakpoint
        FROM
            cte_foo

        UNION
        SELECT DISTINCT
            UNNEST(
                ARRAY[
                    LOWER(cte_bar.active_timespan),
                    UPPER(cte_bar.active_timespan)
                ]
            ) AS breakpoint
        FROM
            cte_bar

--      continue for each table to be joined... 
--      UNION
--      SELECT DISTINCT
--          UNNEST(
--              ARRAY[
--                  LOWER(cte_baz.active_timespan),
--                  UPPER(cte_baz.active_timespan)
--              ]
--          ) AS breakpoint
--      FROM
--          cte_baz
    )
    , cte_timespans AS (
        SELECT
            TSTZRANGE(
                sub_times.breakpoint,
                LEAD(sub_times.breakpoint, 1, 'infinity'::TIMESTAMPTZ) OVER (ORDER BY sub_times.breakpoint)
            ) AS timespan
        FROM
            (
                SELECT DISTINCT
                    cte_times.breakpoint
                FROM
                    cte_times
            ) AS sub_times
        ORDER BY
            timespan
    )
SELECT
    cte_timespans.timespan,
    cte_foo.foo,
    cte_bar.bar
FROM
    cte_timespans
    LEFT OUTER JOIN cte_foo ON cte_timespans.timespan && cte_foo.active_timespan
    LEFT OUTER JOIN cte_bar ON cte_timespans.timespan && cte_bar.active_timespan

--  continue for each table to be joined...
--  LEFT OUTER JOIN cte_baz ON cte_timespans.timespan && cte_baz.active_timespan
ORDER BY
    cte_timelines.timespan

Performance is obviously poor, especially when I am joining several tables (over ten in some cases). Is this the best way to approach this situation?

Wishful thinking:

It would be great if PostgreSQL had some kind of RANGE JOIN - something along the lines of:

SELECT
    *
FROM
    foo
    FULL OUTER RANGE JOIN bar ON foo.active_timespan && bar.active_timespan
10
  • 1
    "Performance is obviously poor," I don't see any proof that in your question. Please read: wiki.postgresql.org/wiki/Slow_Query_Questions and supply the execution plan (use explain analyze) on your real world data. The range data type can also be indexed: postgresql.org/docs/current/static/… Did you try that? Commented Sep 22, 2015 at 13:04
  • @a_horse_with_no_name - the performance takes a hit due to the nature of the cte (described as "optimization fences" in the documentation). They also remove all indices after the initial SELECT. Commented Sep 22, 2015 at 13:10
  • There is no CTE in your question. Plus: the full outer join will return all rows from both tables - that can't really be optimized. The database has to read all rows from foo and all rows from bar. Commented Sep 22, 2015 at 13:11
  • 2
    If you want help with a slow query,post the real query you are using. Commented Sep 22, 2015 at 13:17
  • 1
    I don't see any indexes in your question? Is that a shortcoming in your question or in your setup? I also don't see a query plan. Consider instructions here: stackoverflow.com/tags/postgresql-performance/info Commented Sep 22, 2015 at 14:18

1 Answer 1

0

assumptions

  • changing data structure is not an option
  • timespans have a proper index

optimization goals

  • less data in cte queries
  • simplify query
  • "lazy" load data

process steps

  • generate a list of (unique) breakpoints
  • compile a list of ranges between breakpoints
  • load data matching the ranges

resulting query

WITH
  eventlist AS (
    SELECT DISTINCT
      UNNEST(
          ARRAY [
          LOWER( active_timespan ),
          UPPER( active_timespan )
          ]
      ) AS timestamp
    FROM foo WHERE active_timespan && $1
    UNION DISTINCT
    SELECT
      UNNEST(
          ARRAY [
          LOWER( active_timespan ),
          UPPER( active_timespan )
          ]
      ) AS timestamp
    FROM bar WHERE bar.active_timespan && $2
  ),
  durations AS (
    SELECT
      TSTZRANGE(
          timestamp,
          LEAD( timestamp, 1, 'infinity' ) OVER (ORDER BY timestamp ASC)
      ) AS active_span
    FROM eventlist
  )
SELECT
  durations.active_span,
  foo.foo,
  bar.bar
FROM durations
  LEFT JOIN foo ON active_span && foo.active_timespan
  LEFT JOIN bar ON active_span && bar.active_timespan
WHERE (foo.foo IS NOT NULL OR bar.bar IS NOT NULL)
  AND foo.active_timespan && $1
  AND bar.active_timespan && $2
;
Sign up to request clarification or add additional context in comments.

2 Comments

This looks like it does the same thing as my current configuration except it hits the base tables twice (and the final table joins are written differently)... am I missing something?
the difference is, only the list timestamps are loaded into the memory first, not the whole tables --> less memory required. as I see now, the last two where statements are redundant. If there are proper indexes on all the ranges, this should work pretty fast.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.