4

I'm looking to create a Postgres function that caches the results of a long-running query so it only needs to be executed once per transaction:

CREATE OR REPLACE FUNCTION get_records (
  _state VARCHAR(10)
)
RETURNS TABLE (
  id UUID
) AS
$$
DECLARE
  _temptable VARCHAR;
BEGIN
  _temptable := FORMAT('temp_state_%s', _state);

  IF NOT EXISTS(SELECT 1 FROM pg_tables WHERE tablename = _temptable) THEN
    EXECUTE FORMAT('CREATE TEMPORARY TABLE %I (id UUID NOT NULL, PRIMARY KEY(_uid)) ON COMMIT DROP', _temptable);
    EXECUTE FORMAT('INSERT INTO %I SELECT id FROM very_complex_nested_query', _temptable);
    EXECUTE FORMAT('ANALYZE %I', _temptable);
  END IF;

  RETURN QUERY EXECUTE FORMAT('SELECT id FROM %I', _temptable);
END;
$$
LANGUAGE 'plpgsql';

Now, I can run all of my queries and join this function:

SELECT mt.*
FROM my_table1 mt
  INNER JOIN get_records('new') nr ON mt.id = nr.id
SELECT mt.*
FROM my_table2 mt
  INNER JOIN get_records('new') nr ON mt.id = nr.id
SELECT mt.*
FROM my_table3 mt
  INNER JOIN get_records('new') nr ON mt.id = nr.id
-- ... many more

I've got a whole pile of these, without guarantees on which one will run first or in what order.

This works pretty well, except that the primary key index on the temporary table isn't used.

How can I return a "table" from a Postgres function, instead of just the results of a query?

  1. I'm using a function to build the temporary table instead of a materialized view to get around the "where clause doesn't get pushed into view that contains aggregation" issue.
  2. I could create the temporary table, then refer to it directly in all the queries, but it would mean having to build in some sort of blocking mechanism to ensure the queries aren't executed too early, and the tool I'm using doesn't support such mechanisms very well.
5
  • 1
    Do not create a temp table, do not use a temp table. Do not analyze the temp table. Put the "very_complex_nested_query" into a plain sql function. A query using that a sql function can be optimized far better then the blackbox that a PL/pgSQL presents to the optimizer.
    – user330315
    Dec 1, 2016 at 7:30
  • @a_horse_with_no_name But... the "very_complex_nested_query" takes 5 minutes to execute, and since I'm joining the results to hundreds of separate queries, I'd be adding days to the total execution time. Isn't this the exact use-case of a temp table? What do you mean by "plain sql function"? Dec 1, 2016 at 16:26
  • I mean language sq instead of language plpgsql - but if you want to index it, you need to create an index on the temp table.
    – user330315
    Dec 1, 2016 at 16:28
  • @a_horse_with_no_name How do you create a dynamically-named temp table and join to it within the same query using plain sql? I can't guarantee order of queries, so the temp table could potentially be created within any query, and I don't want to have to create it multiple times. I'm looking to implement an "on cache miss: do work, cache, and return" sort of design. Dec 1, 2016 at 16:41
  • The SQL function will only contain the SELECT statement, no temp table. You still use it the same way but Postgres is able to optimize the overall query including the one inside the function. Something like this: dpaste.com/1J5REFJ
    – user330315
    Dec 1, 2016 at 16:50

1 Answer 1

3

You can try the modifier STABLE which indicates that

STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements.

The newer Postgres versions support materialized views as well. You can create a materialized view for the join. AFAIK materialized views supports indices as well.

1
  • By returning a table that's used in the FROM clause, wouldn't it be scanned only once anyways? In either case, neither STABLE nor IMMUTABLE seem to expose the underlying index of the temp table. As for materialized views, the issue there is that you can't benefit from check constraints (such as in a partitioned table) if you're doing aggregation, which results in all partitions being scanned (and my 5-minute query becomes an 18-hour query) - this was quite a surprise to me when I saw a ten-thousand line query plan! Dec 1, 2016 at 16:34

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.