I have the following somewhat complex SELECT statement with multiple joins, a group by and an order by:

SELECT 
   COUNT(*) AS count_all, 
   "response_variables"."id", 
   "response_variables"."var_name" AS "response_variables_id_response_variables_var_name" 
FROM "response_variables" 
   INNER JOIN "responses" ON "responses"."id" = "response_variables"."response_id" 
   INNER JOIN "questions" ON "questions"."id" = "responses"."question_id" 
WHERE "questions"."key" = 'rbmmpmvs' 
GROUP BY "response_variables"."id", "response_variables"."var_name" 
ORDER BY "response_variables"."var_name" ASC;

Here's the output of running EXPLAIN ANALYZE on that:

 GroupAggregate  (cost=720.80..723.20 rows=120 width=9) (actual time=277.127..285.953 rows=15265 loops=1)
   ->  Sort  (cost=720.80..721.10 rows=120 width=9) (actual time=277.120..281.391 rows=15265 loops=1)
         Sort Key: response_variables.var_name, response_variables.id
         Sort Method: external merge  Disk: 288kB
         ->  Nested Loop  (cost=0.00..716.66 rows=120 width=9) (actual time=0.064..21.795 rows=15265 loops=1)
               ->  Nested Loop  (cost=0.00..657.78 rows=128 width=4) (actual time=0.050..7.919 rows=3042 loops=1)
                     ->  Index Scan using index_questions_on_key on questions  (cost=0.00..8.27 rows=1 width=4) (actual time=0.032..0.033 rows=1 loops=1)
                           Index Cond: ((key)::text = 'rbmmpmvs'::text)
                     ->  Index Scan using index_responses_on_question_id on responses  (cost=0.00..646.69 rows=282 width=8) (actual time=0.016..7.326 rows=3042 loops=1)
                           Index Cond: (question_id = questions.id)
               ->  Index Scan using index_response_variables_on_response_id on response_variables  (cost=0.00..0.42 rows=4 width=13) (actual time=0.002..0.003 rows=5 loops=3042)
                     Index Cond: (response_id = responses.id)
 Total runtime: 288.766 ms
(13 rows)

I've got a number of indexes on various bits and pieces, but I'm not sure where to start optimizing this query any further (or whether it's possible at all).

What is your current setting for work_mem? Looks like you could use something extra: SET work_mem TO '100MB'; – Frank Heikens Jan 10 '14 at 15:26
+1 @FrankHeikens or even more, to get a HashAggregate plan. – Denis de Bernardy Jan 10 '14 at 19:31

The condition in your WHERE clause applies to the innermost joined table. That's bad, because all of the joining has to happen before it's known whether the question row matches or not.

Instead, list the questions table first, inverting the join order:

SELECT 
  COUNT(*) AS count_all, 
  response_variables.id, 
  response_variables.var_name AS response_variables_id_response_variables_var_name
FROM questions
JOIN responses ON questions.id = responses.question_id
JOIN response_variables ON responses.id = response_variables.response_id
WHERE questions.key = 'rbmmpmvs' 
GROUP BY response_variables.id, response_variables.var_name
ORDER BY response_variables.var_name

As long as there's an index on questions(key) and on the joined id columns, this should perform well.
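
For reference, a sketch of the indexes this assumes; the EXPLAIN output in the question shows indexes with these names already in place, so create them only if they're missing:

-- Only needed if they don't already exist; names match the EXPLAIN output above.
CREATE INDEX index_questions_on_key ON questions (key);
CREATE INDEX index_responses_on_question_id ON responses (question_id);
CREATE INDEX index_response_variables_on_response_id ON response_variables (response_id);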

I also removed all those unnecessary double quotes that were causing code noise.

Reordering the tables will make no difference in Postgres, because the planner will dutifully try the joins in every order unless there's a (configurable) huge number of them. – Denis de Bernardy Jan 10 '14 at 19:28

Try this:

SELECT 
  COUNT(*) AS count_all, 
  response_variables.id, 
  response_variables.var_name AS response_variables_id_response_variables_var_name
FROM response_variables
WHERE EXISTS (
  SELECT 1
  FROM responses
  JOIN questions ON questions.id = responses.question_id
  WHERE responses.id = response_variables.response_id
    AND questions.key = 'rbmmpmvs'
)
GROUP BY response_variables.id, response_variables.var_name
ORDER BY response_variables.var_name

Also, your query is doing a disk-based external merge sort, which can be very slow, and about 90% of the total time is spent in the sort (roughly 259.6 ms of the 288.8 ms total runtime).

As the explain plan shows, about 288 kB of data was written to disk because the sort couldn't fit inside work_mem. Bumping up work_mem locally for the transaction should let Postgres use an in-memory quicksort instead, which is usually a lot faster than an external merge sort.
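
For example, a minimal sketch of scoping the setting to a single transaction; the 16MB figure is just an assumption, anything comfortably above the 288 kB spill should do:

BEGIN;
SET LOCAL work_mem = '16MB';  -- reverts automatically at COMMIT or ROLLBACK
-- ... run the GROUP BY query here ...
COMMIT;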
