Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.
SELECT * 
FROM vehicles t1 
WHERE (SELECT COUNT(*) FROM vehicles t2
       WHERE  t1.pump_number = t2.pump_number
          AND t1.updated_at < t2.updated_at
      ) < 4
      AND t1.updated_at >= ?

And I supply '1970-01-01 00:00:00.000000' for the parameter ?.

I have around 10k records in the vehicles table and no index is added. Above query takes around 10-20 seconds in execution.

How I can optimize it to decrease execution time?

share|improve this question
    
I just corrected title –  umair Dec 16 '14 at 11:18
    
Do you want the predicate t1.updated_at >= ? to apply to the count in the subquery as well? Because it doesn't as it is. –  Erwin Brandstetter Dec 16 '14 at 12:12

4 Answers 4

Postgres provide nice admin tool which has option EXPLAIN to see query execution plan . It will give great insights . here is the link for pgadmin in detail http://www.pgadmin.org/docs/1.4/query.html

Also use joins in your query instead of select that will increase your query performance

share|improve this answer

Try this( the columns in select and group by statement need to be replaced by your own):

SELECT
    t1.id,
    t1.updated_at,
    t1.other_columns
FROM vehicles t1 
INNER JOIN vehicles t2
ON  t1.pump_number = t2.pump_number
    AND t1.updated_at < t2.updated_at
WHERE t1.updated_at >= '1970-01-01 00:00:00.000000'
GROUP BY 
    t1.id,
    t1.updated_at,
    t1.other_columns
having count(*)< 4

After this change, you could try to add a index on column pump_number to see if it helps.

share|improve this answer

This is your query:

SELECT *
FROM vehicles t1
WHERE  (SELECT Count(*)
        FROM vehicles t2
        WHERE t1.pump_number = t2.pump_number AND
              t1.updated_at < t2.updated_at
       ) < 4  AND
       t1.updated_at >= ? " , "1970-01-01 00:00:00.000000")]

I would start by writing this using window functions:

select v.*
from (select v.*, row_number() over (partition by pump_number order by updated_at) as seqnum
      from vehicles v
     ) v
where v.seqnum < 4 and t1.updated_at >= ?;

For this query, I would suggest indexes on vehicles(pump_number, updted_at) and vehicles(updated_at).

share|improve this answer

To get an equivalent query, use the window function rank(), not row_number() here:

SELECT *
FROM  (
   SELECT *
        , rank() OVER (PARTITION BY pump_number ORDER BY updated_at DESC) AS rnk
   FROM   vehicles t1 
   ) sub
WHERE rnk < 4
AND   updated_at >= '1970-01-01 0:0';

And it has to be ORDER BY updated_at DESC, to exclude rows that have more than three older peers for the same pump_number. In other words:
"Get the three oldest rows per pump_number - or more if there are ties on updated_at".

Indexes are not going to help while you read most or all of the table anyway.

Further optimize performance

If (pump_number, updated_at) is unique or / and there are relatively few distinct values for pump_number in vehicles, you can probably optimize further. There is not enough information in your question.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.