
I have this simple query in PostgreSQL:

EXPLAIN ANALYZE 
select * from email_events 
where act_owner_id = 500
order by date desc
limit 500

The first execution of the query takes a very long time, about 7 seconds:

"Limit  (cost=0.43..8792.83 rows=500 width=2311) (actual time=3.064..7282.497 rows=500 loops=1)"
"  ->  Index Scan Backward using email_events_idx_date on email_events  (cost=0.43..233667.36 rows=13288 width=2311) (actual time=3.059..7282.094 rows=500 loops=1)"
"        Filter: (act_owner_id = 500)"
"        Rows Removed by Filter: 1053020"
"Total runtime: 7282.818 ms"

After the first execution the query is cached, I guess, and runs in 20-30 ms.

Why is the LIMIT so slow when there is no cache? How can I fix this?

The table has 2.5 mil rows –  Vasil Atanasov Feb 21 at 19:48
Do you have an index on (act_owner_id, date)? – DRapp Feb 21 at 19:53
Yes I do; even if I order by act_owner_id the result is the same. – Vasil Atanasov Feb 21 at 20:06
Try a composite index on act_owner_id + date (in this order). –  kordirko Feb 21 at 20:12
Approximately how big is your database? What does the load on your system look like? (A really huge database with limited resources might be lagging on I/O on the first attempt.) – frlan Feb 21 at 21:04

2 Answers

PostgreSQL thinks it will be faster to scan the date-ordered index backwards (i.e. in DESC order), reading every row and throwing away the rows that don't have the right act_owner_id. It's having to do 1053020 random reads to do this, and backward index scans aren't very fast either.

Try creating an index on email_events(date DESC, act_owner_id). I think Pg will be able to do a forward index scan on that and then use the second index term to filter rows, so it shouldn't have to do a heap lookup. Test with EXPLAIN and see.
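For reference, a sketch of what that could look like (the index name below is just an example, not something from the question):

CREATE INDEX email_events_date_owner_idx
    ON email_events (date DESC, act_owner_id);

-- Re-run the plan to check whether the new index is actually used
EXPLAIN ANALYZE
select * from email_events
where act_owner_id = 500
order by date desc
limit 500;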

Not helping... The DB was filled with dummy data that I generated, loaded table by table in bulk... I ran CLUSTER on one smaller table, and the problem seems to be fixed for that table. After clearing the cache, the first execution of the query takes less than 500 ms; before CLUSTER it used to go up to 20 sec. I will run CLUSTER on the biggest table to see if that fixes the problem. – Vasil Atanasov Feb 22 at 14:20

CLUSTER table ON index seems to fix the problem. It seems that after bulk data loading the data is scattered all over the hard drive. CLUSTER will re-order the data on disk to match the index order.
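A minimal sketch of the command, assuming you cluster on the date index shown in the plan above (clustering on the composite (act_owner_id, date) index suggested in the comments is another option):

-- Physically rewrites the table in index order; this takes an exclusive
-- lock, and the ordering is not maintained for new rows, so it may need
-- to be repeated after further bulk loads.
CLUSTER email_events USING email_events_idx_date;

-- Refresh planner statistics afterwards
ANALYZE email_events;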

Nice to see CLUSTERing on an index being useful for once. –  Craig Ringer Feb 24 at 8:08
