0

we are having performance issue regarding select data in a Postgres database.

I have two tables, fleet_test and fleet_inspection_test.
fleet_test contains 94 columns as key:(fleet_id, usdot, country_code, contacts_count,....) and fleet_inspection_test contains 9 columns as key: (inspection_id, dot_number, insp_date... ).
fleet_test has around 460k rows and fleet_inspection_test has around 6 million rows.

The query I need to run as following and it takes around 4 seconds to run in the database.

select sum(contacts_count) as sum from fleet_test
where country_code = 'US'
and usdot in (select  dot_number from fleet_inspection_test
            where insp_date between '2016-02-15' and '2016-12-15');

If I only run the main query and it takes around 600 mesc:

select sum(contacts_count) as sum from fleet_test
where country_code = 'US'

We have indexes for each column. Unfortunately the results are not that good:

CREATE INDEX ix_fleet_test_contacts_count ON fleet_test USING btree (contacts_count);
CREATE INDEX ix_fleet_test_country_code ON fleet_test USING btree (country_code);
CREATE INDEX ix_fleet_test_usdot ON fleet_test USING btree (usdot);
CREATE INDEX ix_fleet_inspection_test_dot_number ON fleet_inspection_test USING btree (dot_number);
CREATE INDEX ix_fleet_inspection_test_insp_date ON fleet_inspection_test USING btree (insp_date);

QUERY PLAN

'Aggregate  (cost=198451.70..198451.71 rows=1 width=4) (actual time=12300.010..12300.011 rows=1 loops=1)'
'  Buffers: shared hit=89350'
'  ->  Hash Join  (cost=152330.84..198276.23 rows=70189 width=4) (actual time=10902.309..12168.892 rows=189740 loops=1)'
'        Hash Cond: (fleet_test.usdot = fleet_inspection_test.dot_number)'
'        Buffers: shared hit=89350'
'        ->  Seq Scan on fleet_test  (cost=0.00..43722.72 rows=444397 width=8) (actual time=0.015..562.385 rows=443616 loops=1)'
'              Filter: ((country_code)::text = 'US'::text)'
'              Rows Removed by Filter: 14726'
'              Buffers: shared hit=37993'
'        ->  Hash  (cost=151817.53..151817.53 rows=41065 width=4) (actual time=10898.319..10898.319 rows=356230 loops=1)'
'              Buckets: 524288 (originally 65536)  Batches: 1 (originally 1)  Memory Usage: 16620kB'
'              Buffers: shared hit=51357'
'              ->  HashAggregate  (cost=151406.88..151817.53 rows=41065 width=4) (actual time=10240.109..10604.871 rows=356231 loops=1)'
'                    Group Key: fleet_inspection_test.dot_number'
'                    Buffers: shared hit=51357'
'                    ->  Seq Scan on fleet_inspection_test  (cost=0.00..138905.67 rows=5000485 width=4) (actual time=126.081..5077.200 rows=5007065 loops=1)'
'                          Filter: ((insp_date >= '2016-02-15'::date) AND (insp_date <= '2016-12-15'::date))'
'                          Rows Removed by Filter: 829513'
'                          Buffers: shared hit=51357'
'Planning time: 1.092 ms'
'Execution time: 12302.407 ms'

This is an EXPLAIN (ANALYZE, BUFFERS) example from my query. My expectations are to get this execution time in less than a second.

These are some parameters of the postgres db config:
Operating system: CentOS
System memory: 16GB

work_mem = 768MB
shared_buffers = 2048MB

Any help, comment or thought would be really appreciated!

Thank you in advance.

5
  • Whats the result of "select distinct dot_number, count(dot_number) from fleet_inspection_test group by dot_number order by count(dot_number) desc" (does this have enough different values for an index to work?)
    – jdog
    Jan 3 2017 at 22:41
  • Did it help in any way, if so how?
    – jdog
    Jan 4 2017 at 0:15
  • @jdog Thank you! It still takes Seq scan on fleet_inspection_test table. Query plan: ' -> Seq Scan on fleet_inspection_test (cost=0.00..109722.78 rows=5836578 width=4) (actual time=0.016..4015.458 rows=5836578 loops=1)'
    – Jun
    Jan 4 2017 at 0:17
  • what is the result of that query that I requested?
    – jdog
    Jan 4 2017 at 0:31
  • @jdog It takes 9 seconds to run and 381283 rows returned. Thank you!
    – Jun
    Jan 4 2017 at 0:35
0

Try this (3rd version)

    SELECT SUM(contacts_count) AS sum 
      FROM fleet_test AS t
      JOIN (SELECT DISTINCT dot_number 
              FROM fleet_inspection_test
             WHERE insp_date BETWEEN '2016-02-15' AND '2016-12-15') AS d
        ON (t.usdot = d.dot_number)
     WHERE t.country_code = 'US';
5
  • Thank you! I have tried the query you shared to me. It takes more time(7 secs ) than my original query.
    – Jun
    Jan 4 2017 at 0:21
  • Uh-oh... please try the amended version I’ve put in the answer... in theory, it should be the same as my previous version... but if it doesn’t take too long, please VACUUM ANALYZE before running each test query, so that they run on an even field...
    – Dario
    Jan 4 2017 at 0:44
  • I have VACUUMed the table and have run the amended query. It takes the same amount of time. Thanks!
    – Jun
    Jan 4 2017 at 1:41
  • OK, thank you, second try as expected. So the problem stands, why it is not using indexes but seq scans only...
    – Dario
    Jan 4 2017 at 2:18
  • I think If the SELECT returns more than approximately 5-10% of all rows in the table, a sequential scan is much faster than an index scan? I have tried to force database to index scan. It doesn't help me. Thank you!
    – Jun
    Jan 4 2017 at 3:50

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service, privacy policy and cookie policy

Not the answer you're looking for? Browse other questions tagged or ask your own question.