
I have two tables in PostgreSQL 9.1: flight_2012_09_12, containing approx 500,000 rows, and position_2012_09_12, containing about 5.5 million rows. I'm running a simple join query and it's taking a long time to complete. Despite the fact that the tables aren't small, I'm convinced there are major gains to be made in the execution.

The query is:

SELECT f.departure, f.arrival, 
       p.callsign, p.flightkey, p.time, p.lat, p.lon, p.altitude_ft, p.speed 
FROM position_2012_09_12 AS p 
JOIN flight_2012_09_12 AS f 
     ON p.flightkey = f.flightkey 
WHERE p.lon < 0 
      AND p.time BETWEEN '2012-9-12 0:0:0' AND '2012-9-12 23:0:0'

The output of explain analyze is:

Hash Join  (cost=239891.03..470396.82 rows=4790498 width=51) (actual time=29203.830..45777.193 rows=4403717 loops=1)
  Hash Cond: (f.flightkey = p.flightkey)
  ->  Seq Scan on flight_2012_09_12 f  (cost=0.00..1934.31 rows=70631 width=12) (actual time=0.014..220.494 rows=70631 loops=1)
  ->  Hash  (cost=158415.97..158415.97 rows=3916885 width=43) (actual time=29201.012..29201.012 rows=3950815 loops=1)
        Buckets: 2048  Batches: 512 (originally 256)  Memory Usage: 1025kB
        ->  Seq Scan on position_2012_09_12 p  (cost=0.00..158415.97 rows=3916885 width=43) (actual time=0.006..14630.058 rows=3950815 loops=1)
              Filter: ((lon < 0::double precision) AND ("time" >= '2012-09-12 00:00:00'::timestamp without time zone) AND ("time" <= '2012-09-12 23:00:00'::timestamp without time zone))
Total runtime: 58522.767 ms

I think the problem lies with the sequential scan on the position table, but I can't figure out why it's there. The table structures and indexes are below:

               Table "public.flight_2012_09_12"
   Column       |            Type             | Modifiers 
--------------------+-----------------------------+-----------
callsign           | character varying(8)        | 
flightkey          | integer                     | 
source             | character varying(16)       | 
departure          | character varying(4)        | 
arrival            | character varying(4)        | 
original_etd       | timestamp without time zone | 
original_eta       | timestamp without time zone | 
enroute            | boolean                     | 
etd                | timestamp without time zone | 
eta                | timestamp without time zone | 
equipment          | character varying(6)        | 
diverted           | timestamp without time zone | 
time               | timestamp without time zone | 
lat                | double precision            | 
lon                | double precision            | 
altitude           | character varying(7)        | 
altitude_ft        | integer                     | 
speed              | character varying(4)        | 
asdi_acid          | character varying(4)        | 
enroute_eta        | timestamp without time zone | 
enroute_eta_source | character varying(1)        | 
Indexes:
"flight_2012_09_12_flightkey_idx" btree (flightkey)
"idx_2012_09_12_altitude_ft" btree (altitude_ft)
"idx_2012_09_12_arrival" btree (arrival)
"idx_2012_09_12_callsign" btree (callsign)
"idx_2012_09_12_departure" btree (departure)
"idx_2012_09_12_diverted" btree (diverted)
"idx_2012_09_12_enroute_eta" btree (enroute_eta)
"idx_2012_09_12_equipment" btree (equipment)
"idx_2012_09_12_etd" btree (etd)
"idx_2012_09_12_lat" btree (lat)
"idx_2012_09_12_lon" btree (lon)
"idx_2012_09_12_original_eta" btree (original_eta)
"idx_2012_09_12_original_etd" btree (original_etd)
"idx_2012_09_12_speed" btree (speed)
"idx_2012_09_12_time" btree ("time")

          Table "public.position_2012_09_12"
Column    |            Type             | Modifiers 
-------------+-----------------------------+-----------
 callsign    | character varying(8)        | 
 flightkey   | integer                     | 
 time        | timestamp without time zone | 
 lat         | double precision            | 
 lon         | double precision            | 
 altitude    | character varying(7)        | 
 altitude_ft | integer                     | 
 course      | integer                     | 
 speed       | character varying(4)        | 
 trackerkey  | integer                     | 
 the_geom    | geometry                    | 
Indexes:
"index_2012_09_12_altitude_ft" btree (altitude_ft)
"index_2012_09_12_callsign" btree (callsign)
"index_2012_09_12_course" btree (course)
"index_2012_09_12_flightkey" btree (flightkey)
"index_2012_09_12_speed" btree (speed)
"index_2012_09_12_time" btree ("time")
"position_2012_09_12_flightkey_idx" btree (flightkey)
"test_index" btree (lon)
"test_index_lat" btree (lat)

I can't think of any other way to rewrite the query, so I'm stumped at this point. If the current setup is as good as it gets, so be it, but it seems to me that it should be much faster than it currently is. Any help would be much appreciated.

Can you provide statistics on the public.position_2012_09_12 table's lon and time columns? Maybe a ("time") WHERE lon < 0 index would help, but there are 3,950,815 rows in the position table matching those conditions. Is there much more data in this table? – sufleR Dec 4 '12 at 22:18
Which version of PostgreSQL are you using? – plang Dec 4 '12 at 22:27
And have you analyzed your tables before running your query? – plang Dec 4 '12 at 22:33
The estimated and actual row counts look like a good match, so I doubt that the tables need analyzing. The 512 batches on the hash join look large and the 1025kB memory usage looks small -- I wonder if it would do better with a larger work_mem. That aside, the plan looks good to me and the performance may only improve with hardware improvements. – David Aldridge Dec 4 '12 at 22:51
That BETWEEN probably isn't doing what you want - it's going to include the first second/millisecond/nanosecond (whatever granularity PostgreSQL runs at) of 11:00pm; (almost) always use exclusive upper bounds (<), especially with timestamps. Also, why only until 11pm (and not until just before the next day)? And 'time' is a non-descriptive name for a column - maybe use something like recordedAt? – Clockwork-Muse Dec 4 '12 at 23:41
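For reference, a half-open version of that predicate (assuming the goal is the whole day, which the question doesn't confirm) would look like this:

WHERE p.lon < 0
  AND p.time >= '2012-09-12 00:00:00'   -- inclusive lower bound
  AND p.time <  '2012-09-13 00:00:00'   -- exclusive upper bound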

2 Answers


The reason you are getting a sequential scan is that Postgres believes it will read fewer disk pages that way than by using indexes, and it is probably right. Consider: if you use a non-covering index, you first need to read all the matching index pages, which essentially yields a list of row identifiers. The DB engine then needs to read each of the matching data pages.

Your position table uses 71 bytes per row, plus whatever a geom type takes (I'll assume 16 bytes for illustration), making 87 bytes. A Postgres page is 8192 bytes, so you have approximately 90 rows per page.

Your query matches 3,950,815 out of 5,563,070 rows, or about 70% of the total. Assuming the data is randomly distributed with regard to your WHERE filters, the chance of finding a data page with no matching row is roughly 0.3^90, which is on the order of 10^-47: essentially nothing. So regardless of how good your indexes are, you're still going to have to read all the data pages, and if you have to read all the pages anyway, a table scan is usually the best approach.
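As a quick sanity check on that number, the probability can be computed directly in psql (pure arithmetic, nothing table-specific):

SELECT 0.3::float8 ^ 90;   -- chance that all ~90 rows on a page miss the filter
-- ≈ 8.7e-48, i.e. effectively zero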

The one get-out here is that I said non-covering index. If you are prepared to create indexes that can answer queries in and of themselves, you can avoid looking up the data pages at all, and you are back in the game. I'd suggest the following are worth looking at:

flight_2012_09_12 (flightkey, departure, arrival)
position_2012_09_12 (flightkey, time, lon, ...)
position_2012_09_12 (lon, time, flightkey, ...)
position_2012_09_12 (time, lon, flightkey, ...)

The dots here represent the rest of the columns you are selecting. You'll only need one of the indexes on position, but it's hard to tell which will prove the best. The first approach may permit a merge join on presorted data, at the cost of reading the whole second index to do the filtering. The second and third allow the data to be prefiltered, but require a hash join. Given how much of the cost appears to be in the hash join, the merge join might well be a good option.
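As a concrete sketch, those indexes could be created along these lines (the index names are made up, and the trailing columns on the position index are simply the remaining selected columns):

-- Covering index on the flight side, join key first.
CREATE INDEX flight_20120912_cover
    ON flight_2012_09_12 (flightkey, departure, arrival);

-- One variant for the position side; the other two suggestions
-- just reorder the leading (flightkey, "time", lon) columns.
CREATE INDEX position_20120912_cover
    ON position_2012_09_12 (flightkey, "time", lon,
                            callsign, lat, altitude_ft, speed);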

As your query requires 52 of the 87 bytes per row, and indexes have overheads, you may not end up with the index taking much, if any, less space than the table itself.

Another approach is to attack the "randomly distributed" side of it, by looking at clustering.
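For example (hypothetical, and note that CLUSTER takes an exclusive lock and rewrites the whole table), clustering on the existing time index would group each day's positions together on disk:

CLUSTER position_2012_09_12 USING index_2012_09_12_time;
ANALYZE position_2012_09_12;   -- refresh statistics after the rewrite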

It didn't look to me like it would be worth adding a covering index on the flight table, as a full scan seems to only take 220ms? – David Aldridge Dec 5 '12 at 8:42
@DavidAldridge Fair point, although having the covering index starting with flight key on both tables could allow a merge join, which I'd expect to be faster than the hash join on pre-sorted data. – Laurence Dec 5 '12 at 9:31
@DavidAldridge The user is on PostgreSQL 9.1, which doesn't have index-only scans (like covering indexes), so the issue is moot anyway. – Craig Ringer Dec 6 '12 at 4:32
@CraigRinger oh that's the MVCC issue is it? If so, is that different in 9.2? – David Aldridge Dec 6 '12 at 8:40
@DavidAldridge Are you referring to the way that indexes don't have visibility information in them, so Pg must examine the heap to determine whether a row is visible - is that what you mean by "the MVCC issue"? If so, 9.2 adds index-only scans that can avoid most heap access by using the visibility map; see the release notes and wiki.postgresql.org/wiki/Index-only_scans – Craig Ringer Dec 6 '12 at 8:53

The row count estimates are pretty reasonable, so I doubt this is a stats issue.

I'd try:

  • Creating an index on position_2012_09_12 (lon, "time"), or possibly a partial index on position_2012_09_12 ("time") WHERE (lon < 0), if you routinely search for lon < 0 (see the sketch after this list).

  • Setting random_page_cost lower, maybe 1.1. See if (a) this changes the plan and (b) the new plan is actually faster. For testing purposes, to see whether avoiding a seqscan would help, you can SET enable_seqscan = off; if it does, change the cost parameters.

  • Increasing work_mem for this query: SET work_mem = '10MB' or something similar before running it.

  • Running the latest PostgreSQL if you aren't already. Always specify your PostgreSQL version in questions. (Update after edit): You're on 9.1; that's fine. The biggest performance improvement in 9.2 was index-only scans, and it doesn't seem likely that you'd benefit massively from index-only scans for this query.
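A rough sketch of those suggestions combined (the index name is invented, and '10MB' is a starting guess to tune, not a recommendation):

-- Partial index: time ordering for westbound points only.
CREATE INDEX position_20120912_time_west
    ON position_2012_09_12 ("time") WHERE lon < 0;

-- Session-local experiments; none of these persist past the session.
SET work_mem = '10MB';         -- more room per hash batch
SET random_page_cost = 1.1;    -- make index scans look cheaper
SET enable_seqscan = off;      -- testing only, never in production

EXPLAIN ANALYZE
SELECT f.departure, f.arrival,
       p.callsign, p.flightkey, p.time, p.lat, p.lon, p.altitude_ft, p.speed
FROM position_2012_09_12 AS p
JOIN flight_2012_09_12 AS f ON p.flightkey = f.flightkey
WHERE p.lon < 0
  AND p.time BETWEEN '2012-09-12 00:00:00' AND '2012-09-12 23:00:00';

If the forced plan turns out faster, adjust random_page_cost and work_mem permanently rather than leaving enable_seqscan off.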

You'll also somewhat improve performance if you can get rid of columns to narrow the rows. It won't make tons of difference, but it'll make some.

