postgresql hashaggregate query optimization

Question

I am trying to optimize the query below.

select cellid2 as cellid, max(endeks) as turkcell 
from (select a.cellid2 as cellid2, b.endeks 
    from (select geom, cellid as cellid2 from grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000 ) a join (select endeks, st_transform(geom,    2320) as geom_tmp from  turkcell_data ) b on st_intersects(a.geom, b.geom_tmp) ) x
group by cellid2 limit 5

and explain analyze returns

"Limit  (cost=81808.31..81808.36 rows=5 width=12) (actual time=271376.201..271376.204 rows=5 loops=1)"
"  ->  HashAggregate  (cost=81808.31..81879.63 rows=7132 width=12) (actual time=271376.200..271376.203 rows=5 loops=1)"
"        ->  Nested Loop  (cost=0.00..81772.65 rows=7132 width=12) (actual time=5.128..269753.647 rows=1237707 loops=1)"
"              Join Filter: _st_intersects(grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000.geom, st_transform(turkcell_data.geom, 2320))"
"              ->  Seq Scan on turkcell_data  (cost=0.00..809.40 rows=3040 width=3045) (actual time=0.031..7.426 rows=3040 loops=1)"
"              ->  Index Scan using grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist on grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000  (cost=0.00..24.76 rows=7 width=124) (actual time=0.012..0.799 rows=647 loops=3040)"
"                    Index Cond: (geom && st_transform(turkcell_data.geom, 2320))"
"Total runtime: 271387.499 ms"

There exist indexes on geometry column and cellid columns. I read that instead of using max, order by desc and limit 1 works better. However, since I have group by clause, it does not work I think. Is there any way to do this or any other way which improves the performance?

Table Definitions:

CREATE TABLE grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
(
  regionid numeric,
  geom geometry(Geometry,2320),
  cellid integer,
  turkcell double precision
)
WITH (
  OIDS=FALSE
);
ALTER TABLE grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
  OWNER TO postgres;

-- Index: grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid

-- DROP INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid;

CREATE INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_cellid
  ON grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
  USING btree
  (cellid );

-- Index: grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist

-- DROP INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist;

CREATE INDEX grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000_geom_gist
  ON grd_90098780_7c48_11e3_8876_f0bf97e0dd001000000000
  USING gist
  (geom );

CREATE TABLE turkcell_data
(
  gid serial NOT NULL,
  objectid_1 integer,
  objectid integer,
  neighbourh numeric,
  endeks numeric,
  coorx numeric,
  coory numeric,
  shape_leng numeric,
  shape_le_1 numeric,
  shape_area numeric,
  geom geometry(MultiPolygon,4326),
  CONSTRAINT turkcell_data_pkey PRIMARY KEY (gid )
)
WITH (
  OIDS=FALSE
);
ALTER TABLE turkcell_data
  OWNER TO postgres;

-- Index: turkcell_data_geom_gist

-- DROP INDEX turkcell_data_geom_gist;

CREATE INDEX turkcell_data_geom_gist
  ON turkcell_data
  USING gist
  (geom );

If you want us to help optimize a query, you need to show us the table and index definitions, as well as row counts for each of the tables. Maybe your tables are defined poorly. Maybe the indexes aren't created correctly. Maybe you don't have an index on that column you thought you did. Without seeing the table and index definitions, we can't tell. We also need row counts because that can affect query optimization greatly. — Andy Lester, Jan 13 at 14:50
Your nested loop has subqueries with N=7 and N=3040 and results in N=1237707 rows. That is even worse than a carsesian product! — joop, Jan 14 at 10:14

yieldsfalsehood · Answer 1 · 2014-01-13 15:42:20Z

up vote 0 down vote

Either store your data re-projected to 2320, index that column, and use it in your join, or create an index on the transformed projection of the geometry in turkcell_data. I usually prefer the latter:

CREATE INDEX turkcell_data_geom_gist2320
  ON turkcell_data
  USING gist
  (st_transform(geom, 2320) );

The other issue might be if your geometries are very complex - if any of your polygons have a relatively large number of points you might get stuck crunching away on the intersection. Try the index first, though.

answered Jan 13 at 15:42

yieldsfalsehood
91857

I put the index but it does not change that much. – adaminasabi Jan 14 at 7:49

First can you "set enable_seqscan = false", then run and post an explain analyze (to verify the index was created correctly)? Then "set enable_seqscan = true", and run and post an explain analyze again. Can you also describe your layers some? It looks like about 600 features in one layer and about 3,000 in the other. Do any of the geometries have an exorbitant number of points? Is it the case that nearly every geometry from the first layer intersects with nearly every geometry in the second? – yieldsfalsehood Jan 14 at 14:18

add a comment |

asked	11 months ago
viewed	657 times
active	11 months ago

current community

your communities

more stack exchange communities

postgresql hashaggregate query optimization

1 Answer 1

Your Answer

Not the answer you're looking for? Browse other questions tagged postgresql query-optimization query-performance or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

postgresql hashaggregate query optimization

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged postgresql query-optimization query-performance or ask your own question.

Related

Hot Network Questions