PostgreSQL simple query optimization

Question

PostgreSQL 8.4; Three tables - store (~100k, pk id, fk supplier_id & item_id), supplier(~10 pk supplier_id), item(~1000 pk item_id);

I created the following query to get the data I need:

SELECT store.quantity, store.price, x.supplier_name
FROM store NATURAL JOIN
     (SELECT * FROM item NATURAL JOIN supplier) AS x 
 WHERE store.price > 500 AND store.quantity > 0 AND
       store.quantity < 100 AND
       x.item_name = 'SomeName';

The query plan:

Nested Loop  (cost=20.76..6513.55 rows=8 width=229)
  ->  Hash Join  (cost=20.76..6511.30 rows=8 width=15)
        Hash Cond: (store.item_id = item.item_id)
        ->  Seq Scan on store  (cost=0.00..6459.00 rows=8388 width=23)
              Filter: ((price > 500::numeric) AND (quantity > 0) AND (quantity < 100))
        ->  Hash  (cost=20.75..20.75 rows=1 width=8)
              ->  Seq Scan on item  (cost=0.00..20.75 rows=1 width=8)
                    Filter: ((item_name)::text = 'SomeName'::text)
  ->  Index Scan using supplier_pkey on supplier  (cost=0.00..0.27 rows=1 width=222)
        Index Cond: (supplier.supplier_id = store.supplier_id)

Now the aim is to reduce the cost by more than 30% by optimizing the query itself. The only instances of this problem I found were solved by modifying the table or the server settings, but I am looking to do this by modifying nothing else than the query and that's where I fell short in research.

Clearly the issue to be solved is the Seq Scan, which brings me to thinking I need to arrange it so that the scanning/filtering is applied only to a subset of the store table - but iirc you need to scan the table in any such case, so maybe use something else than a Seq Scan? Index scan isn't going to help since I wouldn't be filtering by the index... I'm puzzled here because this seems more of a choice that the PostgreSQL optimizer makes and not something I can change at will...

(If you're wondering, this was part of an assignment and I'm asking here because I have spent quite a few hours researching the problem failing to find anything relevant, and I just gave up on it, but I'm still curious...)

Gordon Linoff · Answer 1 · 2014-05-18 14:19:09Z

up vote 1 down vote

You can probably fix this with indexes. It is a little hard to tell what the keys are because of the "natural join"s. (I recommend using instead of natural join so you can at least see what keys are being used and if one of the tables is modified, it won't mess up the join.)

I think an index on item(item_name, item_id) would help the query plan.

answered May 18 at 14:19

Gordon Linoff
247k1258103

Doing that only improves the hash of the item from 20.75 to 8.27, which is insignificant next to the 6000ish seq scan. (I added the relevant indexes that the joins use to the op) – user742925 May 18 at 14:32

I didn't consider global changes when posting the question, but your suggestion definitely works - the right index to add was store(item_id,price), which resulted in a very different query plan and reduced the cost all the way to 5% of the original :-) (I'm not marking your answer as accepted, though, because I'm still interested if an optimization can be done locally - the op says that anyway ) – user742925 May 18 at 14:53

@user742925 . . . You can give the query a hint to use a hash-based algorithm for the join rather than a nested loop. My guess is that is where the performance bottleneck is likely to be. – Gordon Linoff May 18 at 16:11

add a comment |

Ryx5 · Answer 2 · 2014-05-19 06:27:41Z

Will be hard to optimize because it looks nice, try this to avoid subquery :

SELECT 
    store.quantity, 
    store.price, 
    supplier.supplier_name 
FROM store 
    INNER JOIN item
        ON store.item_id = item.item_id
    INNER JOIN supplier
        ON supplier.supplier_id = store.supplier_id
        AND supplier.item_name = 'SomeName'
WHERE 
    store.price > 500 
    AND store.quantity BETWEEN 0 AND 100;

Use BETWEEN it's better.

Also, add indexes on :

store.item_id
item.item_id
supplier.item_name

asked	7 months ago
viewed	59 times
active	7 months ago

current community

your communities

more stack exchange communities

PostgreSQL simple query optimization

2 Answers 2

Your Answer

Not the answer you're looking for? Browse other questions tagged sql postgresql query-optimization or ask your own question.

Hot Network Questions

current community

your communities

more stack exchange communities

PostgreSQL simple query optimization

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged sql postgresql query-optimization or ask your own question.

Related

Hot Network Questions