Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

PostgreSQL 8.4; Three tables - store (~100k, pk id, fk supplier_id & item_id), supplier(~10 pk supplier_id), item(~1000 pk item_id);

I created the following query to get the data I need:

SELECT store.quantity, store.price, x.supplier_name
FROM store NATURAL JOIN
     (SELECT * FROM item NATURAL JOIN supplier) AS x 
 WHERE store.price > 500 AND store.quantity > 0 AND
       store.quantity < 100 AND
       x.item_name = 'SomeName';

The query plan:

Nested Loop  (cost=20.76..6513.55 rows=8 width=229)
  ->  Hash Join  (cost=20.76..6511.30 rows=8 width=15)
        Hash Cond: (store.item_id = item.item_id)
        ->  Seq Scan on store  (cost=0.00..6459.00 rows=8388 width=23)
              Filter: ((price > 500::numeric) AND (quantity > 0) AND (quantity < 100))
        ->  Hash  (cost=20.75..20.75 rows=1 width=8)
              ->  Seq Scan on item  (cost=0.00..20.75 rows=1 width=8)
                    Filter: ((item_name)::text = 'SomeName'::text)
  ->  Index Scan using supplier_pkey on supplier  (cost=0.00..0.27 rows=1 width=222)
        Index Cond: (supplier.supplier_id = store.supplier_id)

Now the aim is to reduce the cost by more than 30% by optimizing the query itself. The only instances of this problem I found were solved by modifying the table or the server settings, but I am looking to do this by modifying nothing else than the query and that's where I fell short in research.

Clearly the issue to be solved is the Seq Scan, which brings me to thinking I need to arrange it so that the scanning/filtering is applied only to a subset of the store table - but iirc you need to scan the table in any such case, so maybe use something else than a Seq Scan? Index scan isn't going to help since I wouldn't be filtering by the index... I'm puzzled here because this seems more of a choice that the PostgreSQL optimizer makes and not something I can change at will...

(If you're wondering, this was part of an assignment and I'm asking here because I have spent quite a few hours researching the problem failing to find anything relevant, and I just gave up on it, but I'm still curious...)

share|improve this question

2 Answers 2

You can probably fix this with indexes. It is a little hard to tell what the keys are because of the "natural join"s. (I recommend using instead of natural join so you can at least see what keys are being used and if one of the tables is modified, it won't mess up the join.)

I think an index on item(item_name, item_id) would help the query plan.

share|improve this answer
    
Doing that only improves the hash of the item from 20.75 to 8.27, which is insignificant next to the 6000ish seq scan. (I added the relevant indexes that the joins use to the op) –  user742925 May 18 at 14:32
    
I didn't consider global changes when posting the question, but your suggestion definitely works - the right index to add was store(item_id,price), which resulted in a very different query plan and reduced the cost all the way to 5% of the original :-) (I'm not marking your answer as accepted, though, because I'm still interested if an optimization can be done locally - the op says that anyway ) –  user742925 May 18 at 14:53
    
@user742925 . . . You can give the query a hint to use a hash-based algorithm for the join rather than a nested loop. My guess is that is where the performance bottleneck is likely to be. –  Gordon Linoff May 18 at 16:11

Will be hard to optimize because it looks nice, try this to avoid subquery :

SELECT 
    store.quantity, 
    store.price, 
    supplier.supplier_name 
FROM store 
    INNER JOIN item
        ON store.item_id = item.item_id
    INNER JOIN supplier
        ON supplier.supplier_id = store.supplier_id
        AND supplier.item_name = 'SomeName'
WHERE 
    store.price > 500 
    AND store.quantity BETWEEN 0 AND 100;

Use BETWEEN it's better.

Also, add indexes on :

  • store.item_id
  • item.item_id
  • supplier.item_name
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.