Help optimizing this query with multiple where exists

Question

In the application I'm building, the user is able to define 'types' where each 'type' has a set of 'attributes'.

The user is able to create instances of products by defining a value for each attribute the product's type has.

A pic of the schema:

I'm creating the query where the user specifies the attributes values and the product type and with that I should return all the product id's that meets the query.

The problem I see in my query is that I'm performing a whole select * from attributes_products ... for each attribute that the product's type has.

Is there a way to optimize this? If I create an index in the column attributes_products.product_id would this query be actually optimal?

Example of a query where I'm looking for a product whose type has 3 attributes:

select p.id
from Products as p
where 
exists(
    select * from attributes_products
    where
        product_id = p.id AND
        attribute_id = 27 AND
        value = 'some_value'
) AND
exists(
    select * from attributes_products
    where
        product_id = p.id AND
        attribute_id = 28 AND
        value = 'other_value'
) AND
exists(
    select * from attributes_products
    where
        product_id = p.id AND
        attribute_id = 29 AND
        value = 'oother_value'
)

Many thanks.

Conclusions

So, Gareth Rees (selected answer) proposed another solution which involves multiple Joins. Here is the explanation of its query (done by PGAdmin): Selected answer query explanation

This is the explanation of the original query:

I believe that the selected answer is slightly faster, but consumes a lot more memory (because of the triple join). I believe that my original query is slightly slower (very slightly, since there's an index on the attributes_products table) but a lot more efficient in memory.

what Database Engine are you using? – Malachi Nov 4 '13 at 17:30
@Malachi, I'm using PostgreSQL as database manager/engine – Lay González Nov 4 '13 at 17:54

Gareth Rees · Accepted Answer · 2013-11-04 18:06:56Z

SQL allows you to join the same table multiple times, so what you need here is:

SELECT p.id FROM products AS p
JOIN attributes_products AS ap1
  ON ap1.product_id = p.id AND ap1.attribute_id = 27 AND ap1.value = '...'
JOIN attributes_products AS ap2
  ON ap2.product_id = p.id AND ap2.attribute_id = 28 AND ap2.value = '...'
JOIN attributes_products AS ap3
  ON ap3.product_id = p.id AND ap3.attribute_id = 29 AND ap3.value = '...'

Here's the toy MySQL database that I'm using to answer this question:

CREATE TABLE products (
    id INTEGER PRIMARY KEY AUTO_INCREMENT
);
CREATE TABLE attributes_products (
    product_id INTEGER NOT NULL,
    attribute_id INTEGER NOT NULL,
    value CHAR(40)
);
CREATE INDEX ap_product ON attributes_products (product_id);
CREATE INDEX ap_attribute ON attributes_products (attribute_id);

INSERT INTO products VALUES (1);
INSERT INTO products VALUES (2);
INSERT INTO attributes_products VALUES (1, 27, 'a');
INSERT INTO attributes_products VALUES (1, 28, 'b');
INSERT INTO attributes_products VALUES (1, 29, 'c');

With my query above, MySQL reports the following query plan:

+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
| id | select_type | table | type   | possible_keys           | key          | key_len | ref                 | rows | Extra                    |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+
|  1 | SIMPLE      | ap1   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | ap2   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | ap3   | ref    | ap_product,ap_attribute | ap_attribute | 4       | const               |    1 | Using where              |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY                 | PRIMARY      | 4       | temp.ap3.product_id |    1 | Using where; Using index |
+----+-------------+-------+--------+-------------------------+--------------+---------+---------------------+------+--------------------------+

See the MySQL documentation for an explanation of the EXPLAIN output.

This looks better than the plan for the OP's query:

+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
| id | select_type        | table               | type  | possible_keys           | key          | key_len | ref   | rows | Extra                    |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+
|  1 | PRIMARY            | p                   | index | NULL                    | PRIMARY      | 4       | NULL  |    2 | Using where; Using index |
|  4 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
|  3 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
|  2 | DEPENDENT SUBQUERY | attributes_products | ref   | ap_product,ap_attribute | ap_attribute | 4       | const |    1 | Using where              |
+----+--------------------+---------------------+-------+-------------------------+--------------+---------+-------+------+--------------------------+

But results will vary from one database to another: a good query planner might be able to make something efficient out of the OP's query.

wouldn't this still bog down a little bit? I mean not as much as the original but it would still bog down wouldn't it? — Malachi, Nov 4 '13 at 17:01
It all depends on the query planner, but with indexes on attributes_products.product_id and attributes_products.attribute_id MySQL looks like it has a decent plan. Try it on your preferred database and see for yourself. — Gareth Rees, Nov 4 '13 at 17:25
I just assumed this was SQL Server, I didn't even look for tags...oops — Malachi, Nov 4 '13 at 17:29
But what if I create an index in the attributes_products table on the product_id. Yes, my query is doing multiple subqueries, but I think that they would take minimum time given the index I'm talking about; in exchange of that, my query doesn't consumes too much memory. I think (but correct me if I'm wrong) that your query would be slightly faster, but it will hold a bunch of memory for the inner join. — Lay González, Nov 4 '13 at 18:32
There's no substitute for trying both and seeing which is better! — Gareth Rees, Nov 4 '13 at 19:45

asked	5 months ago
viewed	165 times
active	5 months ago

current community

your communities

more stack exchange communities

Help optimizing this query with multiple where exists

Conclusions

1 Answer

Your Answer

Not the answer you're looking for? Browse other questions tagged optimization sql postgresql or ask your own question.

Community Bulletin

Hot Network Questions

current community

your communities

more stack exchange communities

Help optimizing this query with multiple where exists

Conclusions

1 Answer

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged optimization sql postgresql or ask your own question.

Community Bulletin

Related

Hot Network Questions