(The following is a highly simplified description of my problem. The company policy does not allow me to describe the actual scenario in any detail.)

The DB tables involved are:

PRODUCTS:
ID   Name
---------
1    Ferrari
2    Lamborghini
3    Volvo


CATEGORIES:
ID    Name
----------
10    Sports cars
20    Safe cars
30    Red cars

PRODUCTS_CATEGORIES
ProductID    CategoryID
-----------------------
1            10
1            30
2            10
3            20

LOCATIONS:
ID      Name
------------
100     Sports car store
200     Safe car store
300     Red car store
400     All cars r us


LOCATIONS_CATEGORIES:
LocationID    CategoryID
------------------------
100           10
200           20
300           30
400           10
400           20
400           30

Note that the locations are not directly connected to the products, only the categories. The customer should be able to see a list of locations that can provide all the product categories that the products they want to buy belong to. So, for example:

A customer wants to buy a Ferrari. This would be available from stores in categories 10 or 30. This gives us stores 100, 300 and 400 but not 200.

However, if a customer wants to buy a Volvo and a Lamborghini this would be available from stores in categories 10 and 20. Which only gives us store 400.

Another customer wants to buy a Ferrari and a Volvo. This they could get from a store in either categories 10 + 20 (sporty and safe) or categories 30 + 20 (red and safe).

What I need is a postgres query that takes a number of products and returns the locations where all of them can be found. I got started with arrays and the <@ operator but got lost quickly. Here follows some example SQL that attempts to get stores where a Ferrari and a Lamborghini can be bought. It does not work correctly since it requires the locations to satisfy all the categories that all the selected cars belong to. It returns location 400 only but should return locations 400 and 100.

SELECT l.* FROM locations l
WHERE 
(SELECT array_agg(DISTINCT(categoryid)) FROM products_categories WHERE productid IN (1,2))
<@
(SELECT array_agg(categoryid) FROM locations_categories WHERE locationid = l.id);

I hope my description makes sense.

share|improve this question
    
Arrays aren't necessary here, you can do all this by multiply joining tables. – Craig Ringer May 15 '15 at 8:25
    
I did some thinking along those lines but couldn't really get it together. Could you possibly help me out with some more details? – Henrik Söderlund May 15 '15 at 8:30
    
Sure, if you put your sample data into a sqlfiddle.com I'll give it a go. – Craig Ringer May 15 '15 at 8:32
    
Really appreciate the help: sqlfiddle.com/#!15/e138d/2 – Henrik Söderlund May 15 '15 at 9:05
up vote 3 down vote accepted

Here is the query. You should insert a list of selected cars Ids pc.ProductId in (1,3) and in the end you should correct condition to selected cars count so if you select 1 and 3 you should write HAVING COUNT(DISTINCT pc.ProductId) = 2 if you select 3 cars then there have to be 3. This condition in HAVING give you condition that ALL cars are in these Locations:

SELECT Id FROM Locations l
JOIN Locations_Categories lc on l.Id=lc.LocationId
JOIN Products_Categories pc on lc.CategoryId=pc.CategoryID
where pc.ProductId in (1,3)
GROUP BY l.id
HAVING COUNT(DISTINCT pc.ProductId) = 2

Sqlfiddle demo

For example for one car it will be:

SELECT Id FROM Locations l
JOIN Locations_Categories lc on l.Id=lc.LocationId
JOIN Products_Categories pc on lc.CategoryId=pc.CategoryID
where pc.ProductId in (1)
GROUP BY l.id
HAVING COUNT(DISTINCT pc.ProductId) = 1

Only Ferrary demo Volvo and a Lamborghini demo

share|improve this answer
    
I only properly understood this after posting mine. This is and equivalent approach to what I wrote, and was posted earlier; all I've done differently is mapped the product names, which is trivial, and done the count slightly differently. So this should be accepted in preference to my answer. – Craig Ringer May 15 '15 at 11:12

(This basically elaborates on @valex's answer, though I didn't realise that until I posted; please accept @valex's not this one).


This can be done using only joins and aggregation.

Build a join tree mapping locations to products, as normal. Then join it with the list of desired products (one-column values rows) and filter the join to only matching product names. You now have one row with the location of a product wherever that product can be found.

Now group by location and return locations where the number of products present is equal to the number we're looking for (for ALL). For ANY we omit the HAVING filter because any location row returned by the join is what we want.

So:

WITH wantedproducts(productname) AS (VALUES('Volvo'), ('Lamborghini'))
SELECT l."ID"
FROM locations l
INNER JOIN locations_categories lc ON (l."ID" = lc."LocationID")
INNER JOIN categories c ON (c."ID" = lc."CategoryID")
INNER JOIN products_categories pc ON (pc."CategoryID" = c."ID")
INNER JOIN products p ON (p."ID" = pc."ProductID")
INNER JOIN wantedproducts wp ON (wp.productname = p."Name")
GROUP BY l."ID"
HAVING count(DISTINCT p."ID") = (SELECT count(*) FROM wantedproducts);

is what you want, basically.

For "stores with any of the wanted products" queries, drop the HAVING clause.

You an also ORDER BY the aggregate if you want to show stores with any match but sort based on number of matches.

You can also add a string_agg(p."Name") to the SELECT values-list if you want to list products that can be found at that store.

If you want your input to be an array rather than a values-list, just replace the VALUES (...) with SELECT unnest($1) and pass your array as the parameter $1, or write it literally in place of $1.

share|improve this answer
    
You should use COUNT(DISTINCT L.Id) as soon as one product can be included in different Categories so it can be counted more than one time. – valex May 15 '15 at 11:21
    
@valex Good point, but it has to be p."ID"; DISTINCT l."ID" when grouping by l."ID" is meaningless. – Craig Ringer May 15 '15 at 11:42

ANSWER IN PROGRESS: (I will add answers as I get the required result)

For your first question:

A customer wants to buy a Ferrari. This would be available from stores in categories 10 or 30. This gives us stores 100, 300 and 400 but not 200.

SELECT DISTINCT l.id, l.name
FROM Products p
LEFT JOIN Product_Categories p_c
ON p.id = p_c.ProductId
LEFT JOIN Categories c
ON p_c.CategoryId = c.id
LEFT JOIN Locations_Categories l_c
ON c.id = l_c.CategoryId
LEFT JOIN Locations l
ON l_c.LocationId = l.id
WHERE p.id = 1

Second question:

However, if a customer wants to buy a Volvo and a Lamborghini this would be available from stores in categories 10 and 20. Which only gives us store 400.

SELECT DISTINCT l.id, l.name
FROM Products p
LEFT JOIN Product_Categories p_c
ON p.id = p_c.ProductId
LEFT JOIN Categories c
ON p_c.CategoryId = c.id
LEFT JOIN Locations_Categories l_c
ON c.id = l_c.CategoryId
LEFT JOIN Locations l
ON l_c.LocationId = l.id
WHERE l.id in (select id
               from locations loc
               join locations_categories locat1              
               on loc.id = locat1.LocationId
               join locations_categories locat2
               on loc.id = locat2.LocationId
               where locat1.CategoryId = 10
               AND locat2.categoryId = 20)

RESULT FOR SECOND QUESTION USING INTERSECT: intersect will cross reference all the stores where 1 product can be found each time:

SELECT DISTINCT l.id, l.name
FROM Products p
LEFT JOIN Product_Categories p_c
ON p.id = p_c.ProductId
LEFT JOIN Categories c
ON p_c.CategoryId = c.id
LEFT JOIN Locations_Categories l_c
ON c.id = l_c.CategoryId
LEFT JOIN Locations l
ON l_c.LocationId = l.id
WHERE p.id = 2
INTERSECT
SELECT DISTINCT l.id, l.name
FROM Products p
LEFT JOIN Product_Categories p_c
ON p.id = p_c.ProductId
LEFT JOIN Categories c
ON p_c.CategoryId = c.id
LEFT JOIN Locations_Categories l_c
ON c.id = l_c.CategoryId
LEFT JOIN Locations l
ON l_c.LocationId = l.id
WHERE p.id = 3

For every new product you add a new INTERSECT statement and create a new select with the wanted product id SQLFIDDLE: http://sqlfiddle.com/#!15/ce97d/15

share|improve this answer
    
I'm starting to think this is not the way you want it, I'm guessing you want 1 query to do everything? though this level of complexity might be slightly over my head – Jeremy C. May 15 '15 at 9:30
    
That is exactly the problem. Ideally I would like to be able to input any number of products to this one query and get the locations where all of the products can be found. But maybe it is not possible. And in that case, perhaps it may be better to just do the filtering in the application layer instead. What do you think? – Henrik Söderlund May 15 '15 at 9:33
    
So your input will always be a list of products? not categories? – Jeremy C. May 15 '15 at 9:35
    
Yes, exactly correct. – Henrik Söderlund May 15 '15 at 9:36
    
are you dynamically creating your sql? – Jeremy C. May 15 '15 at 9:41

Well, it's hard to totally avoid arrays here but I think I found a solution with less array functions.

Instead of selecting needed locations, I excluded non valid ones.

WITH needed_categories AS (
  SELECT p."ID", array_agg(pc."CategoryID") AS at_least_one_should_match
  FROM Products p
  JOIN Products_Categories pc ON p."ID" = pc."ProductID"
  WHERE p."ID" IN (1, 3)
  GROUP BY p."ID"
),
not_valid_locations AS (
  SELECT DISTINCT lc."LocationID", unnest(nc.at_least_one_should_match)
  FROM Locations_Categories lc
  JOIN needed_categories nc ON NOT ARRAY[lc."CategoryID"] && nc.at_least_one_should_match 
  EXCEPT
  SELECT * FROM Locations_Categories
) 
SELECT * 
FROM Locations
WHERE "ID" NOT IN (
  SELECT "LocationID" FROM not_valid_locations
);

Here is the SQLFiddle: http://sqlfiddle.com/#!15/e138d/78

This works but I'm still trying to avoid double seq scan of the Location_Categories. The fact that cars can belong to multiple categories is a bit tricky, I solved this using arrays but I'm trying to get rid of these too.

share|improve this answer
    
Avoiding arrays is not really necessary. The db is not huge and this is not a query that will be executed often. So performance is not critical. This is a clever solution. It would not have occured to me to do it backwards like that. – Henrik Söderlund May 15 '15 at 10:10
    
Well, glad to ear about all this because I'm stuck trying to enhance performance and get rid of arrays :) Is this too complex to be ported to your project ? – Clément Prévost May 15 '15 at 10:16
    
I think it should be ok. I am attempting to port it over now, will let you know how it goes. – Henrik Söderlund May 15 '15 at 11:30

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.