0

I need some help for a PostgreSQL query. I have 4 tables involved on it: customer, organization_complete, entity and address. I retrieve some data from everyone and with this query:

SELECT distinct ON (c.customer_number, trim(lower(o.name)), a.street, a.zipcode, a.area, a.country) 
                              c.xid AS customer_xid, o.xid AS entity_xid, c.customer_number, c.deleted, o.name, o.vat, 'organisation' AS customer_type, a.street, a.zipcode, a.city, a.country
          FROM customer c
          INNER JOIN organisation_complete o ON (c.xid = o.customer_xid AND c.deleted = 'FALSE')
          INNER JOIN entity e ON e.customer_xid = o.customer_xid
          INNER JOIN address a ON (a.contact_info_xid = e.contact_info_xid and a.address_type = 'delivery')
          WHERE c.account_xid = "<value>"

I get a distinct of all the customers splitted by customer_number, name, street, zipcode, area and country (what's specified after the DISTINCT ON statement). What I need to retrieve now is a distinct of all customers having a doubled row on DB but I also need to retrieve the customer_xid and the entity_xid, that are primary keys of the respective tables and so are unique. For this reason they can't be included into an aggregate function. All I need is to count how many rows with the same customer_number, name, street, zipcode, area and country I have for each distinct tuple and to select only tuples with a count bigger than 1. For each selected tuple I need also to take a customer_xid and an entity_xid, at random, like MySQL would do with a_key in a query like this:

SELECT COUNT(*), tab.a_key, tab.b, tab.c from tab
WHERE 1
GROUP BY tab.b

I know MySQL is quite an exception regarding this, I just want to know if may be possible to obtain the same result on PostgreSQL.

Thanks,

L.

9
  • 1
    MySQL is not that broken and even it won't run that query, please correct it. Commented Mar 6, 2014 at 10:43
  • That query runs on MySQL... Commented Mar 6, 2014 at 11:16
  • Wow, I got currious and just tested the query myself. It. really. works. I knew that MySQL is broken but that broken?!? @JakubKania seems like my +1 on your comment was a little too quick... Commented Mar 6, 2014 at 11:44
  • I would not say MySQL is broken because of that. Previous SQL standards would reject that query because you can not SELECT non-aggregate fields that are not part of the GROUP BY clause in an aggregate query. But this is correct up to 1992. Now, according to SQL-2003 standards, columns in the SELECT and HAVING lists are still functionally dependent on the GROUP BY columns. If not, the query is not refused, you may just get indeterminate results (a random choose in our case). It's all here: dev.mysql.com/doc/refman/5.5/en/group-by-extensions.html Commented Mar 6, 2014 at 12:00
  • @DrColossos Oh, actually I thought it was missing the FROM clause. I didn't notice it due to the formatting. Actually that is valid for PostgreSQL too. In case of trouble PG throws an error instead of making up results like MySQL though. Commented Mar 6, 2014 at 12:04

1 Answer 1

1

This query in MySql is using a nonstandard (see note below) "MySql group by extension": http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html

SELECT COUNT(*), tab.a_key, tab.b, tab.c 
from tab
WHERE 1
GROUP BY tab.b

Note: This is a feature definied in SQL:2003 Standard as T301 Functional dependencies, it is not required by the standard, and many RDBMS don't support it, including PostgreSql (see this link for version 9.3 - unsupported features: http://www.postgresql.org/docs/9.3/static/unsupported-features-sql-standard.html ).


The above query could be expressed in PostgreSQL in this way:

SELECT tab.a_key, tab.b, tab.c,
       q.cnt
FROM (
    SELECT tab.b,
           COUNT(*) As cnt, 
           MIN(tab.unique_id) As unique_id /* could be also MAX */
    from tab
    WHERE 1
    GROUP BY tab.b
) q
JOIN tab ON tab.unique_id = q.unique_id

where unique_id is a column that uniquely identifies each row in tab (usually a primary key).
Min or Max functions choose one row from the table in a pseudo-random manner.

Sign up to request clarification or add additional context in comments.

2 Comments

According to SQL-2003 standards it is not a nonstandard extension anymore, but thanks, this is a good hint for what I need :)
@lucone83 Thank you for pointing that, I didn't know, I have updated my answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.