PostgreSQL - Query with aggregate functions

Question

I need some help with a PostgreSQL query. Four tables are involved: customer, organisation_complete, entity and address. I retrieve some data from each of them with this query:

SELECT DISTINCT ON (c.customer_number, trim(lower(o.name)), a.street, a.zipcode, a.area, a.country)
       c.xid AS customer_xid, o.xid AS entity_xid, c.customer_number, c.deleted, o.name, o.vat,
       'organisation' AS customer_type, a.street, a.zipcode, a.city, a.country
FROM customer c
INNER JOIN organisation_complete o ON (c.xid = o.customer_xid AND c.deleted = 'FALSE')
INNER JOIN entity e ON e.customer_xid = o.customer_xid
INNER JOIN address a ON (a.contact_info_xid = e.contact_info_xid AND a.address_type = 'delivery')
WHERE c.account_xid = '<value>'  -- string literal in single quotes; "<value>" would be read as a column name

This gives me one row per distinct combination of customer_number, name, street, zipcode, area and country (the expressions listed in the DISTINCT ON clause). What I need now is only the customers that have duplicate rows in the DB, but I also need to retrieve customer_xid and entity_xid, which are the primary keys of their respective tables and therefore unique. For this reason they can't simply be added to the GROUP BY. All I need is to count how many rows share the same customer_number, name, street, zipcode, area and country, to keep only the combinations with a count greater than 1, and, for each of them, to take one customer_xid and one entity_xid, chosen arbitrarily, the way MySQL does with a_key in a query like this:

SELECT COUNT(*), tab.a_key, tab.b, tab.c from tab
WHERE 1
GROUP BY tab.b

I know MySQL is something of an exception here; I just want to know whether it is possible to obtain the same result in PostgreSQL.

Thanks,

L.

MySQL is not that broken: even it won't run that query. Please correct it.
– Jakub Kania, Commented Mar 6, 2014 at 10:43
Wow, I got curious and just tested the query myself. It. really. works. I knew that MySQL is broken, but that broken?!? @JakubKania seems like my +1 on your comment was a little too quick...
– DrColossos, Commented Mar 6, 2014 at 11:44
I would not say MySQL is broken because of that. Older SQL standards (up to SQL-92) reject that query, because in an aggregate query you cannot SELECT non-aggregated columns that are not part of the GROUP BY clause. According to SQL-2003, however, the columns in the SELECT and HAVING lists only need to be functionally dependent on the GROUP BY columns. When they are not, MySQL does not refuse the query; you may just get indeterminate results (an arbitrary choice in our case). It's all here: dev.mysql.com/doc/refman/5.5/en/group-by-extensions.html
– Luca Ballore, Commented Mar 6, 2014 at 12:00
@DrColossos Oh, actually I thought it was missing the FROM clause; I didn't notice it due to the formatting. Actually, that is valid for PostgreSQL too. In case of trouble, though, PG throws an error instead of making up results like MySQL.
– Jakub Kania, Commented Mar 6, 2014 at 12:04
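A minimal sketch of the limited form of this rule that PostgreSQL (9.1 and later) accepts: a non-aggregated column may appear in the SELECT list when the primary key of its table is listed in GROUP BY. The tables fd_customer and fd_order are hypothetical, created only for this example.

```sql
-- Hypothetical tables, created only for this example.
CREATE TEMP TABLE fd_customer (
    xid  integer PRIMARY KEY,
    name text NOT NULL
);
CREATE TEMP TABLE fd_order (
    id           integer PRIMARY KEY,
    customer_xid integer NOT NULL
);
INSERT INTO fd_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO fd_order VALUES (10, 1), (11, 1), (12, 2);

-- c.name is neither grouped nor aggregated, but PostgreSQL accepts it:
-- c.xid is the primary key of fd_customer, so c.name depends on it.
SELECT c.xid, c.name, count(*) AS order_count
FROM fd_customer c
JOIN fd_order o ON o.customer_xid = c.xid
GROUP BY c.xid
ORDER BY c.xid;
```

This returns (1, 'Acme', 2) and (2, 'Globex', 1). Grouping by o.customer_xid instead of c.xid would make PostgreSQL reject c.name, because it only infers functional dependencies from the primary key of the table the ungrouped column belongs to; a query like the MySQL one, where a_key does not depend on the grouped column at all, is always rejected.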

krokodilko · Accepted Answer · 2014-03-06 14:18:47Z


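This MySQL query relies on the "MySQL GROUP BY extension", which allows non-aggregated columns in the SELECT list that are not named in GROUP BY (see the note below on how this relates to the standard): http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html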

SELECT COUNT(*), tab.a_key, tab.b, tab.c 
from tab
WHERE 1
GROUP BY tab.b

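Note: SQL:2003 defines a related optional feature, T301 "Functional dependencies", which allows such columns only when they are functionally dependent on the GROUP BY columns. MySQL, unless the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since 5.7.5), goes further and also accepts columns that are not, returning an indeterminate value for them. T301 is not required by the standard and many RDBMSs don't implement it. PostgreSQL supports it only partially: since 9.1 it accepts a non-aggregated column when the primary key of that column's table is listed in GROUP BY (see the list of unsupported features for version 9.3: http://www.postgresql.org/docs/9.3/static/unsupported-features-sql-standard.html).

The MySQL query above could be expressed in PostgreSQL this way: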

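SELECT tab.a_key, tab.b, tab.c,
       q.cnt
FROM (
    /* one row per value of b: the group size plus the id of one representative row */
    SELECT tab.b,
           COUNT(*) AS cnt,
           MIN(tab.unique_id) AS unique_id /* could also be MAX */
    FROM tab
    /* no WHERE clause: PostgreSQL rejects WHERE 1 because the condition must be boolean */
    GROUP BY tab.b
) q
JOIN tab ON tab.unique_id = q.unique_id /* fetch the whole representative row */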

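where unique_id is a column that uniquely identifies each row in tab (usually the primary key).
MIN (or MAX) picks one row per group: the one with the smallest (or largest) unique_id. The choice is arbitrary as far as a_key and c are concerned, but unlike MySQL's it is deterministic, and because the whole row is fetched through its id, a_key and c always come from the same row.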
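A small self-contained sketch of the query above, run on hypothetical sample data in a temporary table tab (the contents are made up for illustration). It also adds HAVING COUNT(*) > 1, which the question asks for, so only values of b that occur more than once are returned.

```sql
-- Hypothetical sample data: unique_id is the primary key,
-- b is the grouping column, a_key and c are the extra columns.
CREATE TEMP TABLE tab (
    unique_id integer PRIMARY KEY,
    a_key     text,
    b         text,
    c         text
);
INSERT INTO tab VALUES
    (1, 'k1', 'x', 'c1'),
    (2, 'k2', 'x', 'c2'),
    (3, 'k3', 'y', 'c3'),
    (4, 'k4', 'z', 'c4'),
    (5, 'k5', 'z', 'c5'),
    (6, 'k6', 'z', 'c6');

-- One representative row per value of b, only for values that
-- occur more than once.
SELECT t.a_key, t.b, t.c, q.cnt
FROM (
    SELECT b,
           count(*)       AS cnt,
           min(unique_id) AS unique_id  -- id of the representative row
    FROM tab
    GROUP BY b
    HAVING count(*) > 1
) q
JOIN tab t ON t.unique_id = q.unique_id
ORDER BY t.b;
```

For this data the query returns two rows, ('k1', 'x', 'c1', 2) and ('k4', 'z', 'c4', 3); group y has a single row and is filtered out by HAVING.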

answered Mar 6, 2014 at 12:05 by krokodilko, edited Mar 6, 2014 at 14:18

2 Comments

Luca Ballore Over a year ago

According to SQL-2003 standards it is not a nonstandard extension anymore, but thanks, this is a good hint for what I need :)

krokodilko Over a year ago

@lucone83 Thank you for pointing that out; I didn't know. I have updated my answer.
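For the question's exact case there is also a different technique from the accepted answer's MIN join: a window function combined with the question's own DISTINCT ON. count(*) OVER (PARTITION BY ...) is evaluated before DISTINCT ON discards rows, so it counts every row of a group, and DISTINCT ON then keeps a single row per group, so customer_xid and entity_xid come from the same row. This is a sketch against a hypothetical table joined_rows standing in for the result of the four-table join; in the real query the question's FROM ... JOIN ... WHERE part would take its place, with the c., o. and a. prefixes restored.

```sql
-- Hypothetical stand-in for the rows produced by the question's
-- customer / organisation_complete / entity / address join.
CREATE TEMP TABLE joined_rows (
    customer_xid    integer,
    entity_xid      integer,
    customer_number text,
    name            text,
    street          text,
    zipcode         text,
    area            text,
    country         text
);
INSERT INTO joined_rows VALUES
    (1, 101, 'C1', 'Acme',   'Main St 1', '1000', 'North', 'NL'),
    (2, 102, 'C1', ' ACME ', 'Main St 1', '1000', 'North', 'NL'),
    (3, 103, 'C2', 'Globex', 'Side St 2', '2000', 'South', 'NL');

SELECT d.*
FROM (
    SELECT DISTINCT ON (customer_number, trim(lower(name)), street, zipcode, area, country)
           customer_xid, entity_xid, customer_number, name,
           street, zipcode, area, country,
           -- evaluated before DISTINCT ON, so it counts every row of the group
           count(*) OVER (PARTITION BY customer_number, trim(lower(name)),
                                       street, zipcode, area, country) AS cnt
    FROM joined_rows
    -- the leading ORDER BY items must match DISTINCT ON; the last item
    -- decides which row of each group is kept
    ORDER BY customer_number, trim(lower(name)), street, zipcode, area, country,
             customer_xid
) d
WHERE d.cnt > 1;  -- only combinations that occur more than once
```

With this sample data only the C1 group (rows 1 and 2, whose names match after trim(lower(...))) has more than one row, so the query returns customer_xid 1, entity_xid 101 and cnt 2. The filter on cnt has to sit in an outer query because a window function's result cannot be referenced in the WHERE clause of the same SELECT.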
