Count Distinct with PostgreSQL

Question

I am trying to change the count(DISTINCT ...) portion of the query below, because the current query returns the error "WINDOW definition is not supported". What is the most seamless way to do this calculation using PostgreSQL? Thank you.

SELECT transaction_date, brand, Description, Amount, newval
into temp.table
FROM (SELECT transaction_date, brand, Description, Amount,
         (Amount / count(DISTINCT unique_mem_id) over (partition by to_char(transaction_date, 'YYYY-MM'),brand
         )) as newval
  FROM source.table
 )
WHERE DESCRIPTION iLIKE '%Criteria%';

I'm really trying to ask how to conduct a Count Distinct calculation using PostgreSQL. – ZJAY, Oct 28 '16 at 19:58
partition by to_char(transaction_date, 'YYYY-MM'): you don't need to convert to char here, just use date_trunc: partition by date_trunc('month', transaction_date) – wildplasser, Oct 29 '16 at 18:06

Dudu Markovitz · Accepted Answer · 2016-10-30 08:37:06Z

Given the use case, it seems better to split the code into two steps.

Create a table month_brand based on the following query:

select      to_char(transaction_date, 'YYYY-MM')    as yyyymm
           ,brand
           ,count (distinct unique_mem_id)          as count_distinct_unique_mem_id

from        source.table

group by    yyyymm
           ,brand
;

Join month_brand with your source table:

select      t.transaction_date, t.brand, t.Description, t.Amount, t.Amount / m.count_distinct_unique_mem_id as newval

from                    source.table    as t

            join        month_brand     as m

            on          m.yyyymm = to_char(t.transaction_date, 'YYYY-MM')
            and         m.brand  = t.brand

where       t.description ilike '%Criteria%'
;
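
The two steps above could be materialized so that the distinct count runs only once and each later search only does the join. This is a minimal sketch, not a tested implementation: it follows wildplasser's date_trunc suggestion, keeps source.table as the placeholder from the question, and the names month_start and month_brand_idx are made up for illustration.

```sql
-- Step 1: run once (or on a schedule) to store the distinct count per month and brand.
-- month_start and month_brand_idx are illustrative names, not from the original post.
create table month_brand as
select      date_trunc('month', transaction_date)   as month_start
           ,brand
           ,count(distinct unique_mem_id)            as count_distinct_unique_mem_id
from        source.table
group by    month_start
           ,brand
;

create index month_brand_idx on month_brand (month_start, brand);

-- Step 2: run per search term; only the ilike pattern changes between runs.
select      t.transaction_date, t.brand, t.Description, t.Amount
           ,t.Amount / m.count_distinct_unique_mem_id as newval
from        source.table    as t
join        month_brand     as m
on          m.month_start = date_trunc('month', t.transaction_date)
and         m.brand       = t.brand
where       t.description ilike '%Criteria%'
;
```

If Amount is an integer column, the division truncates because count returns bigint; casting with Amount::numeric avoids that.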

Alternatively, instead of count(distinct ...), use a two-phase solution:

1. Give row numbers to the duplicated unique_mem_id values.
2. Count only the unique_mem_id values with row_number = 1.

select  *

into     temp.table

from   (SELECT  transaction_date, brand, Description, Amount,
                (Amount / count(case rn when 1 then unique_mem_id end) over (partition by to_char(transaction_date, 'YYYY-MM'),brand)) as newval

        FROM    (SELECT     transaction_date, brand, Description, Amount, unique_mem_id,
                            row_number () over (partition by to_char(transaction_date, 'YYYY-MM'),brand,unique_mem_id) as rn

                 FROM       source.table
                 ) as s
        ) as t

WHERE   DESCRIPTION iLIKE '%Criteria%'
;
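
To see the two phases in isolation, here is a small self-contained sketch that runs against inline sample rows instead of source.table. The sample data and the names sample, numbered and distinct_members are hypothetical, and it uses CTEs rather than the nested subqueries above purely for readability.

```sql
-- Hypothetical sample rows standing in for source.table.
with sample (transaction_date, brand, unique_mem_id, amount) as (
    values  (date '2016-10-01', 'A', 1, 10.0)
           ,(date '2016-10-15', 'A', 1, 20.0)
           ,(date '2016-10-20', 'A', 2, 30.0)
           ,(date '2016-11-02', 'A', 3, 40.0)
),
numbered as (
    -- Phase 1: number the rows within each (month, brand, member) group.
    select  *
           ,row_number() over (partition by date_trunc('month', transaction_date), brand, unique_mem_id) as rn
    from    sample
)
-- Phase 2: count only the rn = 1 rows, i.e. one row per distinct member.
select  transaction_date, brand, unique_mem_id, amount
       ,count(case rn when 1 then unique_mem_id end)
            over (partition by date_trunc('month', transaction_date), brand) as distinct_members
from    numbered
order by transaction_date
;
```

With these rows, the three October rows get distinct_members = 2 (members 1 and 2) and the November row gets 1.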

edited Oct 30 '16 at 8:37

answered Oct 29 '16 at 17:55


Hi @ZJAY, Please take a look at the solution. – Dudu Markovitz Oct 29 '16 at 19:43

Solution returned ERROR: column "unique_mem_id" does not exist in derived_table1 – ZJAY Oct 29 '16 at 19:53

@ZJAY, fixed... – Dudu Markovitz Oct 29 '16 at 19:55

If unique_mem_id cannot be null, then there is no need to select it in the inner query, and it should be replaced in the count with any constant, e.g. 1. – Dudu Markovitz Oct 29 '16 at 19:57
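
The simplification in this comment might look like the sketch below. It assumes unique_mem_id is never null; otherwise the first row of the null group would be counted, whereas count(distinct ...) ignores nulls. The subquery alias s is illustrative.

```sql
-- Assumes unique_mem_id is never null, so the constant 1 can stand in for it in the count.
select  transaction_date, brand, Description, Amount
       ,Amount / count(case rn when 1 then 1 end)
            over (partition by to_char(transaction_date, 'YYYY-MM'), brand) as newval
from   (select  transaction_date, brand, Description, Amount
               ,row_number() over (partition by to_char(transaction_date, 'YYYY-MM'), brand, unique_mem_id) as rn
        from    source.table
       ) as s
;
```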

Thank you Dudu, very helpful. One final question. I will run this query 80-100 times a day, each with a different WHERE DESCRIPTION iLIKE '%_%' (final part of query). The query runs across hundreds of millions of rows, so it is inefficient to run the non-unique part of the query repeatedly (the Count Distinct is the same across all queries). Is there a way I can run the Count Distinct portion once, perhaps in a separate table, but preserve the results I am seeking? The key will be to partition the correct YYYY-MM for each relevant row without running the partition in each query. Thanks! – ZJAY Oct 29 '16 at 20:22
