distinct
is NOT a function. It always operates on all columns of the result.
The difference between select distinct (foo), bar
and select distinct foo, bar
is the same as the difference between select (foo), bar
and select foo, bar
. The paranthesis is just "noise".
When you write select (foo,bar)
you are actually creating an anonymous object type in Postgres which results in a single column that has two attributes - which is not what you actually want.
As you are using Postgres, you can use the Postgres specific extension DISTINCT ON
which - in contrast to the standard DISTINCT
does operate on a sub-set of the columns.
You have to specfiy an ORDER BY in that case to "define" which of the rows to take if there is more than one with the same combination of (field1, field2)
.
CREATE TABLE new_name
AS
SELECT DISTINCT ON (table.field1, table.field2),
table.field1,
table.field2,
.....
FROM ...
WHERE ...
ORDER BY ..
If you want to stick to ANSI SQL you will need a window function for this:
create table new_name
as
select column1, column2, column3, column4, column5, column6
from (
select column1, column2, column3, column4, column5, column6,
row_number() over (partition by column1, column2 order by ...) as rn
from the_table
where ...
) t
where rn = 1;
For a large table DISTINC ON
is probably faster than the solution with the window function.