How to create an Sql table from SELECT with a multiple row DISTINCT constraint?

Question

I want to create a table from a SELECT statement based on a multiple row DISTINCT, but PostgreSQL does not allow me to do it.

I need to create a table where the elements were previously selected from another table and they are all unique.

Here's my try:

CREATE TABLE new_name AS (

   SELECT DISTINCT(table.field1, table.field2), table.field1, table.field2, 
   FROM ...
   WHERE ...

);

If I run this statement, PostgreSQL says:

ERROR: column "row" has pseudo-type record

How could I solve this problem?

a_horse_with_no_name · Accepted Answer · 2013-08-05 10:35:53Z

distinct is NOT a function. It always operates on all columns of the result.

The difference between select distinct (foo), bar and select distinct foo, bar is the same as the difference between select (foo), bar and select foo, bar. The paranthesis is just "noise".

When you write select (foo,bar) you are actually creating an anonymous object type in Postgres which results in a single column that has two attributes - which is not what you actually want.

As you are using Postgres, you can use the Postgres specific extension DISTINCT ON which - in contrast to the standard DISTINCT does operate on a sub-set of the columns.

You have to specfiy an ORDER BY in that case to "define" which of the rows to take if there is more than one with the same combination of (field1, field2).

CREATE TABLE new_name 
AS 
SELECT DISTINCT ON (table.field1, table.field2), 
       table.field1, 
       table.field2, 
       .....
FROM ...
WHERE ...
ORDER BY ..

If you want to stick to ANSI SQL you will need a window function for this:

create table new_name
as
select column1, column2, column3, column4, column5, column6
from (
   select column1, column2, column3, column4, column5, column6, 
          row_number() over (partition by column1, column2 order by ...) as rn 
   from the_table
   where ...
) t
where rn = 1;

For a large table DISTINC ON is probably faster than the solution with the window function.

@DavideChicco.it You should use an ORDER BY if you want consistent results. From the manual: "Note that the "first row" of each set is unpredictable unless ORDER BY is used": postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT
But I'm not interested in the order, later I'll do just a COUNT(*)... should I use the ORDER BY clause anyway?
The order by is necessary to get a consistent result if you run this more than once. See this example: sqlfiddle.com/#!12/df4ef/4 if you don't care about which values are in the non-distinct columns, then you can leave out the order by

asked	1 month ago
viewed	78 times
active	1 month ago

How to create an Sql table from SELECT with a multiple row DISTINCT constraint?

1 Answer

Your Answer

Not the answer you're looking for? Browse other questions tagged sql postgresql select prepared-statement or ask your own question.

How to create an Sql table from SELECT with a multiple row DISTINCT constraint?

1 Answer

Your Answer

Sign up or login

Post as a guest

Not the answer you're looking for? Browse other questions tagged sql postgresql select prepared-statement or ask your own question.

Related