Summary: in this tutorial, you will learn how to use the PostgreSQL SELECT DISTINCT clause to remove duplicate rows from a result set returned by a query.
Introduction to PostgreSQL SELECT DISTINCT clause
The DISTINCT clause is used in the SELECT statement to remove duplicate rows from the result set. It keeps one row for each group of duplicates and can be applied to one or more columns of a table.
The following illustrates the syntax of the DISTINCT clause:

    SELECT DISTINCT
        column_1
    FROM
        table_name;
In this statement, the values in the column_1 column are used to identify duplicates.
If you specify multiple columns, the DISTINCT clause identifies duplicates based on the combination of values in these columns.

    SELECT DISTINCT
        column_1,
        column_2
    FROM
        table_name;
PostgreSQL also provides the DISTINCT ON (expression) clause to keep the “first” row of each group of duplicates, using the following syntax:

    SELECT DISTINCT ON (column_1)
        column_1,
        column_2
    FROM
        table_name
    ORDER BY
        column_1,
        column_2;
Without an ORDER BY clause, the order of rows returned by a SELECT statement is unspecified, so the “first” row of each group of duplicates is also unpredictable. It is good practice to always use the ORDER BY clause with DISTINCT ON (expression) to make the result set predictable.
Notice that the DISTINCT ON expression must match the leftmost expression in the ORDER BY clause.
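As a minimal sketch of this rule, the query below runs DISTINCT ON against an inline VALUES list rather than a real table. The readings CTE and its sensor and reading columns are hypothetical names invented for this illustration.

```sql
-- Hypothetical inline data: two sensors with two readings each.
WITH readings (sensor, reading) AS (
    VALUES ('a', 10), ('a', 30), ('b', 20), ('b', 5)
)
SELECT DISTINCT ON (sensor)
    sensor,
    reading
FROM
    readings
ORDER BY
    sensor,          -- leftmost ORDER BY expression matches DISTINCT ON
    reading DESC;    -- within each sensor, the highest reading sorts first
```

This should return one row per sensor: ('a', 30) and ('b', 20). If the ORDER BY clause started with reading instead of sensor, PostgreSQL would be expected to reject the query with an error along the lines of: SELECT DISTINCT ON expressions must match initial ORDER BY expressions.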
PostgreSQL SELECT DISTINCT examples
Let’s create a new table named t1 and insert data into the table for practicing the DISTINCT clause.
First, use the following statement to create the t1 table that consists of three columns: id, bcolor, and fcolor.

    CREATE TABLE t1 (
        id serial NOT NULL PRIMARY KEY,
        bcolor VARCHAR,
        fcolor VARCHAR
    );
Second, insert some rows into the t1 table using the following INSERT statement:

    INSERT INTO t1 (bcolor, fcolor)
    VALUES
        ('red', 'red'),
        ('red', 'red'),
        ('red', NULL),
        (NULL, 'red'),
        ('red', 'green'),
        ('red', 'blue'),
        ('green', 'red'),
        ('green', 'blue'),
        ('green', 'green'),
        ('blue', 'red'),
        ('blue', 'green'),
        ('blue', 'blue');
Third, query the data from the t1 table by using the SELECT statement:

    SELECT
        id,
        bcolor,
        fcolor
    FROM
        t1;
PostgreSQL DISTINCT on one column example
The following statement selects unique values in the bcolor column from the t1 table and sorts the result set in alphabetical order by using the ORDER BY clause.

    SELECT DISTINCT
        bcolor
    FROM
        t1
    ORDER BY
        bcolor;
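For a plain column list like this one, a GROUP BY query should return the same set of values. The sketch below is an alternative formulation rather than part of the original example, and it assumes the t1 table and sample rows created earlier.

```sql
-- Alternative to SELECT DISTINCT bcolor: group on the same column.
-- Both forms treat all NULL values as a single group, so NULL appears once.
SELECT
    bcolor
FROM
    t1
GROUP BY
    bcolor
ORDER BY
    bcolor;  -- ascending order places NULL last by default in PostgreSQL
```

With the sample data, either query should return four rows: blue, green, red, and NULL.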
PostgreSQL DISTINCT on multiple columns
The following statement demonstrates how to use the DISTINCT clause on multiple columns:

    SELECT DISTINCT
        bcolor,
        fcolor
    FROM
        t1
    ORDER BY
        bcolor,
        fcolor;
Because we specified both the bcolor and fcolor columns in the SELECT DISTINCT clause, PostgreSQL combined the values in both columns to evaluate the uniqueness of the rows.
The query returns the unique combinations of bcolor and fcolor from the t1 table. Notice that one of the two rows that have red in both the bcolor and fcolor columns was removed from the result set.
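One way to check how many combinations survive is to count the rows of the DISTINCT query in a subquery. This is a small sketch that assumes the t1 sample data above; the distinct_pairs and pairs aliases are names chosen only for illustration.

```sql
-- Count the unique (bcolor, fcolor) combinations in t1.
SELECT
    COUNT(*) AS distinct_pairs
FROM (
    SELECT DISTINCT
        bcolor,
        fcolor
    FROM
        t1
) AS pairs;
```

The table holds 12 rows and only the ('red', 'red') pair is repeated, so the count should be 11. The rows ('red', NULL) and (NULL, 'red') are different combinations and are both kept.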
PostgreSQL DISTINCT ON ORDER BY example
The following statement sorts the result set by bcolor and fcolor, and then, for each group of rows that share the same bcolor value, keeps the first row in the returned result set.

    SELECT DISTINCT ON (bcolor)
        bcolor,
        fcolor
    FROM
        t1
    ORDER BY
        bcolor,
        fcolor;
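Which row counts as “first” depends entirely on the ORDER BY clause. As a sketch of that, the variation below reverses the sort on fcolor. It assumes the t1 sample data above, and it adds NULLS LAST because PostgreSQL sorts NULL first in descending order by default.

```sql
-- Same DISTINCT ON (bcolor), but the last fcolor alphabetically is kept.
SELECT DISTINCT ON (bcolor)
    bcolor,
    fcolor
FROM
    t1
ORDER BY
    bcolor,
    fcolor DESC NULLS LAST;  -- without NULLS LAST, ('red', NULL) would be kept for red
```

With the sample data, the ascending query above should keep blue as the fcolor for the blue, green, and red groups and red for the NULL group, while this variation should keep red for every group.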
In this tutorial, you have learned how to use the PostgreSQL SELECT DISTINCT clause to remove duplicate rows from the result set returned by a query.