Summary: in this tutorial, you will learn how to use the PostgreSQL SELECT DISTINCT clause to remove duplicate rows from a result set.
Introduction to PostgreSQL SELECT DISTINCT clause
The DISTINCT
clause is used in the SELECT statement to remove duplicate rows from a result set. The DISTINCT
clause keeps one row for each group of duplicates. You can use the DISTINCT
clause on one or more columns of a table.
The syntax of DISTINCT
clause is as follows:
1 2 3 | SELECT DISTINCT column_1 FROM table_name |
If you specify multiple columns, the DISTINCT
clause will evaluate the duplicate based on the combination of values of those columns.
1 2 3 4 5 | SELECT DISTINCT column_1, column_2 FROM tbl_name; |
PostgreSQL also provides the DISTINCT ON
(expression) to keep the “first” row of each group of duplicates where the expression is equal. See the following syntax:
1 2 3 4 5 6 7 | SELECT DISTINCT ON (column_1), column_2 FROM tbl_name ORDER BY column_1, column_2; |
The order of rows returned from the SELECT
statement is unpredictable therefore the “first” row of each group of the duplicate is also unpredictable. It is good practice to always use the ORDER BY
clause with the DISTINCT ON(expression)
to make the result obvious.
Notice that the DISTINCT ON
expression must match the leftmost expression in the ORDER BY
clause.
PostgreSQL SELECT DISTINCT examples
Let’s create a new table named t1 and insert some data for practicing with the DISTINCT
clause.
First, use the following statement to create the t1
table with three columns: id
, bcolor
and fcolor
.
1 2 3 4 5 | CREATE TABLE t1 ( id serial NOT NULL PRIMARY KEY, bcolor VARCHAR (25), fcolor VARCHAR (25) ); |
Second, insert some rows into the t1
table by using the INSERT statement as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | INSERT INTO t1 (bcolor, fcolor) VALUES ('red', 'red'), ('red', 'red'), ('red', NULL), (NULL, 'red'), ('red', 'green'), ('red', 'blue'), ('green', 'red'), ('green', 'blue'), ('green', 'green'), ('blue', 'red'), ('blue', 'green'), ('blue', 'blue'); |
Third, query the data from the t1 table by using the SELECT statement:
1 2 | SELECT * FROM t1; |
PostgreSQL DISTINCT on one column example
The following statement selects unique values in the bcolor column from the t1 table and sorts the result set in alphabetical order by using the ORDER BY clause.
1 2 3 4 5 6 | SELECT DISTINCT bcolor FROM t1 ORDER BY bcolor; |
PostgreSQL DISTINCT on multiple columns
The following example demonstrates how to use the DISTINCT
clause on multiple columns. Let’s take a look at the following statement:
1 2 3 4 5 6 7 8 | SELECT DISTINCT bcolor, fcolor FROM t1 ORDER BY bcolor, fcolor; |
Because we specified both bcolor
and fcolor
columns in the SELECT DISTINCT
clause, the values in both bcolor
and fcolor
columns are combined to evaluate the uniqueness of rows.
The query returns the unique combination of bcolor
and fcolor
in the t1
table. Notice that the row that has red value in the bcolor
and fcolor
columns is removed from the result set.
PostgreSQL DISTINCT ON ORDER BY example
First, the following statement sorts the result set by the bcolor and fcolor, and then for each group of duplicates, it keeps the first row in the returned result set.
1 2 3 4 5 6 7 8 | SELECT DISTINCT ON (bcolor) bcolor, fcolor FROM t1 ORDER BY bcolor, fcolor; |
In this tutorial, you have learned how to use PostgreSQL SELECT DISTINCT
statement to remove duplicate rows from a result set.