How to select distinct values from query results in PostgreSQL

SUMMARY: This article reviews methods for selecting distinct values from query results in PostgreSQL. Keywords and clauses discussed with examples include:

1. DISTINCT

2. DISTINCT ON

3. GROUP BY 

4. HAVING 

5. Some caveats

 

In this post, we are going to see how to select distinct values from SQL queries/statements. One of the easiest ways to select distinct values is using the DISTINCT keyword. Let's explore this keyword first.

 

DISTINCT

DISTINCT is used to remove duplicate rows from the SELECT query and only display one unique row from result set. 

Examples 

Let’s set up a new table in PostgreSQL and use it to look at a few helpful ways that DISTINCT can remove duplicates and reveal useful information from the data.

First, connect to psql terminal: 

/usr/pgsql-11/bin/psql -U postgres postgres

 

Create a table, “test”: 

create table test(col1 int, col2 varchar,col3 date);

 

--Insert a few records into table “test”:

insert into test values (1,'abc','2015-09-10');

insert into test values (1,'abc','2015-09-11');

insert into test values (2,'xyz','2015-09-12');

insert into test values (2,'xyz','2015-09-13');

insert into test values (3,'tcs','2015-01-15');

insert into test values (3,'tcs','2015-01-18');

 

Use the SELECT query to see the rows of table “test”:

postgres=# select * from test;

  col1 | col2 |    col3    

------+------+------------

     1 | abc  | 2015-09-10

     1 | abc  | 2015-09-11

     2 | xyz  | 2015-09-12

     2 | xyz  | 2015-09-13

     3 | tcs  | 2015-01-15

     3 | tcs  | 2015-01-18

(6 rows)

 

Now that the table has been set up, let’s look at a few ways that DISTINCT can sort through the data:

1. You can use SELECT with DISTINCT to find only the non-duplicate values from column “col1”: 

postgres=# select distinct(col1) from test order by col1;

  col1 

------

    1

    2

    3

(3 rows)

 

2. SELECT with DISTINCT can also be used in an SQL inline query: 

postgres=# select col1 from (select distinct(col1) from test order by col1) as foo;

 col1 

------

    1

    2

    3

(3 rows)

 

3. DISTINCT can also be used with COUNT and other aggregates:

postgres=# select col1,count(distinct(col1)) from test group by col1;

 col1 | count 

------+-------

    1 |     1

    2 |     1

    3 |     1

(3 rows)

 

4. You can use SELECT with DISTINCT on two columns of the table: 

postgres=#  select distinct col1,col2 from test order by 1;

 col1 | col2 

------+------

    1 | abc

    2 | xyz

    3 | tcs

(3 rows)

 

5. You can also use SELECT with DISTINCT on all columns of the table:   

postgres=#  select distinct col1,col2,col3 from test order by 1;

 col1 | col2 |    col3    

------+------+------------

    1 | abc  | 2015-09-11

    1 | abc  | 2015-09-10

    2 | xyz  | 2015-09-13

    2 | xyz  | 2015-09-12

    3 | tcs  | 2015-01-15

    3 | tcs  | 2015-01-18

(6 rows)

 

Notice here that all rows have been displayed, duplicates included, as there are no distinct rows in the “col3”  column.

A variant of DISTINCT is DISTINCT ON. now Let’s explore this.

 

DISTINCT ON 

 When distinct cannot return unique row when all columns combination is not unique then we can use distinct on clause which will give  first row  from that set of duplicate rows.The column which we are specifying in DISTINCT ON <col_name> should also be present in the ORDER BY clause; otherwise you will get an error.

Example 

You can use DISTINCT ON to display the first of each value in “col1”:

postgres=#  select distinct on (col1) col1,col2,col3 from test order by col1;

 col1 | col2 |    col3    

------+------+------------

    1 | abc  | 2015-09-10

    2 | xyz  | 2015-09-13

    3 | tcs   | 2015-01-15

(3 rows)

 

Caveats

There are a few scenarios where distinct keyword cannot be used.

1. DISTINCT is a reserved keyword in PostgreSQL, so we cannot specify it as an object name.  

postgres=# create table distinct(n int);

ERROR:  syntax error at or near "distinct"

 

2. In a SELECT query we cannot have more than one DISTINCT keyword:

postgres=# select distinct 1, distinct 2;

ERROR:  syntax error at or near "distinct"

 

3. We also cannot use it in WHERE and GROUP BY conditions: 

postgres=# select * from test where distinct(col1)=1;

ERROR:  syntax error at or near "distinct"

 

or

postgres=# select distinct(col1) from test group by distinct(col1);

ERROR:  syntax error at or near "distinct"

 

4. FOR NO KEY UPDATE, FOR UPDATE, FOR SHARE and FOR KEY SHARE cannot be specified with DISTINCT:

postgres=#  select distinct(col1),col2,col3 from test order by 1 for update;

ERROR:  FOR UPDATE is not allowed with DISTINCT clause

 

Another popular way to select non-duplicate values is by using the GROUP BY with HAVING clause.

 

GROUP BY and HAVING

With GROUP BY and HAVING, you can select distinct values based on group of columns. Where having can be used to find duplicate values also.

Functionality-wise group by and distinct works similarly but group by also provide flexibility to use aggregate functions along with getting unique data. Performance-wise also group by is better than distinct.

Examples 

postgres=# select col1, col2 from test group by col1,col2 order by 1;

 col1 | col2 

------+------

    1 | abc

    2 | xyz

    3 | tcs

(3 rows)



  postgres=# select col1, col2,count(*) from test group by col1,col2 order by 1;

  col1 | col2 | count 

------+------+-------

     1 | abc  |     2

     2 | xyz  |     2

     3 | tcs  |     2

(3 rows)





postgres=# select col1 from test group by col1 having count(col1) > 1 order by 1;

 col1 

------

    1

    2

    3

(3 rows)

 

Hope it helps !