Outer Joins and Where Clauses

July 19, 2017

Postgres supports both traditional join syntax, which uses the where clause to specify joined columns, and ansi join syntax, that uses the word join in the from clause. While both syntaxes can be used for inner joins, only the ansi join syntax supports outer joins in Postgres.

Enterprise-ready Postgres tools for high availability, monitoring, and disaster recovery. Download Now.

Because column restrictions like col = 4 are not related to joins, you would think that restrictions have the same effect whether they appear in the where clause or as part of a join clause, e.g. a outer join b on a.x = b.x and col = 4. However, this is not always true. Restrictions in the join clause are processed during joins, while where clause restrictions are processed after joins.

This is only significant in outer joins (and cross joins) because columns from unjoined rows are manufactured by outer joins. Here is an example:

CREATE TABLE test1 (x INTEGER);
CREATE TABLE test2 (x INTEGER);

INSERT INTO test1 VALUES (1), (2), (3);
INSERT INTO test2 VALUES (1), (2);

SELECT * FROM test1 LEFT JOIN test2 ON test1.x = test2.x;
x |   x
---+--------
1 |      1
2 |      2
3 |  (null)

SELECT * FROM test1 LEFT JOIN test2 ON test1.x = test2.x AND test2.x <= 2;
x |   x
---+--------
1 |      1
2 |      2
3 |  (null)

SELECT * FROM test1 LEFT JOIN test2 ON test1.x = test2.x WHERE test2.x <= 2;
x | x
---+---
1 | 1
2 | 2

As you can see, the first select performs the outer join with no column restrictions. The second select returns the same result because the column restriction matches all rows in test2 (before the outer join happens). The final select operates on the result of the join and since null <= 2 returns null, which behaves like false, the third output row is suppressed.

What is even more interesting is seeing how a column restriction can cause a query that would normally be fully joined to return unjoined column values:

INSERT INTO test2 VALUES (3);

SELECT * FROM test1 LEFT JOIN test2 ON test1.x = test2.x;
x | x
---+---
1 | 1
2 | 2
3 | 3

SELECT * FROM test1 LEFT JOIN test2 ON test1.x = test2.x AND test2.x <= 2;
x |   x
---+--------
1 |      1
2 |      2
3 | (null)

SELECT * FROM test1 LEFT JOIN test2 ON test1.x = test2.x WHERE test2.x <= 2;
x | x
---+---
1 | 1
2 | 2

In the second select, the test2.x value of 3 is excluded from the join, resulting in an outer-join value of null for the column.

In summary, be careful of the placement of restriction clauses on inner-side tables in outer-join queries and clearly determine whether you want restrictions to happen at join time or post-join.

Bruce Momjian is Senior Database Architect at EnterpriseDB. 

This post originally appeared on Bruce's personal blog

Share this