Index Order Does Matter

April 12, 2017

Postgres has supported multi-column indexes since 1997, e.g. create index i_test on test (a, b, c). It can easily use an index if the supplied columns are all at the front of the index, e.g. a and b in the previous index, but it can also use the index if some of the indexed column values are not supplied, e.g. columns a and c in the previous index. It does this by looking up a in the index, then looking through the index for matches of c, ignoring values of b,

e.g.

CREATE TABLE test (a INT, b INT, c INT);
INSERT INTO test
        SELECT x, x, x FROM generate_series(1, 100000) AS f(x);
CREATE INDEX i_test ON test(a, b, c);
ANALYZE test;

EXPLAIN (COSTS false)
SELECT * FROM test WHERE a = 1 AND c = 1;
              QUERY PLAN
--------------------------------------
 Index Only Scan using i_test on test
   Index Cond: ((a = 1) AND (c = 1))

Of course, if possible, you should always put the most commonly supplied columns first in the index because skipping columns during index scans (called "index skip scans") is expensive.

However, if you are supplying all the column values referenced in the index, I assumed it didn't matter what order the columns were specified in the index, but Robert Haas recently mentioned this is not always true. For example, if a has many duplicate values, and b has mostly unique values, having a at the start of the index is suboptimal — better to use b first, which will more effectively narrow the search space when looking for matches on a.

A more specific example of this is range queries. In that case, if the range test of one column is less restrictive than the equality test of another, it would be better for the equality test column to be first in an index. For example, in the queries below, the first query uses an index because it is very restrictive on the first column, the second query uses an index because it is moderately restrictive on the first column, while the third does not use an index because it is effectively unrestrictive on the first column:

EXPLAIN SELECT * FROM test WHERE a = 1 AND b >= 1 AND b <= 100000;
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Only Scan using i_test on test  (cost=0.42..8.44 rows=1 width=12)
   Index Cond: ((a = 1) AND (b >= 1) AND (b <= 100000))

EXPLAIN SELECT * FROM test WHERE a >= 1 AND a <= 50000 AND b = 1;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Index Only Scan using i_test on test  (cost=0.42..1404.10 rows=1 width=12)
   Index Cond: ((a >= 1) AND (a <= 50000) AND (b = 1))

EXPLAIN SELECT * FROM test WHERE a >= 1 AND a <= 100000 AND b = 1;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on test  (cost=0.00..2291.00 rows=1 width=12)
   Filter: ((a >= 1) AND (a <= 100000) AND (b = 1))

Notice the increasing costs, even though all queries match one indexed row.

Obviously, in cases where you are not specifying all indexed columns in every query, you should put the most frequently referenced columns first in the index to avoid the overhead of index skip scans. However, for cases where most indexed columns are going to be supplied in queries, placing the most restrictive columns first in indexes is a win.

Bruce Momjian is Senior Data Architect at EnterpriseDB. 

This post originally appeared on Bruce's personal blog.

Share this

More Blogs

Explaining ABI Breakage in PostgreSQL 17.1

PostgreSQL comes out with a scheduled major release every year and scheduled minor releases for all supported versions every quarter. But in the November minor releases, two issues caused the...
December 06, 2024

PGVector as Embedding Store in PrivateGPT

EDB has a long history of open source contributions, and while we’re best known for our contributions to Postgres, that’s not the only project we contribute to. e.g Barman, CloudNativePG...
June 05, 2024

pgAdmin User Management in Server Mode

pgAdmin can be deployed as a web application by configuring the app to run in server mode. One can check out server deployment on how to run pgAdmin in server...
August 24, 2023