Query Plan Improvement with Expression Indexes in PostgreSQL

Bruce Momjian

January 20, 2023

Most people know that Postgres allows the creation of indexes on expressions. This is helpful if you need index lookups of expressions used in where clauses.

However, there is another benefit to expression indexes, and that is optimizer statistics. Not only do expression indexes allow rapid lookups of matching expressions, but they also provide optimizer statistics, which improve row estimates and hence query plans.

Here is an example:

        CREATE TABLE test (x INTEGER);
        INSERT INTO test SELECT x FROM generate_series(1, 100) AS t(x);
        
        ANALYZE test;
        
        SELECT COUNT(*) FROM test WHERE x % 2 = 1;
         count
        -------
            50
        
        EXPLAIN SELECT * FROM test WHERE x % 2 = 1;
                             QUERY PLAN
        ----------------------------------------------------
         Seq Scan on test  (cost=0.00..2.50 rows=1 width=4)
           Filter: ((x % 2) = 1)

The optimizer doesn't know the selectivity of the modulus operator, so it initially assumes only one row is returned. Once an expression index is created and analyze statistics generated, the optimizer knows exactly how many rows will be returned:

        CREATE INDEX i_test ON test ((x % 2));
        
        ANALYZE test;
        
        EXPLAIN SELECT * FROM test WHERE x % 2 = 1;
                             QUERY PLAN
        -----------------------------------------------------
         Seq Scan on test  (cost=0.00..2.50 rows=50 width=4)
           Filter: ((x % 2) = 1)

Interestingly, the optimizer used expression index statistics, even though the expression index itself was not used. In the example above, the modulus operator is not selective enough to make the index useful, but expression statistics would be useful for more complex queries, e.g. with joins. This method can also be used to create statistics on functions.

It is also possible to create an expression index that generates cross-columns statistics, the benefits of which I mentioned in an earlier blog post. For example, this expression index would supply accurate statistics for state/city combinations, but queries would need to use the exact concatenation construction:

        CREATE INDEX i_customer_state_city ON customer ((state || '|' || city));

It would be nice if there was a way to create expression statistics without the overhead of creating and maintaining indexes.

This post originally appeared on Bruce's personal blog.

Resource Feature Callout 1

Query Plan Improvement with Expression Indexes in PostgreSQL

Bruce Momjian

Relevant Blogs

EDB Tutorial: How to run a TPC-E-like benchmark easily

EDB Tutorial: How To Run a Complex Postgres Benchmark Easily - Master TPC-C in 3 Short Steps

EDB Tutorial: Achieving High Availability Using Enterprise Failover Manager

More Blogs

Why you should use Connection Pooling when setting Max_connections in Postgres

How to create a PostgreSQL database and users using psql and pgAdmin

Some numbers, oh, and pgAdmin 4 v1.6 was released.