How to Use EXPLAIN ANALYZE for Planning and Optimizing Query Performance in PostgreSQL

Richard Yen April 30, 2020

With many people working from home these days because of the coronavirus pandemic, it can be a little challenging to get help from a colleague remotely. Sure, there’s Slack and all manner of collaboration tools, but it’s not quite the same as walking up to someone’s cubicle and getting a second pair of eyes to look at a problem, not to mention that our co-workers might be busy trying to juggle deadlines and unruly kids in the home. When it comes to dealing with poor database and query performance, it’s a daunting task to venture into the dark cavern of query planning and optimization, but fear not! EXPLAIN is our friend in those dark and lonely places.

We recently received a request from one of our customers, who was concerned about a slow query on one of their JSON columns. They were seeing slow performance in their development environment and were understandably worried about the impact they’d see if they went to production with poor query performance. We got right to work to help them out, and our first stone to turn over was to have them send us the EXPLAIN output for the query, which yielded:


postgres=# explain SELECT * FROM org where 'aa'::text IN (SELECT jsonb_array_elements(info -> 'dept') ->> 'name');
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Seq Scan on org  (cost=0.00..719572.55 rows=249996 width=1169)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Result  (cost=0.00..2.27 rows=100 width=32)
           ->  ProjectSet  (cost=0.00..0.52 rows=100 width=32)
                 ->  Result  (cost=0.00..0.01 rows=1 width=0)

They knew they had created an index, and were curious as to why the index was not being used. Our next data point to gather was information about the index itself, and it turned out that they had created their index like so:

CREATE INDEX idx_org_dept ON org (((info -> 'dept'::text) ->> 'name'::text));

Notice anything? Their query was wrapping info -> 'dept' in a function called jsonb_array_elements(), so the expression in the query no longer matched the indexed expression, and the planner couldn’t use the index. The fix was simple, and we were able to get the customer back on their way after a rather quick adjustment to their query. Once the customer changed their query to the following, the index started getting scanned:

postgres=# explain SELECT * FROM org where 'aa'::text IN (info -> 'dept' ->> 'name');
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 Index Scan using idx_org_dept on org  (cost=0.42..8.44 rows=1 width=1169)
   Index Cond: ('aa'::text = ((info -> 'dept'::text) ->> 'name'::text))
(2 rows)

As we can see, having and using EXPLAIN in your troubleshooting arsenal can be invaluable.
 

What is EXPLAIN?

EXPLAIN is a keyword that gets prepended to a query to show a user how the query planner plans to execute the given query.  Depending on the complexity of the query, it will show the join strategy, the method of extracting data from tables, the estimated number of rows involved in executing the query, and a number of other useful bits of information.  Used with ANALYZE, EXPLAIN will also show the time spent executing the query, which sorts and merges had to spill to disk instead of running in memory, and more.  This information is invaluable when it comes to identifying query performance bottlenecks and opportunities, and it helps us understand what information the query planner is working with as it makes its decisions for us.

 

A Cost-Based Approach

To the query planner, all the data on disk is basically the same.  Determining the fastest way to reach a particular piece of data requires estimating the time it takes to do a full table scan, a merge of two tables, and other operations to get data back to the user.  PostgreSQL accomplishes this by assigning costs to each execution task, and these values are derived from the postgresql.conf file (see parameters ending in *_cost or beginning with enable_*).  When a query is sent to the database, the query planner calculates the cumulative cost of each candidate execution strategy and selects the plan with the lowest estimated cost.  Because these are only estimates, the chosen plan is not necessarily the one that truly runs fastest.
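If you want to see which cost constants and plan-type toggles your planner is working with, a quick way (shown here as a sketch using the standard pg_settings view) is:

```sql
-- List the planner's cost constants and enable_* toggles
SELECT name, setting
FROM pg_settings
WHERE name LIKE '%\_cost' OR name LIKE 'enable\_%';
```

Changing any of these shifts the relative cost of plan nodes, which is why the enable_* parameters are handy for experimentation but risky as permanent settings.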

$ pgbench -i && psql
<...>
postgres=# EXPLAIN SELECT * FROM pgbench_accounts a JOIN pgbench_branches b ON (a.bid=b.bid) WHERE a.aid < 100000;
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4141.00 rows=99999 width=461)
   Join Filter: (a.bid = b.bid)
   ->  Seq Scan on pgbench_branches b  (cost=0.00..1.01 rows=1 width=364)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..2890.00 rows=99999 width=97)
         Filter: (aid < 100000)
(5 rows)

Here, we see that the Seq Scan on pgbench_accounts has a cost of 2890.  Where does this value come from?  If we look at some settings and do the calculations, we find:
 


cost = ( #blocks * seq_page_cost ) + ( #records * cpu_tuple_cost ) + ( #records * cpu_operator_cost )


postgres=# select pg_relation_size('pgbench_accounts');
 pg_relation_size
------------------
         13434880

block_size        = 8192   (8kB, PostgreSQL default)
#blocks           = 1640   (relation_size / block_size)
#records          = 100000
seq_page_cost     = 1      (default)
cpu_tuple_cost    = 0.01   (default)
cpu_operator_cost = 0.0025 (default)

cost = ( 1640 * 1 ) + ( 100000 * 0.01 ) + ( 100000 * 0.0025 ) = 2890

As we can see, the costs are directly based on some internal statistics that the query planner can work with.
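As a sanity check, the arithmetic above can be reproduced in a few lines of Python.  The parameter values are the defaults quoted above; the function name is mine, not a PostgreSQL API:

```python
# Reproduce the planner's sequential-scan cost estimate from the text above.
# Defaults mirror seq_page_cost, cpu_tuple_cost, and cpu_operator_cost.
def seq_scan_cost(relation_size_bytes, n_records,
                  block_size=8192, seq_page_cost=1.0,
                  cpu_tuple_cost=0.01, cpu_operator_cost=0.0025):
    n_blocks = relation_size_bytes // block_size      # pages read sequentially
    disk_cost = n_blocks * seq_page_cost              # I/O component
    tuple_cost = n_records * cpu_tuple_cost           # per-row processing
    filter_cost = n_records * cpu_operator_cost       # one operator (<) per row
    return disk_cost + tuple_cost + filter_cost

print(seq_scan_cost(13434880, 100000))  # → 2890.0
```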

 

A Note About Statistics

The query planner calculates costs based on statistics stored in pg_statistic (don’t look there--there’s nothing human-readable in it.  If you want visibility into the table and row statistics, look at pg_stats instead).  If any of these internal statistics are off (e.g., because of a bloated table, or because there are so many joins that the Genetic Query Optimizer kicks in), a sub-optimal plan may be selected, leading to poor query performance.  Bad statistics aren’t necessarily anyone’s fault--the statistics aren’t always updated in real time, and much depends on PostgreSQL’s internal maintenance.  As such, it’s imperative that database maintenance is conducted regularly--this means frequent VACUUM-ing and ANALYZE-ing.  Without good statistics, you could end up with something like this:

postgres=# EXPLAIN SELECT * FROM pgbench_history WHERE aid < 100;
                              QUERY PLAN                                                     
-----------------------------------------------------------------------
 Seq Scan on pgbench_history  (cost=0.00..2346.00 rows=35360 width=50)
   Filter: (aid < 100)

In the example above, the database had gone through a fair amount of activity, and the statistics were inaccurate.  With an ANALYZE (not VACUUM ANALYZE or EXPLAIN ANALYZE, but just a plain ANALYZE), the statistics are fixed, and the query planner now chooses an Index Scan:

postgres=# EXPLAIN SELECT * FROM pgbench_history WHERE aid < 100;
                              QUERY PLAN                                                         
----------------------------------------------------------------------
 Index Scan using foo on pgbench_history  (cost=0.42..579.09 rows=153 width=50)
   Index Cond: (aid < 100)
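If you suspect stale statistics, a quick way to check when a table was last vacuumed or analyzed is the standard pg_stat_user_tables view (shown here for the pgbench table used above):

```sql
-- When did maintenance last run on this table?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'pgbench_history';
```

NULLs or very old timestamps here are a hint that autovacuum isn’t keeping up and a manual VACUUM/ANALYZE may be in order.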

 

How Does EXPLAIN ANALYZE Help?

When EXPLAIN is prepended to a query, the query plan gets printed, but the query does not get run.  We won’t know whether the statistics stored in the database were correct, and we won’t know whether some operations required expensive I/O instead of running fully in memory.  When used with ANALYZE, the query is actually run, and the query plan is printed along with some under-the-hood execution statistics.
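One caution follows from this: because ANALYZE really executes the query, a data-modifying statement will really modify data.  Wrapping it in a transaction lets you collect timing information without keeping the changes, for example:

```sql
BEGIN;
EXPLAIN ANALYZE UPDATE pgbench_accounts
   SET abalance = abalance + 1
 WHERE aid < 100;
ROLLBACK;  -- the UPDATE actually ran, so undo it
```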

If we look at the first query above and run EXPLAIN ANALYZE instead of a plain EXPLAIN, we get:

postgres=# EXPLAIN ANALYZE SELECT * FROM pgbench_accounts a JOIN pgbench_branches b ON (a.bid=b.bid) WHERE a.aid < 100000;
                                                 QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4141.00 rows=99999 width=461) (actual time=0.039..56.582 rows=99999 loops=1)
   Join Filter: (a.bid = b.bid)
   ->  Seq Scan on pgbench_branches b  (cost=0.00..1.01 rows=1 width=364) (actual time=0.025..0.026 rows=1 loops=1)
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..2890.00 rows=99999 width=97) (actual time=0.008..25.752 rows=99999 loops=1)
         Filter: (aid < 100000)
         Rows Removed by Filter: 1
 Planning Time: 0.306 ms
 Execution Time: 61.031 ms
(8 rows)

You’ll notice here that there’s more information -- actual time and rows, as well as planning and execution times.  If we add BUFFERS, like EXPLAIN (ANALYZE, BUFFERS), we’ll even get cache hit/miss statistics in the output:


postgres=# EXPLAIN (BUFFERS, ANALYZE) SELECT * FROM pgbench_accounts a JOIN pgbench_branches b ON (a.bid=b.bid) WHERE a.aid < 100000;
                                                 QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4141.00 rows=99999 width=461) (actual time=0.039..56.582 rows=99999 loops=1)
   Join Filter: (a.bid = b.bid)
   Buffers: shared hit=3 read=1638
   ->  Seq Scan on pgbench_branches b  (cost=0.00..1.01 rows=1 width=364) (actual time=0.025..0.026 rows=1 loops=1)
          Buffers: shared hit=1
   ->  Seq Scan on pgbench_accounts a  (cost=0.00..2890.00 rows=99999 width=97) (actual time=0.008..25.752 rows=99999 loops=1)
         Filter: (aid < 100000)
         Rows Removed by Filter: 1
         Buffers: shared hit=2 read=1638
 Planning Time: 0.306 ms
 Execution Time: 61.031 ms
(11 rows)

Very quickly, you can see that EXPLAIN can be a useful tool for people looking to understand their database performance behaviors.

 

A Quick Review of Scan Types and Joins

It’s important to know that every join type and scan type has its time and place.  Some people see the word “Sequential” scan and immediately jump back in fear, without considering whether it would be worthwhile to access the data another way.  Take, for example, a table with 2 rows -- it would not make sense for the query planner to scan the index, then go back and retrieve data from the disk, when it could just quickly scan the table and pull the data out without touching the index.  In this case, and in the case of most other small-ish tables, a sequential scan is more efficient.  To quickly review the join and scan types that PostgreSQL works with:

  • Scan Types
    • Sequential Scan
      • Basically a brute-force retrieval from disk
      • Scans the whole table
      • Fast for small tables
    • Index Scan
      • Scan all/some rows in an index; look up rows in heap
      • Causes random seeks, which can be costly for old-school spindle-based disks
      • Faster than a Sequential Scan when extracting a small number of rows for large tables
    • Index Only Scan
      • Scan all/some rows in index
      • No need to look up rows in the table because the values we want are already stored in the index itself
    • Bitmap Heap Scan
      • Scan index, building a bitmap of pages to visit
      • Then, look up only relevant pages in the table for desired rows
  • Join Types
    • Nested Loops
      • For each row in the outer table, scan for matching rows in the inner table
      • Fast to start, best for small tables
    • Merge Join
      • Zipper-operation on _sorted_ data sets
      • Good for large tables
      • High startup cost if an additional sort is required
    • Hash Join
      • Build hash of inner table values, scan outer table for matches
      • Only usable for equality conditions
      • High startup cost, but fast execution

As we can see, every scan type and join type has its place.  What’s most important is that the query planner has good statistics to work with, as mentioned earlier.

We’ve only talked about one instance where EXPLAIN helped identify a problem and give an idea of how to solve it.  At EDB Support, we’ve seen many situations where EXPLAIN could help identify things like:

  • Inaccurate statistics leading to poor join/scan choices
  • Maintenance activity (VACUUM and ANALYZE) not aggressive enough
  • Corrupted indexes requiring a REINDEX
  • Index definition v. query mismatch
  • work_mem being set too low, preventing in-memory sorts and joins
  • Poor performance due to the join order in which a query was written
  • Improper ORM configuration

EXPLAIN is certainly one of the most invaluable tools for anyone working with PostgreSQL, and using it well will save you lots of time!

 

Join Postgres Pulse Live!

We make use of the problems we solve and the conversations we have in helping people with Postgres, and this was another example of that effort in motion.  EXPLAIN and the query planner don’t start and stop with what we’ve outlined here, so if you have other questions, we’re here for you.  You can find all of our blog and YouTube series here, and you can always join us for our next session.

Join us on Monday, May 4th, for our next Pulse Live Session!  We’ll dive into this week’s questions and quagmires around EXPLAIN use, as well as take questions from anyone who participates.  You can ask your questions via email at postgrespulse@enterprisedb.com, hashtag on Twitter, or live during the event right here.

 

Richard Yen

Richard is a Senior Support Engineer at EnterpriseDB and supports the entire suite of EnterpriseDB's products. Prior to joining EnterpriseDB, Richard worked as a database engineer and web developer, functioning primarily in operations with a focus on scalability, performance, and recoverability. He has a broad range of knowledge in a number of technologies, and most recently has been involved in developing tools for rapid-deployment of EDB Postgres Advanced Server in Docker containers.  Richard is an EnterpriseDB Certified PostgreSQL Professional.