In the previous blog, we saw that parallel index scans leads to significantly improves the performance of quite a few TPC-H queries. It is customary to analyze if its sister operator, namely index-only scan will benefit similarly when parallelized.
Before getting into that, we will briefly discuss the utility of an index-only scan. In PostgreSQL, indexes are stored at a different location than the tables. For a regular index scan on a table, first the index created on that table is scanned to find the relevant leaf nodes and then the table is scanned for those locations only. Now, if there is some query in which we need only the values of columns which have an index, then we can scan the index tree only and return the required data, since there is nothing extra that we need to retrieve from that table, that type of scan is called index-only scans. To be precise, index-only scans are a special type of access method which uses index alone and does not require to fetch data from the heap.
For example, an index-only scan is likely to show a performance improvement over a regular index scan for the query such as, SELECT count(*) FROM countries WHERE country_area <= <some value>. Assuming we have an index on the column country_area. Here, we can get the tuple information lesser than the required country area by index alone, hence, saving the I/O time to read the tables.
The design and implementation of parallel index-only scan is heavily dependent on the machinery developed for scanning B-tree in parallel. There are no new GUC parameters or configuration settings required for this scan.
Performance of parallel index-only scan
For the industry strength benchmark TPC-H on 300 scale factor, the performance of Q13 is improved by almost 5x with the usage of parallel index-only scan. For this experiment, we used TPC-H inspired benchmark for PostgreSQL. We used 300 scale factor, which gives 300+ GB of database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate, o_orderdate, o_comment). We tuned the following parameters,
random_page_cost = 0.1 seq_page_cost = 0.1 effective_cache_size = 10GB shared_buffers = 10GB work_mem = 1GB Q13 Query plan on v9.6 ———————————— Limit -> Sort Sort Key: (count(*)) DESC, (count(orders.o_orderkey)) DESC -> HashAggregate Group Key: count(orders.o_orderkey) -> GroupAggregate Group Key: customer.c_custkey -> Merge Left Join Merge Cond: (customer.c_custkey = orders.o_custkey) -> Index Only Scan using customer_pkey on customer -> Index Scan using idx_orders_custkey on orders Execution time: 4146177.735 ms Q13 Query plan on v10 ———————————— Limit -> Sort -> HashAggregate Group Key: count(orders.o_orderkey) -> Finalize GroupAggregate Group Key: customer.c_custkey -> Gather Merge -> Partial GroupAggregate Group Key: customer.c_custkey -> Sort -> Parallel Hash Left Join Hash Cond: (customer.c_custkey = orders.o_custkey) -> Parallel Index Only Scan using customer_pkey on customer -> Parallel Hash -> Parallel Seq Scan on orders Execution Time: 739088.785 ms
We can see that the query involved aggregations and join over the primary key of customer table. When parallel index-only scan is used, both the aggregations as well as the join could be performed in parallel, hence improving the query performance.
I would like to close this discussion with a thank you note to my colleague Amit Kapila who helped and supported me in the due course of this project. Additionally, I would like to thank my employer EnterpriseDB for bestowing me with such an opportunity.