This blog will continue the discussion of parallel query in PostgreSQL. In the previous blog of this series, we learned about parallel index scans, its design in PostgreSQL and the performance improvement achieved for a few queries on the industrial benchmark of TPC-H. Therein we analyzed the performance improvement only for a small factor of 20 (database size was approximately 20GB). But the performance benefits realized by parallel operators aren’t that significant until we leverage them for higher scale factors. Hence, in this blog, we will analyze the performance of parallel index scans on 300 scale factor.
For this experiment we used TPC-H inspired benchmark for PostgreSQL. We used 300 scale factor, which gives 300+ GB of database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate, o_orderdate, o_comment). We tuned following parameters,
- random_page_cost = 0.1
- seq_page_cost = 0.1
- effective_cache_size = 10GB
- shared_buffers = 10GB
- work_mem = 1GB
In the table below, we compared the performance of the queries using parallel-index scans on v10 to their performance in v9.6. Note that in v9.6 parallel seq scans and parallel nested loop joins were available. Additionally, we tabulated the total contribution of parallel index-scan for each of the queries, it is the execution time of parallel index scan node in explain analyze output of the respective query. All the values of timing are in seconds.
TPC-H query |
On v9.6 |
On v10 |
Contribution of PIS |
6 |
883 |
155 |
137 |
12 |
740 |
717 |
122 |
14 |
422 |
111 |
53 |
15 |
2295 |
1138 |
157 |
17 |
17724 |
3376 |
16 |
From the above table, it is clear that the benefits of parallel index scans are significant on large amounts of data as well. To get a better understanding of how this scan method benefits the queries in question, we studied their query plans on both versions. On analyzing those query plans we found two primary explanations for the benefits attained:
- Contribution of parallel index scan in total execution time of the query: If the most time-consuming operation in a query is index scan then parallelizing it is likely to improve query performance significantly, e.g. Q6 and Q12. To be more precise have a look at the query plan of Q6,
Q6 Query plan on v9.6
———————————
Limit
-> Aggregate
-> Bitmap Heap Scan on lineitem
-> Bitmap Index Scan on idx_lineitem_shipdate
Execution time: 882812.376 ms
Q6 Query plan on v10
———————————
Limit
-> Finalize Aggregate
-> Gather
-> Partial Aggregate
-> Parallel Index Scan using idx_lineitem_shipdate on lineitem
Execution Time: 155619.579 ms
Here, scan is the only time-consuming operator, hence, parallelizing it improves the performance of the query. Similarly for Q12, the index scan is on lineitem which is the largest table of database. Once it is scanned in parallel, the join above it could also be done in parallel and hence the improvement in performance.
On the other hand, the benefits are not so pronounced when the plan is more complex and involves a number of operators with index scan as one of the less significant operator in terms of total execution time of the query.
- Operators above it could also leverage parallelism: In some of the queries, it has been observed that even when the contribution of parallel index scan is not much in total execution time of the query, the benefits are significant, e.g. Q15 and Q17. In such cases, the benefits are not coming from parallel index scans alone but from the fact that now more operators could be pushed to workers and hence extend parallelism upto higher levels. To explain this observation, query plan for Q15 is used:
Q15 Query plan on v9.6
—————————————
Limit
InitPlan 1 (returns $0)
…
-> Merge Join
Merge Cond: (lineitem.l_suppkey = supplier.s_suppkey)
-> GroupAggregate
Group Key: lineitem.l_suppkey
-> Sort
-> Bitmap Heap Scan on lineitem
-> Bitmap Index Scan on idx_l_shipdate
-> Index Scan using supplier_pkey on supplier
Execution time: 2297222.717 ms
Q15 Query plan on v10
—————————————
Limit
InitPlan 1 (returns $0)
…
-> Nested Loop
-> Finalize GroupAggregate
Group Key: lineitem.l_suppkey
-> Gather Merge
-> Sort
-> Partial HashAggregate
Group Key: lineitem.l_suppkey
-> Parallel Index Scan using idx_lineitem_shipdate on Lineitem
-> Index Scan using supplier_pkey on supplier
Execution Time: 1133723.985 ms
In this case, with the introduction of parallel index scan the plan structure changed altogether. With parallel index scan, the aggregates are pushed to workers and the join method is also changed, eventually improving query performance. This shows that it may not be necessary for parallel index scan to be the most time consuming operator for it to improve query performance, rather having it may help other operators to use parallelism which could not be done otherwise.
Overall, we saw there are significant benefits of parallel index scans on large volume of data as well. Thus, aiming towards a more OLAP compatible and scalable version of PostgreSQL. Stay tuned to know about more groundbreaking features in PostgreSQL.