Linux x86-64 (RHEL 8) 14 SQL Profiler
Rafia S Nov 7, 2019
Parallel access methods are introduced in PostgreSQL since v 9.6. Still, I could not help but notice that every now and then there are complaints about the parallel sequential scan that is not getting selected or it is degrading the performance of a query. So, I decided to write this blog to cater to more practical scenarios and specifically focus on its less talked about aspect -- where parallel sequential scan would (should) not improve the performance.
The parallel sequential scan is the first parallel access method in PostgreSQL and is introduced in version 9.6. The committer of this feature and my colleague at EnterpriseDB Robert Haas wrote an awesome blog on it, there is another great blog by another PostgreSQL committer and colleague of mine Amit Kapila. Both of these blogs explain this access method, its design, usage, and related parameters.
Before diving into the details of parallel SeqScan, let's first understand the basic infrastructure and terminology related to it in PostgreSQL. The processes that run in parallel and scan the tuples of a relation are called parallel workers or workers in short. There is one special worker namely a leader which coordinates and collects the output of the scan from each of the workers. This worker may or may not participate in scanning the relation depending on its load in dividing and combining processes. End users can also control the involvement of a leader in relation scan by GUC parameter parallel_leader_participation, it is a boolean parameter.
Now, let's understand the concept of a parallel scan in PostgreSQL by a simple example.
- Let there be a table T (a int, b int) containing 100 tuples
- Let's say we have two workers and one leader,
- Cost of scanning one tuple is 10
- Cost of communicating a tuple from worker to leader is 20
- Cost of dividing the tuples among workers is 30
- For simplicity, let's assume that the leader gives 50 tuples to each of the workers
Now, let's analyze if the parallel scan will be faster than non-parallel scan,
Cost of SeqScan = 10*100 = 1000 Cost of Parallel SeqScan = 30 + (50 * 10) + (50 * 20) * 2 = 2530
Here, we can see that though the cost of scanning the tuples is halved yet the cost of combining the total result is enough to make the overall cost of parallel SeqScan higher than non-parallel SeqScan.
Now, let's say we want to list only the tuples which have a > 80, and there are only 20 (say) such tuples, then cost of SeqScan will remain same, but cost of parallel SeqScan can be given as,
Cost of Parallel SeqScan = 30 + (50 * 10) + (10 * 20) * 2 = 730
Hence, parallel SeqScan is likely to improve the performance of queries that require scanning a large amount of data but only a few of them satisfy the selection criteria. To generalize this,
Cost of SeqScan = Cost of scanning one tuple * number of tuples Cost of parallel SeqScan = Cost of dividing the work among workers + cost of combining the work from workers + (cost of work done by a worker * number of workers)
Let's dive into it a bit more,
Cost of dividing the work among workers is fairly constant depending on the relation size
Cost of combining the work from workers = cost of communicating the selected tuples from each worker to the leader Cost of work done by a worker = cost of scanning a tuple * number of tuples the respective worker has received
Now, we can see that the cost of combining the work is dependent on the number of tuples received by each worker. Now, for the queries where all or almost all of the tuples are in the final result, we will pay more cost than its non-parallel flavour, first in scanning the tuple and second in combining it to the final result.
In PostgreSQL, the cost determining the cost of dividing the work among workers is given as parallel_setup_cost, the cost of communicating the tuple from worker to leader is given by parallel_tuple_cost, and the number of workers is upper bounded by the GUC max_parallel_workers_per_gather.
So, if you are using a system high-frequency multiple processors then lowering the parallel_setup_cost and parallel_tuple_cost will help in the selection of parallel scans. If there are not many processes running in parallel, then increasing max_parallel_workers_per_gather can leverage more parallel processes to improve query performance. Another point to note is that the number of workers is further capped by max_worker_processes.