Overview
There is often a need to increase the number of users connected to an application, driven by factors such as business growth and the addition of new business units. Database Engineers and DBAs are frequently asked to work out the capacity the database needs in order to sustain its performance through the intended growth. One of the key database performance indicators is transactions per second (TPS) for a given increase in connection load.
The TPS factor becomes significant in the PostgreSQL world because every new connection to the database is served by its own backend process and therefore adds CPU utilization on the underlying host. The general recommendation is to allow no more than about 300 clients to connect to a database concurrently unless the application absolutely requires more. This limit is controlled by the max_connections configuration parameter, which defaults to 100 and is usually adjusted to suit the application's needs.
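As a minimal sketch of how this parameter can be inspected and adjusted (the value 300 simply mirrors the recommendation above; a restart is required for the new limit to take effect):

    # Check the current connection limit on the database host
    psql -c "SHOW max_connections;"

    # Raise (or lower) the limit; ALTER SYSTEM writes postgresql.auto.conf,
    # and max_connections only changes after a database restart
    psql -c "ALTER SYSTEM SET max_connections = 300;"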
However, business requirements and growing application traffic often push the number of concurrent connections well beyond this recommendation, and connection counts approaching 1000 have been observed in various database environments. The usual method of simply raising the connection limit and adding more compute resources to cope with the load is a reactive measure and no longer a sound approach to database capacity planning.
The proactive approach, usually demanded by large-scale organizations, is to determine up front how much throughput (in terms of TPS) can be achieved as the connection count grows, and this can now be visualized from performance metrics. This article gives a detailed account of a set of such benchmark tests executed against an EDB Postgres Advanced Server 15 (EPAS15) database instance. It clarifies the concerns that arise when considering Pgbouncer as a connection pooler in a database solution design, and it examines whether Pgbouncer should be co-located on the same host as the database or run on a dedicated host. Evidence-based answers to common, 'run of the mill' questions such as the following can be found here:
● What is the need for a connection pooler?
● When is a connection pooler needed, and under what circumstances?
● What level of connection overhead will an EPAS15 database experience without a connection pooler?
● What is the optimal TPS achievable with the current settings?
● How can the TPS be maximized?
● When can database performance degrade with respect to TPS?
Benchmark tool and connection pooler
The benchmark tests confirm that including a connection pooler yields better database performance when the number of concurrent connections has to be increased massively. A connection pooler is a lightweight process that can act as a shield for a database against a sudden surge in incoming traffic. The connection pooler used in this test is Pgbouncer, which consumes very little memory (2 kB per connection by default) and can be configured in three different modes, listed below; a minimal configuration sketch follows the list. An extensive explanation of the three modes of operation is beyond the purview of this article; they are described in the official documentation here.
● Session Pooling
● Transaction Pooling
● Statement Pooling
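For orientation, a minimal pgbouncer.ini sketch is shown below. The host name, port numbers, authentication details and pool sizes are illustrative assumptions rather than the exact values used in the benchmark; the benchmark itself ran Pgbouncer in session mode, as noted later.

    ; Minimal pgbouncer.ini sketch (illustrative values only)
    [databases]
    edb = host=EPAS15 port=5444 dbname=edb

    [pgbouncer]
    listen_addr = *
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = session        ; alternatives: transaction, statement
    max_client_conn = 1000
    default_pool_size = 300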
The benchmark testing was carried out using pgbench, a simple program for running benchmark tests against a PostgreSQL database. It runs a sequence of SQL commands in multiple concurrent database sessions and reports the average transaction rate (transactions per second), latency and so on. The workload was the default pgbench test of five statements per transaction (3 UPDATEs, 1 INSERT and 1 SELECT), i.e. roughly 80% writes and 20% reads, loosely based on the TPC-B style of enterprise benchmark. The pgbench utility can build a default database containing the necessary tables and test data, and the database put under test was approximately 300 GiB in size.
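A data set of roughly that size can be generated and exercised along the following lines; the scale factor, user name and database name are assumptions for illustration, not necessarily the exact values used in these tests.

    # Initialize the pgbench tables on the database host; a scale factor of
    # around 20000 yields roughly 300 GiB of data (each scale unit is ~16 MB)
    pgbench -i -s 20000 -U enterprisedb edb

    # Run the default TPC-B-like workload: 100 clients, 4 worker threads,
    # 10 minutes, with per-statement latency reporting
    pgbench -U enterprisedb -c 100 -j 4 -T 600 -r edb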
Benchmark setup configuration
The compute resources of the cloud hosts involved in the tests are listed below:
Cloud VM Name        | EPAS15 (Database host) | PGB (Pgbouncer host)
Cloud VM Type        | c5.9xlarge             | c5.xlarge
OS                   | RHEL 8.6               | RHEL 8.6
vCPUs                | 36                     | 4
Memory               | 72 GiB                 | 8 GiB
Storage size & type  | 500 GiB SSD (gp2)      | 64 GiB SSD (gp2)
postgresql.conf settings of the EPAS15 cluster:

shared_buffers       | 18GB
effective_cache_size | 54GB
edb_dynatune         | 100
maintenance_work_mem | 100MB
max_connections      | 1200
max_wal_size         | 15GB
min_wal_size         | 1GB
NOTE:
● Every sample was taken using a 'best of three runs' approach to eliminate any skew
● Every run involved flushing the database and OS caches and restarting the database, so that no residual cached data could influence the outcome
● Pgbouncer was configured to run in session mode
● The number of client connections was increased in steps of 50, from 50 up to 1000 concurrent connections (a sketch of one such benchmark step follows this list)
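One benchmark step could look roughly like the sketch below; the EPAS 15 service name, hosts, ports and user are assumptions, and each step was in practice repeated three times with the best result kept.

    # On the database host: flush dirty pages, drop the OS page cache and
    # restart the cluster so no residual cached data skews the result
    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
    sudo systemctl restart edb-as-15

    # On the PGB host: drive the workload through pgbouncer in session mode,
    # stepping -c through 50, 100, 150, ... 1000 across the test series
    pgbench -h localhost -p 6432 -U enterprisedb -c 500 -j 4 -T 600 edb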
Explanation
The first set of tests analyzes the performance of the database (in terms of TPS) when connections are made from a remote host (host: PGB) running pgbouncer to the database (host: EPAS15). The pgbouncer process was shut down when evaluating direct connections from the PGB host to the database, so that it could not influence the results in any shape or form.
The tests were also run both with and without connection overhead, to paint a complete picture of how connection establishment affects database performance. Completely eliminating connection overhead is not realistic in a live environment, since there will always be some cost to establishing connections, which is why the benchmarks compare the two cases.
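In pgbench terms the comparison can be sketched as below; the host names, ports and user are assumptions, and the -C flag makes pgbench open a new connection for every transaction, which is how connection overhead is included.

    # Direct to the database on EPAS15, including connection overhead:
    # -C opens a new connection for every transaction
    pgbench -h EPAS15 -p 5444 -U enterprisedb -C -c 500 -j 4 -T 600 edb

    # Through pgbouncer on the PGB host, including connection overhead
    pgbench -h PGB -p 6432 -U enterprisedb -C -c 500 -j 4 -T 600 edb

    # Dropping -C keeps one connection per client for the whole run,
    # i.e. the "without connection overhead" case
    pgbench -h EPAS15 -p 5444 -U enterprisedb -c 500 -j 4 -T 600 edb
    pgbench -h PGB -p 6432 -U enterprisedb -c 500 -j 4 -T 600 edb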
The first graph shows consistently high TPS for connections through pgbouncer, staying well above 750 TPS, while direct connections to the database deliver around 200 TPS. The average latency over pgbouncer is negligible at first and builds up to slightly above 1 second as the number of clients approaches 1000, whereas the average latency of direct connections grows steadily with the client count until it reaches about 5 seconds at 1000 clients.
The second graph below shows the outcome of the benchmark without connection overhead, which is not a realistic production scenario but was included for comparison. The TPS for pgbouncer connections is consistent from start to end, fluctuating only slightly around 1000 TPS, while the performance of direct connections is outstanding for the initial 200 connections but then drops to levels on par with pgbouncer. This indicates that if the database connection load is expected to grow beyond roughly 300 connections, a connection pooler such as pgbouncer can be beneficial.
The average latency of both pgbouncer-bound and direct connections grows steadily, approaching 1 second around 1000 concurrent connections; then again, 'no connection overhead' is impossible to achieve in practice.
The two graphs above show the ability of the database to complete requests as quickly as they arrive, i.e. its behaviour under a sudden spike in requests, while the subsequent two graphs focus on performance when requests build up and keep consuming the database over a period of 10 minutes.
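One plausible way to model these two styles of test with pgbench is sketched below (the client count is illustrative): a fixed number of transactions per client represents the short burst, while a timed run keeps the load on the database for the full 10 minutes.

    # Short burst: each client runs a fixed number of transactions and exits
    pgbench -h PGB -p 6432 -U enterprisedb -c 500 -j 4 -t 100 edb

    # Sustained load: every client keeps issuing transactions for 10 minutes
    pgbench -h PGB -p 6432 -U enterprisedb -c 500 -j 4 -T 600 edb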
The next two graphs show the performance of the database when sessions run long queries and consume database time. These tests were also carried out with and without connection overhead, for comparison. The third graph below shows high TPS for connections through pgbouncer; it dips below 600 towards the end but bounces back to nearly 750 at 1000 concurrent clients, whereas the TPS of direct connections to the database stays consistently low, well below 250.
The average latency follows the trend observed in the first graph: it grows steadily for direct connections as the number of connections increases, reaching a maximum of about 5 seconds, while the average latency of pgbouncer connections remains consistently low and only exceeds 1 second once the concurrent connections climb past 800.
The fourth graph shows the result of the same test without connection overhead. TPS over pgbouncer remains consistent, even if it falls short of direct database connections at times, whereas the TPS of direct connections is unpredictable, rising and falling sharply; in any case, this scenario is not one encountered in practice.
The results so far point to the fact that it is beneficial to place pgbouncer as a connection pooler between the application and the database. The tests performed thus far used a dedicated host for pgbouncer, but could co-locating pgbouncer with the database on the same host be a good idea? The answer follows.
The next two graphs show the TPS achieved through pgbouncer co-located on the same host as the database versus pgbouncer on a dedicated host. The 'no connection overhead' tests were deliberately skipped, as the earlier results already established that they are not worth repeating.
The ensuing results are from tests that complete database requests as quickly as possible and from long-running, time-consuming requests over a period of 10 minutes. The first of these graphs shows that running pgbouncer on a dedicated host yields a better outcome than co-locating it on the database host. Although the TPS of co-located pgbouncer is better than that of the direct connections in the first graph of this article, a dedicated pgbouncer host gives much higher performance. The average latency with a dedicated pgbouncer host is lower than with a co-located pgbouncer, and it increases gradually as the number of concurrent sessions grows.
The final graph shows the same test (with connection overhead) run over a period of 10 minutes, simulating a workload of long-running sessions, and it confirms that running pgbouncer on a dedicated server is advantageous. As the results indicate, pgbouncer on a separate host outperforms pgbouncer running on the database host by a wide margin.
Final Opinion:
If the database is expected to receive a large number of connection requests, it is advisable to introduce pgbouncer in front of the database in order to get better TPS performance.
Even better results are achieved when pgbouncer is NOT co-located on the database host but runs on a dedicated server.
We hope this blog has presented a useful account of using pgbouncer to achieve connection throughput at scale; should any extreme cases require assistance, the EDB Professional Services team is always happy to help.
Acknowledgement:
Srikanksha Kopparthy: for conducting all the benchmark tests
Performance Engineering and Architecture Team (PEAT), headed by Vibhor Kumar: for initial guidance on how to conduct the benchmark tests, how to obtain the best samples, and how to eliminate or minimize factors that could affect the results.