EDB offers an advanced database management system built on PostgreSQL that can be adapted to a wide range of use cases. To help users understand how different EDB reference architectures can apply to some common use cases, EDB’s VP & Chief Architect of Database Infrastructure Dave Page collaborated with Chief Performance Architect Vibhor Kumar to deliver this recent webinar that highlighted a few basic architectures: single node, multi-node with asynchronous replication, and multi-node with synchronous replication.
The popular topic included a robust Q&A session at the end of the webinar, including the following common questions that weren’t answered live due to time constraints.
Replication and Availability
Can we use Postgres Enterprise Manager (PEM) to generate availability and usage reports in a cluster or client/server environment?
Postgres Enterprise Manager doesn't have the specific functionality to generate such reports natively. However, you can easily create custom charts and dashboards to monitor the metrics collected by Postgres Enterprise Manager and capitalize on the Capacity Manager tool to produce ad-hoc reports. The PEM server collects an extensive array of usage metrics (from session information, CPU usage, IO and so on), and keeps track of alert history (including basic availability).
In a multi-node architecture with asynchronous replication, will unreplicated transactions be lost during failover?
In an asynchronous cluster, the primary node will report that a transaction has been committed as soon as it is safely written to non-volatile storage on the primary node. After reporting that commit, the node will then replicate the changes to each of the standby nodes. If a failure of the primary node occurs before the standbys have received the changes, they may be lost.
If you need to guarantee that transactions will not be lost, then you should consider a synchronous replication architecture. However, that guarantee comes at a substantial performance tradeoff. Since the primary node has to wait for one or more standbys to report back that the transaction has been written to non-volatile storage before committing it to the client, the client will experience a significant performance penalty, potentially outside the realm of what’s acceptable for your application.
Can we architect high availability with an auto failover capability?
Yes. EDB Failover Manager (EFM) is used in EDB architectures to monitor each node and manage auto failover and manual switchover. Typically, EFM would be used within a co-located cluster (i.e. either physically located in the same data center or housed in different availability zones within a cloud region); however, it can also be used with clusters spread across larger geographies.
When EFM determines that a Primary node has gone down (e.g. in the event of a software or hardware failure), it will kick-off auto failover to a Standby node and re-assign virtual IP addresses as necessary. If the network configuration has each node in a different subnet, however, you’ll need to create a workaround, for example by using the hooks EFM provides to manage an elastic IP. Our documentation provides detailed guidance on how to use EDB Failover Manager.
Can EDB Postgres support cascading and delayed replication topologies?
Yes, EDB Postgres supports both. Cascading replication can be used to reduce the number of direct connections to the master, as well as to minimize inter-site bandwidth overheads. Currently, cascading replication is asynchronous only, and does not support synchronous replication between standbys. More information about both cascading and delayed replication can be found in our documentation.
Is it possible to create an Active-Active cluster with multiple read/write nodes?
There are actually multiple ways to create an Active-Active cluster, each with their own advantages and trade-offs. Today, you can use EDB Postgres Distributed, which supports multi-master replication.
How do we achieve high availability for pgPool or pgBouncer?
For architectures that co-locate pgPool and/or pgBouncer on the database servers, EDB Failover Manager will monitor and manage the virtual IP addresses for the database server cluster through the poolers, thereby monitoring the poolers as well.
In architectures where your servers are in a public cloud, you can also leverage the cloud’s auto-scaling capabilities. For example, Amazon Web Services (AWS) supports EC2 Autoscaling which offers fault-tolerant and availability benefits which can be utilized by pgPool and pgBouncer.
When using pgBouncer for load balancing, can it support transparent failover of connections when the underlying node fails?
When used in conjunction with EDB Failover Manager, pgBouncer does support failover when an instance fails. However, since in-flight operations will be aborted, it is not completely transparent.
Write-Scaling and Distributed Database Design
If an application is write-intensive, how do we scale writes?
Scaling of writes is not possible in a fully replicated cluster because ultimately all writes must be written to all nodes in the cluster. In order to achieve write scaling, you need to either vertically scale the hardware (i.e. buy a bigger box!), or use a distributed database design, perhaps sharding data across multiple servers or clusters. This typically requires the application to have an understanding of where different shards are located.
How many virtual CPUs are required in Google Cloud?
You can run a cluster with as little as 1 vCPU per node. However, different workloads have different vCPU requirements for optimal performance, so that’s something you’ll need to test with your application.
Backup and Recovery
Are separate dedicated servers for Postgres Enterprise Manager and Backup and Recovery Tool (BART) needed, or can these two functionalities co-exist in a single machine?
PEM and BART can be co-located on a single machine, if that machine is powerful enough to support your workload. You can also run BART on the individual database servers if you prefer, however, that’s harder to manage at scale.
Can BART backups be performed on standby database servers as well?
Yes, BART can backup standby servers as well as the primary node(s).
Does BART support incremental backups?
Yes, BART can perform block-level incremental backups on the primary node(s). Incremental backups are not supported for standby servers, and if a standby node has been promoted to primary, a full backup must be taken before enabling incremental backup on the cluster.
Can we use other backup products such as Comvault or Rubric with our PostgreSQL database?
Commvault does support PostgreSQL; however, EDB cannot comment on the quality of the tool and its Postgres backup functionality. We do not believe that Rubrik currently supports Postgres.
Does EDB have a “flashback” feature wherein we can dictate a particular restore point?
While a particular restore point can’t be determined in advance, you can use Point In Time Recovery (PITR) to restore the database cluster to a particular point in time from the backup. You can also delay replication standby, which can help you recover any data lost due to human error.
Administration and Configuration
Is the Postgres Enterprise Manager server a data collection node?
The PEM server includes multiple components, including a web server, an agent that performs various operations (such as spooling alerts to email, SNMP etc), and a database server for either PostgreSQL or EDB Postgres Advanced Server in which the system configuration and monitoring data is stored. On most installations, these components are co-located on a single machine; however, you can choose to run the web server on one machine, and the agent and database on another. Our Postgres Enterprise Manager documentation has more details on the architecture.
How does Postgres Enterprise Manager store Postgres logs (e.g. as table data?)? How long can they be retained? Is this an audit function?
PEM can be configured to collect server logs from either PostgreSQL or EDB Postgres Advanced Server (EPAS), as well as the audit log from EPAS. The logs will be scanned on a schedule you choose (e.g. every 10 minutes), and imported into a table in the database used by PEM. The records can be retained for whatever period of time you choose and can be used for auditing and diagnostics purposes using tools/dashboards in PEM.
Can SQL profiling be used to focus the SQL load of a single application user in a production environment?
Profiling can be done on-demand, or be scheduled using the Postgres Enterprise Manager. If desired, each profiling run can be set up to target specific databases and/or users.
To learn more, watch our on-demand webinar, An overview of reference architectures for Postgres!