This week, a customer I was working with wanted help setting wal_keep_segments: the number of extra transaction log (WAL) files to keep on the primary specifically so that standbys can afford to fall behind when using streaming replication. The customer already had it set quite high, but occasionally it still wasn't high enough, and their standby would fall too far behind to catch up.
I suggested using replication slots, which would eliminate the need for a hardcoded number of WAL files: a slot lets the standby tell the primary how far behind it is, so the primary keeps WAL files around until the standby has consumed them. However, the customer was concerned that the primary might run out of disk space if the standby became unavailable for an extended period. Their priority was that the primary must not stop working, but they also wanted the standby to eventually catch up.
This is where the trade-offs have to be evaluated.
When setting up PostgreSQL database replication, you will no doubt need to make various decisions about the persistence of your data:
- How many standbys to maintain?
- What strategy to use for WAL management?
What is WAL (Write-Ahead Log)?
WAL (write-ahead log) is the log of changes made to the database cluster. It is replayed either as part of the database recovery process when a database isn't shut down cleanly (such as when a crash occurs), or by standbys, which replay the changes to replicate the database.
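To make this concrete, you can ask the cluster where it is in the WAL stream; as a sketch, the function names below are those available in PostgreSQL 10 and later:

```sql
-- On the primary: show the current WAL write location
SELECT pg_current_wal_lsn();

-- On a standby: show the last WAL location that has been replayed
SELECT pg_last_wal_replay_lsn();
```

The difference between these two positions is, in effect, how far behind the standby is.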
So what did I mean by WAL management? Well, you need to consider what happens to WAL when a standby goes down for whatever reason (server crash, network failure, etc.). There are three options to consider, which fall into two buckets: bounded WAL and unbounded WAL.
Bounded WAL
This is where we want to save the primary at all costs, even if it means sacrificing our standby(s). More specifically, we want to prevent the WAL from filling up all disk space on the primary, which would otherwise cause the primary to fail.
In this scenario, the amount of WAL retained is constrained by wal_keep_segments (renamed to wal_keep_size in PostgreSQL 13) as configured in postgresql.conf.
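A minimal sketch of this bounded setup in postgresql.conf; the value here is purely illustrative and would need sizing for your own workload:

```ini
# postgresql.conf — bounded WAL strategy (illustrative value)
# Keep up to 512 extra 16 MB segments (~8 GB) for standbys to catch up from.
wal_keep_segments = 512
```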
The problem with this strategy is that we don't know how much WAL we will need. It involves guessing what the peak WAL traffic will be, and coming up with a number that will allow standbys to catch up if they fall a certain amount behind. However, if this proves not to be enough, the standby will fall too far behind to catch up. This strategy also means we are keeping a lot of WAL around that isn't needed most of the time.
Note that you would ideally still archive WAL even when using wal_keep_segments, because a WAL archive is the only way to achieve Point-In-Time Recovery; but an archive grows without bound, which makes this an unbounded solution again. The only way to truly constrain the amount of WAL is to not archive it at all, meaning there is no way to perform Point-In-Time Recovery.
Unbounded WAL
Here, we care about ensuring all data is replicated, even if it means that, in the worst-case scenario, we cause the primary to go down. There are two strategies that don't constrain the amount of WAL being kept around.
WAL archiving
When WAL is generated, it is also (or at least should be) archived. The archiving process copies completed WAL files elsewhere, so the primary no longer has to keep them around, and a standby can fetch WAL from the archive if it's missing from the primary. The added benefit is that archived WAL files can be used in combination with a base backup, and the archive can be cleared of any WAL older than the base backup.
However, the disk location of the archive is finite, so it must be ample enough to cater for the most WAL you could ever expect the primary to produce. If the archive fills up its disk, the archive command will start failing on the primary, which prevents WAL from being recycled, resulting in a build-up of WAL at the primary's WAL location (or, if on a separate disk, that location). When that build-up exhausts its storage space, the primary will fail, as it will no longer be able to write WAL to disk. This case can be made less likely by compressing WAL files as part of the archiving process and decompressing them as part of the standby's restore process.
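As a sketch, compression can be folded directly into the archive and restore commands; the archive path here is an assumption, and you would want your real archive_command to avoid overwriting existing files:

```ini
# postgresql.conf on the primary — archive each completed WAL segment compressed
archive_mode = on
archive_command = 'gzip < %p > /mnt/wal_archive/%f.gz'

# recovery configuration on the standby — decompress segments on restore
restore_command = 'gunzip < /mnt/wal_archive/%f.gz > %p'
```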
Replication slots
Replication slots are available in all supported versions of PostgreSQL, and are a way for standbys to report back to the primary how much WAL they have consumed.
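Setting one up is straightforward; the slot name below is a hypothetical example:

```sql
-- On the primary: create a physical replication slot for the standby
SELECT pg_create_physical_replication_slot('standby1');
```

The standby then references it with `primary_slot_name = 'standby1'` in its recovery configuration (recovery.conf before version 12, postgresql.conf from version 12 onwards).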
When using replication slots, you don't have to worry about wal_keep_segments, hot_standby_feedback or vacuum_defer_cleanup_age. If the network goes down, the primary knows the standby hasn't received newer WAL, and will hold onto those files until the standby comes back. Of course, this means that if it's decided the standby should be decommissioned, it is imperative that the corresponding replication slot is dropped from the primary; otherwise the primary will hold onto WAL until disk space is exhausted. Disk space could also be exhausted if the network were down for too long, but in most cases that would be exceptional.
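Both of those failure modes can be watched for. As a sketch (again assuming the hypothetical slot name 'standby1'), you can check how much WAL each slot is holding back, and drop a slot when its standby is decommissioned:

```sql
-- How much WAL is each slot retaining, in bytes?
SELECT slot_name, active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;

-- When a standby is decommissioned, drop its slot so WAL can be recycled
SELECT pg_drop_replication_slot('standby1');
```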
It is worth mentioning that in PostgreSQL 13, the setting max_slot_wal_keep_size will be available, which allows for limiting how much WAL is kept back by replication slots, so it can be used to prevent the disk from filling up if necessary. However, as with wal_keep_segments, if this limit is exceeded, the standby will fall too far behind to catch up, and will remain stalled at that point.
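In configuration terms that would look like the following; the size is an illustrative value, not a recommendation:

```ini
# postgresql.conf (PostgreSQL 13 and later) — cap WAL retained by slots
max_slot_wal_keep_size = 10GB
```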
As with wal_keep_segments, you would ideally still be archiving WAL when using replication slots because it would allow for Point-In-Time Recovery, which is only achievable with a WAL archive.
Pros and Cons
As you can see, each strategy has its benefits and trade-offs. Here is a summary of each one:
wal_keep_segments
Pros:
- Limits how much space WAL can occupy at the primary location.
- The primary will never go down due to WAL filling up storage.
Cons:
- In order to limit WAL, it requires forgoing WAL archival.
- The standby can fall too far behind, meaning a new base backup would be needed.
- Far more WAL will be kept around than is needed most of the time.
- Requires estimating the worst-case scenario that will be tolerated.
WAL archiving
Pros:
- Allows for a separate disk location dedicated to WAL.
- The primary doesn't need to keep WAL beyond what is needed for recovery and standbys.
- Standbys can use the archive to catch up.
- Disk usage can be reduced by compressing archived files.
Cons:
- The archive disk location can fill up, meaning the primary will start filling up too.
Replication slots
Pros:
- The primary only needs to keep WAL that hasn't yet been consumed by a standby.
- wal_keep_segments, hot_standby_feedback and vacuum_defer_cleanup_age don't need configuring, so no guesswork is involved.
- The worst-case scenario (a standby being unavailable for so long that the primary fails due to WAL accumulation) is very unlikely.
Cons:
- If a standby fails to keep up, or becomes unavailable, WAL can build up on the primary.
As you can see, there are a few factors to consider, each with its own trade-off. My personal recommendation would be to forget about wal_keep_segments and use replication slots and a WAL archive. This way, you can keep WAL files to a minimum, and can perform Point-In-Time Recovery. The concerns about running out of disk space can be mitigated by monitoring both the space available at the location where WAL is written, and the WAL archive location. If you set a threshold that gives you time to sort out more space for the archive, or get a standby back online, you should experience no downtime anywhere, and you don't lose any functionality. If you keep your WAL on an LVM volume, you can buy time by expanding it for temporary provisioning of extra WAL, and reduce it again once the underlying cause of excessive WAL has been resolved.
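As a sketch, the monitoring and LVM headroom described above might look like the following; the paths, volume group name and sizes are all assumptions to adapt to your own setup:

```shell
# Check usage on the WAL and archive volumes (illustrative paths)
df -h /var/lib/postgresql/12/main/pg_wal /mnt/wal_archive

# Temporarily grow the WAL volume by 10 GB, resizing the filesystem with it
lvextend --resizefs --size +10G /dev/vg0/pg_wal
```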
Alternatively, you can employ software to help manage your WAL strategy, such as EDB's BART, or open source projects such as WAL-E, which performs compressed WAL backup to cloud storage.
Join Postgres Pulse Live!
Our goal is to continue to leverage our experiences with our customers to bring meaningful insights for all PostgreSQL users, both in the problems themselves as well as in the approach we take to the solution. Looking for more help just like this? Here you can find all things Postgres Pulse, including all of our blog posts and our YouTube series.
Join us on Monday, June 1st, for our next Pulse Live Session! We’ll be bringing the usual mash-up of questions and opinions related to this week’s topic, but we’re open to any questions you may have around PostgreSQL at all. You can ask your questions via email at email@example.com, hashtag on Twitter, or live during the event right here.
Thom is a Principal Systems Engineer at EnterpriseDB, providing advanced support, training, consultancy and comprehensive system assessments to produce customised architectural health check reports. In addition to this, Thom is heavily involved in the PostgreSQL community: answering questions relating to feature use, configuration and performance, testing new feature patches, getting involved in development discussions, producing patches for pgAdmin (a popular open source PostgreSQL GUI), and helping organise the main European PostgreSQL conferences.