The 18th Southern California Linux Expo (SCaLE) took place in sunny Pasadena California in March, which included a 2-day series of talks by members of the PostgreSQL community. At the event, I gave a talk titled, Scaling the Wall of Text: Logging Best Practices in PostgreSQL (no pun intended).
My session covered some alarming statistics I noticed while analyzing user habits when it comes to logging PostgreSQL activities. For instance, nearly half of users don't log more than a timestamp, thereby missing out on many of the gems of information that PostgreSQL has to offer through its logging features.
In this post, we'll cover some more of the highlights of my talk at sCaLE.
Why is Logging Important?
The log is what enables DBAs, sysadmins, and developers to identify problems and resolve them. While the log shows activity within the database, it is possible with some inferences to identify problems with applications or infrastructure components (such as an extraneous transaction commit/rollback or a misconfigured password). Having the database logs handy, especially in production SaaS environments, will allow administrators to identify root cause, remedy, and resolve issues faster, thereby boosting uptime and user satisfaction. The database log also helps with anticipating required future maintenance tasks (such as a wraparound-preventing vacuum).
As a guiding principle, PostgreSQL users should try to log as much as possible, for as long as possible, so long as the applications served are not negatively affected. Some may be concerned that more logging may lead to more text to parse through and filter, but with proper configuration as described below, it is possible to log much without being affected too much.
Where to Start When Database Logging
The single most useful logging parameter to tune when it comes to the postgresql.conf file is log_line_prefix. This parameter prepends a fixed string of information to each line in the log, thereby making it easier to sort and filter information, which in turn helps DBAs, sysadmins, and developers identify issues more quickly. While there are many options to put into this prefix string, some of Richard's favorites are:
- %a - Application Name - Allows quick reference and filtering
- %u - User Name - Allows filter by user name
- %d - Database Name - Allows filter by database name
- %r - Remote Host IP/Name (w/ port) - Helps identify suspicious activity from a host
- %p - Process ID - Helps identify specific problematic sessions
- %l - Session/Process Log Line - Helps identify what a session has done
- %v/%x - Transaction IDs - Helps identify what queries a transaction ran
With these parameters, it is possible to search for specific keywords like ERROR, FATAL, WARNING, PANIC, duration, and so on, pinpoint the relevant PIDs or IP addresses, and identify the source of a problem.
Log More to Solve PostgreSQL Issues
While tuning log_line_prefix is the best starting point, there are 25 other parameters that can be leveraged to improve a DBAs chances at solving problems. It may be intimidating to have to tune each of these, so Richard recommends working with the following as a bare minimum:
- log_min_duration_statement -- helpful in identifying slow queries
- log_statement -- good for auditing purposes
- log_[dis]connections -- good for auditing purposes
- log_rotation_age/log_rotation_size -- good for organization
- log_autovacuum_min_duration -- gives insight into autovacuum behavior
- log_checkpoints -- know what queries happened between checkpoints
- log_temp_files -- helps identify work_mem shortages, I/O spikes
These parameters give greater visibility into the activities of the database, especially in identifying things like when a user logged in, what statement last ran before the database crashed, why autovacuum is not removing dead tuples, which queries might be hogging up memory and causing operations to spill to disk, and so on.
As it is quite apparent here, PostgreSQL's logging features have much to offer, and should not be overlooked. For more information about events that EDB attends, and where to attend talks like this one, click here.
Richard is a Senior Support Engineer at EnterpriseDB and supports the entire suite of EnterpriseDB's products. Prior to joining EnterpriseDB, Richard worked as a database engineer and web developer, functioning primarily in operations with a focus on scalability, performance, and recoverability. He has a broad range of knowledge in a number of technologies, and most recently has been involved in developing tools for rapid-deployment of EDB Postgres Advanced Server in Docker containers. Richard is an EnterpriseDB Certified PostgreSQL Professional.