Postgres often lacks the sophisticated reporting of more established enterprise relational database systems. Sometimes that is because Postgres doesn't require as much monitoring and tuning, but there are legitimate monitoring needs which Postgres doesn't support.
Thanks to the wait_event_type and wait_event columns added to the pg_stat_activity view in Postgres 9.6, it is possible to find which parts of the system are causing query delays. In Postgres 9.6 wait_event_type can have four values:
SELECT pid, wait_event_type, wait_event
pid | wait_event_type | wait_event
14154 | LWLockNamed | WALWriteLock
8923 | |
4155 | LWLockNamed | WALWriteLock
14158 | LWLockNamed | WALWriteLock
14160 | LWLockNamed | WALWriteLock
14161 | Lock | transactionid
14163 | LWLockNamed | WALWriteLock
14165 | LWLockNamed | WALWriteLock
14166 | LWLockNamed | WALWriteLock
14167 | Lock | tuple
Postgres 10 will have even more monitoring capabilities, with at least four new wait event types. One nice thing about this feature is that it has almost no overhead so is enabled by default. Unfortunately, there is no efficient way to measure event duration except by periodic sampling of event states.
This excellent email post from Robert Haas statistically analyzes wait events to show how different queries have different wait behaviors, e.g. heavy wal writes, unlogged tables, all data in shared buffers. The results match what you would expect from these kinds of workloads, but it is much more detailed than you would get from guessing.
These new pg_stat_activity columns give us a new window into performance. Before this, we had to guess what the bottleneck was, and we were very good at it. This new instrumentation gives us very accurate statistics on where we are losing performance. I expect this feature to yield major performance improvements in the years to come.
Bruce Momjian is Senior Database Architect at EnterpriseDB.
This post originally appeared on Bruce's personal blog.