Bruce Momjian recently wrote a blog post on Postgres monitoring tools. I want to point out a unique and useful feature of Postgres Enterprise Manager from EnterpriseDB.
I frequently hear from prospects who would like us to help them debug periodic performance drop-offs their PostgreSQL server is encountering. When we get the call, the opening description usually goes something like this:
“Yesterday between 3 and 4 pm my users all started complaining that the system was incredibly slow. At roughly 3:45 pm a few disgruntled users showed up at my door with pitchforks. Thank heavens performance reverted to normal levels around 4:05 pm.”
The good news is that performance is now back to an acceptable level. The bad news is that nobody has an inkling why performance fell off a cliff. In addition, the DBA, or whoever is deemed responsible for making sure this sort of thing does not happen, is now living in fear that the next comparable event will hit the database server and the pitchfork-wielding users will return in greater numbers and with even less patience. Yes, some PostgreSQL and Linux experts can go in, evaluate the system, gather a few clues, and occasionally make a reasonable diagnosis of why database performance suffered. More often than not, though, a number of probes get installed on the system to gather more data so that the next performance drop-off can be analyzed properly. Getting to the root cause of such spikes can take many iterations of installing probes and then analyzing the collected data. To impede progress further, somewhere in the days or weeks this cycle may require, a second server starts encountering sporadic performance problems. Now the resources to resolve the issues are spread thinner, and oh dear, what if they are two entirely unrelated problems!
The source of such performance problems can be any of a variety of things; here are some examples:
1) Postgres checkpoint settings are not tuned properly.
2) The operating system starts swapping as a result of some activity the database is doing.
3) Many users come online at roughly the same time and start using the application because the CEO sent out a corporate-wide e-mail telling people to take action on something.
4) A report is run against the database periodically.
5) Backup processes or other external processes are running on the system.
6) Vacuum settings are not properly tuned.
7) The database has stopped doing index scans on a large table and is now doing sequential scans (a couple of quick checks along these lines are sketched after this list).
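Before reaching for any tool, a DBA can spot-check a couple of these causes by hand against PostgreSQL's built-in statistics views. The sketch below is only an illustration, not an exhaustive diagnosis; the views and columns (pg_stat_user_tables, pg_stat_bgwriter) are standard PostgreSQL, but what counts as an alarming number depends entirely on your workload.

```sql
-- Quick check related to item 7: tables whose cumulative sequential-scan
-- count dwarfs their index-scan count (counters accumulate since the last
-- statistics reset, so interpret them over time, not as absolutes).
SELECT relname, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
ORDER BY seq_scan DESC
LIMIT 10;

-- Quick check related to item 1: checkpoints that were requested (forced by
-- write volume) rather than timed; a high requested count is a common sign
-- that checkpoint settings are too tight for the workload.
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;
```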
To determine which item from the above list is the problem, or which of the many other possibilities is actually happening, database and system probe data is required. Further, graphical analysis of this data may be needed to understand the problem.
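To make the idea of probe data concrete, here is a minimal sketch of the kind of hand-rolled collection a DBA might set up while waiting for the problem to recur. The table name scan_stats_history and the sampling schedule are my own illustration; PEM's agent automates this sort of periodic collection for you.

```sql
-- Hypothetical hand-rolled probe (illustration only): snapshot per-table
-- scan counters into a history table so a later slowdown can be lined up
-- against a specific time window.
CREATE TABLE scan_stats_history (
    sampled_at  timestamptz NOT NULL,
    relname     name        NOT NULL,
    seq_scan    bigint,
    idx_scan    bigint,
    n_live_tup  bigint
);

-- Run periodically (for example from cron every few minutes); diffing
-- consecutive samples shows when the scan activity actually happened.
INSERT INTO scan_stats_history
SELECT now(), relname, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables;
```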
If Postgres Enterprise Manager (PEM) is installed somewhere on the network, with a PEM agent running on the database server where the performance drop-off occurred, you have everything you need to diagnose and troubleshoot the problem. You don’t need to wait for the problem to repeat itself and hope you are able to gather enough data:
1) You have the database and system probe data.
2) You have the graphical tools to find the smoking gun in the data that will lead you to the root cause of the problem.
The specific tool within Postgres Enterprise Manager that will guide you to the problem is called the Capacity Manager. It continually gathers statistics about what is happening on the system so that you can forecast “what will happen on the system”. Often, however, what is more interesting is looking back at those statistics to find out “what the heck did happen” on the system.
In the coming weeks I will be creating several additional online demos for Postgres Enterprise Manager. The first will show how to use Capacity Manager to debug a sporadic dip in performance.
So please stay tuned…