How to Monitor Postgres Like a Pro!

How to Monitor Postgres Like a Pro!

Monitoring is a critical element of the database ecosystem and for overall performance of the database. It also helps ensure that your database is in a healthy state and contributing to the long term stability of your database and application.

Monitoring becomes tricky without knowing what to monitor and how to monitor those variables.

 

What you will learn in this webinar:

  • Why is it essential to monitor Postgres?
  • The proactive and reactive monitoring approaches and how they help handle future problems
  • The top 10 monitoring checklist
  • When not to monitor
  • A deeper dive into open source tools for monitoring with Prometheus and Grafana

 

Webinar video:

 

Webinar slides:

 

Webinar Q&A:

 

  • Please insert XID Wraparound Monitor Query asked for during webinar

   Query:1 on the database level

   SELECT datname

   , age(datfrozenxid)

   , current_setting('autovacuum_freeze_max_age')

   FROM pg_database

   ORDER BY 2 DESC;

 

   Query-2: On table level

 

   SELECT c.oid::regclass

    , age(c.relfrozenxid)

    , current_setting('autovacuum_freeze_max_age')

    , pg_size_pretty(pg_total_relation_size(c.oid))

    FROM pg_class c

    JOIN pg_namespace n on c.relnamespace = n.oid

    WHERE relkind IN ('r', 't', 'm')

    AND n.nspname NOT IN ('pg_toast')

    ORDER BY 1 DESC;

 

  • Given that I may not have the opportunity to try all monitoring tools to know which is best for my environment, can you help indicate which is best for performance....which is best for availability..and which is best done security monitoring?
    • Bruce: That is a hard question to answer.
    • Ajay: Specifying one tool will be really challenging for one area. We can suggest one open source and also PEM which covers all the areas. 

 

  • How are long running queries a problem for the DBMS?
    • Bruce: They prevent vacuum cleanup for the database that has the open query.
    • Ajay : Also: long running queries will hold system resources. If a lot of queries are running more than intended time, there will be less resources available for your next database operations.

 

  • Do you have any recommendations or comments/opinions on using Postgres in a containerized platform, such as K8s or plain ol' Docker?
    • Bruce: Sure, it works great.

 

  • How can I proactively monitor database corruption?
    • Bruce: Uh, I would look for server and kernel messages that indicate storage errors --- that is the 99% cause of corruption.
    • Ajay: There are few practices we follow:
      • A. Making sure we are capturing Errors from postgres logs, whenever a block is not readable or corrupted the information gets logged to Postgres logs.
      • B. for catalog corruption use pg_catcheck
      • C. Backup validation (restore the database backup to a test machine), and do a logical backup (pg_dump) of the test backup. I won’t recommend directly doing on production.

 

  • Does EDB Advanced server have a feature similar to Oracle SQL Baseline?
    • No, it doesn't

 

  • Can we use PEM to monitor community PostgreSQL?
    • Bruce: Yes.

 

  • How is PEM licensed?
    • Bruce: PEM is a proprietary product that can be used with a support contract.

 

  • Does Postgresql have incremental backups?
    • Bruce: Some backup tools like BART support incremental block-level backup.  Pgbackrest supports 1GB file level incremental backup.

 

  • Regarding large number of wal files, is there any reason they can be slower to clear (in addition to large number)
    • Bruce: No.

 

  • Question for Bruce -- what kernel logs do you review? What do you look for as needing attention?
    • Bruce: I basically run dmesg every 15 minutes and look for changes, and then white-list the messages I know are safe. Anything new appearing generates an email to me.

 

  • Is there any way to know which tables are most accessed in a database?
    • Bruce: You can look at file system modification times.  I don’t think there are DB-level access activity logs.
    • Ajay: Below query can be a good start to understanding how many times an object may have accessed.

                SELECT schemaname, relname,

                          COALESCE(seq_scan, 0) + COALESCE(idx_scan, 0) nr_accesses

                FROM pg_stat_all_tables

                ORDER BY 3 DESC NULLS LAST;

 

  • How can we know which disk blocks are most accessed in a PG cluster/database? What's the best way to detect hot blocks in a DB?
    • Bruce: Pgbuffercache will show you the blocks that are in the cache.  Why do you want to know what blocks are most accessed?

 

  • Is there any performance advisor log where it starts saying future problems well ahead?
    • Bruce: PEM offers this, and it might be the only tool that does that for Postgres.

 

  • Is there any performance advisor log where it starts saying future problems well ahead as part of performance monitoring (proactive mode) which notify before problem occurs?
    • Bruce: PEM offers this, and it might be the only tool that does that for Postgres.

 

  • Is there any tools/script (like AWR report in Oracle) for EDB Postgres?
    • PEM provides multiple reports that customers use to achieve similar results. There is no direct equivalent

 

  • What's the best way to monitor EDB Postgres on AWS/Azure Cloud?
    • Try EDB Postgres Enterprise Manager. It works very well on EC2 and Azure

 

  • We are using PEM and see lots of alerts, if you would point those out which one, we will set up as per DBA eagle eye monitoring n application performance. 2nd From PEM how to get reports from past alerts?
    • I suggest you contact our support team to help you with the details. You may also want to get our Professional Service team involved.

 

  • What is the difference between PEM and other tools like Grafana?
    • Grafana is a visualization tool; it does not provide probes, alerts or pre-built dashboards
    • Ajay: Grafana has to be used with other monitoring tools like Prometheus or nagios core. It needs a datasource to display graphs/metrics

 

  • What certification is available which is similar to the Oracle SQL 1Z0-071 Certification? How would I be able to get this certification?
    • Ajay :  we can point towards EDB certifications. 

 

  • Does PEM provide email alerts? 
    • Bruce: Yes.

 

  • Can this tool be configured to send email messages for alerts or write to a log file we can monitor from another tool to send us alerts and generate incident tickets?
    • Postgres Enterprise Manager does that

 

  • Object size are monitored? Previous size and/or current size? 
    • Ajay: As far as I understand, users would like to know if we can compare the object size with previous size and say it’s been growing. PEM can provide this through custom alert and by seeing historical data we can see the trend of object growth.

 

  • Is there built-in or recommended tools to automate monitoring DB instance in cluster? 
    • Postgres Enterprise Manager does that

 

  • What would you recommend for monitoring small business size, ie single server, 10GB or less db?
    • Bruce: Probably Nagios and check_postgres.
    • Ajay: prometheus+grafana is really easy to install/configure

 

  • Does this check_postgres.pl work with Nagios fork Icinga? 
    • Ajay: we haven't used or tested it so far.
    • For reading reference: https://en.wikipedia.org/wiki/Icinga

 

  • Does PEM store this information, if PEM does not store then where we can find that information in the EDB?
    • Bruce: What information?

 

  • Why Bruce not wearing a bowtie, no Bowtie 2020?
    • Bruce: Uh, LOL, I don’t wear it at home every day, surprise, surprise.  ;-)  I got complaints at church this past Sunday because I didn’t wear a bowtie, so I guess I have to get back on my game!  ;-)

 

  • Do you think pgsentinel could be useful for query monitoring? Is it supported on EDB?
    • Ajay: we haven’t tried it. 

 

  • Do you have a Work load manager in EDB Postgres?
    • Ajay : No. The priority of the task can be set on the system side. The database doesn’t have this functionality. 

 

  • If you use bart for backup/recovery, can it write to two places (local & bart) at the same time?
    • Not to my knowledge

 

  • Does PEM help establish baseline performance metrics by itself?
    • No, none of the tool will do that. Each environment is very unique and application requirements are very different. Baselining needs human intelligence.

 

  • What is the largest size of postgres database that is in service in the market? Is Postgress good for OLTP systems?
    • Bruce: Uh, maybe 150TB.  PG does OLTP and OLAP just fine, and NoSQL too.

 

  • How to effectively monitor the pg_default tablespace size monitoring? Most of the time, the files present in pg_default tablespace having oid as names not return the object name.
    • Bruce: You can lookup the names from the system catalogs in pg_class.relfilenode, or using oid2name from the command-line.  Also, there are many *size* functions in SQL;  in psql, try \df *size*

 

  • What are the functions to use for checking streaming replication status?
    • SELECT pg_last_xlog_receive_location() AS receive, pg_last_xlog_replay_location() AS replay;

 

  • Is it something different than using below- pg_stat_replication in primary and pg_stat_wal_receiver in standby?
    • Different ways to monitor lag. Here is the query which I use

SELECT pg_last_xlog_receive_location() AS receive, pg_last_xlog_replay_location() AS replay;

 

  • What do you recommend for monitoring BDR lag? 
    • Ajay: I am not sure. 

 

  • By any chance PostgreSQL monitoring available in third party tools like SolarWinds, AppDynamics, etc.?
    • Ajay: Solarwinds does monitor Postgres. Appdynamics is primarily an application performance monitoring tool

 

  • Is there is a way to find that a particular index being used?
    • Bruce: Sure, you can look at pg_stat_user_indexes and look at the usage count.  If it is low, the index is not being used.

 

  • How to quiesce the Postgres cluster without using pgpool or any other tool for other nodes to be in sync for a successful switchover?
    • Enterprisedb Failover Manager(EFM) can be used to do switchover.

 

  • What important information/details can you get through EDB better and why; that you can not get using Prometheus integrated with my application + Grafana?
    • PEM is specifically designed to fulfill database need. It has features like capacity planning,reporting, advisor, and performance diagnostics which other open source tools don’t provide out of the box.

 

  • What is your opinion about enabling checksums at initdb and the overhead it generates?
    • Bruce: The overhead is minimal and the increased corruption detection is valuable.

 

  • Concerning clustering as a performance overhead, do we not need to perform this as a maintenance task?
    • Bruce: People don’t use the CLUSTER command very often.

 

  • How do you monitor the virtual env. beneath the OS of the postgreSQL system. We had a situation where there was no monitor event, but the virt. env was broken?
    • Bruce: I don’t think that is possible from PG.  You need to use lower-level tools, like the kernel logs.
    • Ajay: You need to work with your system admin and ask to access information about physical host and it’s health.