Effective PostgreSQL Monitoring: PG_STAT

Dave Page

July 25, 2023

Introduction

Monitoring and alerting on various aspects of your PostgreSQL database is crucial for ensuring optimal performance and locating any bottlenecks. In this blog, we explore the views pg_stat_all_tables and pg_stat_all_indexes that are used to monitor index usage and table statistics. We can pinpoint areas for optimization and raise the general performance of our PostgreSQL database by effectively monitoring these indicators and taking appropriate actions.

Understanding the views pg_stat_all_tables and pg_stat_all_indexes

The pg_stat_all_tables and pg_stat_all_indexes views are system views provided by PostgreSQL that offer valuable insights into the performance and utilization of tables and indexes.

pg_stat_all_tables: This view provides statistics about all tables in the current database, including the number of sequential and index scans, updates, deletes, inserts, and more. It also offers information on the number of live and dead tuples along with vacuum and analyze stats.
pg_stat_all_indexes: This view provides detailed statistics about all indexes in the current database. It includes information about the number of index scans, last index scan timestamp and count of rows fetched by particular index.

EDB’s PostgreSQL 16 Contributions

I have contributed to record statistics on the last sequential and index scans on tables.

1: Addition of Last index scan timestamp (last_idx_scan) column inside pg_stat_all_indexes view

2: Addition of Last Sequential scan timestamp (last_seq_scan) column inside pg_stat_all_tables view

https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-MONITORING

The main idea behind this enhancement is to ease the identification of seldom used indexes and tables. Though idx_scan and seq_scan already give the number of times a scan has been run, these values don't contain the data about when it was run. The when could be an indicator of criticality, even if the use is infrequent.

By checking timestamps in conjunction with the number of times a scan has been run, we can identify the tables or indexes which are not being used in production. As a result, we can delete them to save up storage, improve query performance and reduce backup times.

Monitoring index usage

Indexes play a crucial role in enhancing query performance by enabling faster data retrieval. Monitoring index usage can help identify underutilized as well as important indexes that are utilized frequently. The following columns are returned from pg_stat_all_indexes view:

	Column	Type	Description
1	relid	oid	OID of the table for this index
2	indexrelid	oid	OID of this index
3	schemaname	name	Name of the schema this index is in
4	relname	name	Name of the table for this index
5	indexrelname	name	Name of this index
6	idx_scan	bigint	Number of index scans initiated on this index
7	last_idx_scan	timestamp with time zone	The time of the last scan on this index, based on the most recent transaction stop time
8	idx_tup_read	bigint	Number of index entries returned by scans on this index
9	idx_tup_fetch	bigint	Number of live table rows fetched by simple index scans using this index

Important columns to check are as follows

Index scan count (idx_scan): The idx_scan column in the pg_stat_all_indexes view represents the number of times this index has been used in an index scan on the table. By monitoring this metric, we can identify which indexes are frequently used and which are rarely or never used.
Last index scan timestamp (last_idx_scan): The last_idx_scan column in the pg_stat_all_indexes is a new enhancement in PostgreSQL 16. It displays the timestamp of the most recent scan for a specific index. Keep in mind that the timestamp represents the transaction end time rather than the precise moment the index was utilized (which may differ significantly if it’s a long running transaction)
idx_tup_read is the number of index entries returned by scans on this index. Its value increases whenever an index entry is read.
Idx_tup_fetch is the number of live table/heap rows fetched by simple index scans using this index.

Let's demonstrate the following example:

CREATE TABLE sales (
  id integer NOT NULL, email VARCHAR,
  location VARCHAR, item VARCHAR,
  price VARCHAR
);

INSERT INTO sales (
  id, email, location, item,
  price
)
VALUES
  (
	1, 'a@hotmail.com', 'London',
	'item1', '$50'
  ),
  (
	2, 'b@hotmail.com', 'Australia',
	'item2', '$5'
  ),
  (
	3, 'c@hotmail.com', 'Canada',
	'item3', '$12'
  ),
  (
	4, 'd@hotmail.com',
	'London', 'item4', '$2'
  ),
  (
	5, 'e@hotmail.com', 'Switzerland',
	'item5', '$150'
  ),
  (
	6, 'f@hotmail.com',
	'Canada', 'item6',
	'$49'
  );

CREATE INDEX idx_sale_id ON sales(id);

Let's check the row inside pg_stat_all_indexes view:

postgres=# select * from pg_stat_all_indexes where indexrelname='idx_sale_id';
-[ RECORD 1 ]-+------------------------------
relid     	| 24576
indexrelid	| 24581
schemaname	| public
relname   	| sales
indexrelname  | idx_sale_id
idx_scan  	| 0
last_idx_scan | 
idx_tup_read  | 0
idx_tup_fetch | 0

Now we are going to run a select query on the sales table but before that, we are disabling seqscan GUC to force the planner to use index scan.

postgres=# set enable_seqscan=off;
SET
postgres=# explain analyze select * from sales where id = 5;
QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Scan using idx_sale_id_2 on sales  (cost=0.13..8.15 rows=1 width=132) (actual time=0.057..0.059 rows=1 loops=1)
   Index Cond: (id = 5)
 Planning Time: 0.076 ms
 Execution Time: 0.077 ms
(4 rows)

Here, pg_stat_all_indexes view gets updated where we can see last index scan timestamp:

postgres=# select * from pg_stat_all_indexes where indexrelname='idx_sale_id';
-[ RECORD 1 ]-+------------------------------
relid     	| 24576
indexrelid	| 24581
schemaname	| public
relname   	| sales
indexrelname  | idx_sale_id
idx_scan  	| 1
last_idx_scan | 2023-07-13 07:15:49.886831-04
idx_tup_read  | 1
idx_tup_fetch | 1

To find unused/underutilized indexes, check the query below

SELECT
	idxstat.schemaname as schema_name,
	idxstat.relname AS table_name,
	indexrelname AS index_name,
	idxstat.idx_scan AS index_scans_count,
    	idxstat.last_idx_scan AS last_idx_scan_timestamp,
	pg_size_pretty(pg_relation_size(idxstat.indexrelid)) AS index_size
FROM
	pg_stat_all_indexes AS idxstat
JOIN
    pg_index i ON idxstat.indexrelid = i.indexrelid
WHERE
    idxstat.schemaname not in ('pg_catalog','information_schema','pg_toast')
    AND NOT i.indisunique   -- is not a UNIQUE index
ORDER BY
	Idxstat.idx_scan ASC,
	Idxstat.last_idx_scan ASC;

This query displays a list of all indexes sorted by scan frequency (column 4) and last scan timestamp (column 5). If some indexes have a scan frequency (index_scans_count) of zero, less than 10, or the last_idx_scan_timestamp is very old , then we should consider deleting them.

The reasons for getting rid of unused indexes:

Database indexes frequently consume the same or even more storage than the actual data. Additionally, the kind of dependable, quick storage we need for a database can be expensive. It also increases the size and length of physical backups.
Indexes slow downs DML operations for data modification. Whenever we run an INSERT or DELETE command on a table, all indexes must be changed along with the table (the "heap") . It is quite possible that updating an indexed table will cost more than updating an unindexed table.
HOT updates are avoided by indexes. Every UPDATE operation results in the writing of a new row version (also known as a "tuple"), which creates a new item in each index on the table, instead of allowing the data to be updated in-place.

We must perform the following checks on candidates for deletion:

It's possible that the planner is avoiding the index because of duplication or incorrect setting of PostgreSQL parameters such as effective_cache_size and random_page_cost, or it thinks it will be cheaper to perform a sequential scan of the table for other reasons. This requires a judgment call, as it may (or may not) be another reason to delete it.
What size does the index occupy? Perhaps it is not currently deserving of our attention?
How many times has the corresponding table been sequentially scanned? To stop consecutive scans, we might need to add more fields to the index.
It might not be necessary to delete a table or index that appears to be inactive. For instance, it might be an audit log table that is rarely used but necessary for compliance, or it might be an index that is only used by a crucial year-end processing function.

Monitoring sequential scans

Sequential scans occur when PostgreSQL reads data from a table sequentially, without using an index. When dealing with larger tables, a sequential scan can take a long time reading the entire table, making the operation inefficient. When PostgreSQL manages numerous concurrent sequential scans, performance suffers dramatically.

Sequential scan monitoring can assist in locating performance problems and highlighting areas for index modification. The pg_stat_all_tables view returns the columns listed below.

	Column	Type	Description
1	relid	oid	OID of a table
2	schemaname	name	Name of the schema that this table is in
3	relname	name	Name of this table
4	seq_scan	bigint	Number of sequential scans initiated on this table
5	last_seq_scan	timestamp with time zone	The time of the last sequential scan on this table, based on the most recent transaction stop time
6	seq_tup_read	bigint	Number of live rows fetched by sequential scans
7	idx_scan	bigint	Number of index scans initiated on this table
8	last_idx_scan	timestamp with time zone	The time of the last index scan on this table, based on the most recent transaction stop time
9	idx_tup_fetch	bigint	Number of live rows fetched by index scans
10	n_tup_ins	bigint	Total number of rows inserted
11	n_tup_upd	bigint	Total number of rows updated. (This includes row updates counted in n_tup_hot_upd and n_tup_newpage_upd, and remaining non-HOT updates.)
12	n_tup_del	bigint	Total number of rows deleted
13	n_tup_hot_upd	bigint	Number of rows HOT updated. These are updates where no successor versions are required in indexes.
14	n_tup_newpage_upd	bigint	Number of rows updated where the successor version goes onto a new heap page, leaving behind an original version with a t_ctid field that points to a different heap page. These are always non-HOT updates.
15	n_live_tup	bigint	Estimated number of live rows
16	n_dead_tup	bigint	Estimated number of dead rows
17	n_mod_since_analyze	bigint	Estimated number of rows modified since this table was last analyzed
18	n_ins_since_vacuum	bigint	Estimated number of rows inserted since this table was last vacuumed
19	last_vacuum	timestamp with time zone	Last time at which this table was manually vacuumed (not counting VACUUM FULL)
20	last_autovacuum	timestamp with time zone	Last time at which this table was vacuumed by the autovacuum daemon
21	last_analyze	timestamp with time zone	Last time at which this table was manually analyzed
22	last_autoanalyze	timestamp with time zone	Last time at which this table was analyzed by the autovacuum daemon
23	vacuum_count	bigint	Number of times this table has been manually vacuumed (not counting VACUUM FULL)
24	autovacuum_count	bigint	Number of times this table has been vacuumed by the autovacuum daemon
25	analyze_count	bigint	Number of times this table has been manually analyzed
26	autoanalyze_count	bigint	Number of times this table has been analyzed by the autovacuum daemon

Important columns to check are as follows:

Sequential scan count (seq_scan): The seq_scan column in the pg_stat_all_tables view represents the total count of sequential scans performed on a particular table. A high number of sequential scans may indicate that appropriate indexes are missing or underutilized.
Last sequential scan timestamp (last_seq_scan): The last_seq_scan column in the pg_stat_all tables is a new addition in PostgreSQL 16. It represents the last sequential scan performed on the table. This column is updated by setting the transaction stop time value (GetCurrentTransactionStopTimestamp()).
Last index scan timestamp (last_idx_scan): The last_idx_scan column in the pg_stat_all_tables is the same as mentioned in pg_stat_all_indexes but it doesn’t provide information of which index was used during the latest scan.
Fragmented rows (n_dead_tup): It shows an approximate number of dead rows that have had their data removed and are slated to be reused by our datasource when we use a write command like DELETE or UPDATE.
Hot updates count (n_tup_hot_upd): We can calculate the HOT updates ratio for the most updated tables using this column data.The columns n_tup_upd and n_tup_hot_upd in this view represent the total number of regular and HOT updates for each individual table. The "good" tables are those with a high HOT rate; we should pay attention to tables with a high write count but a low or zero HOT rate. Changes to fillfactor settings are the general norm for these and allow free space to be reserved for when additional rows are added and tables are expanded. The presence of reserved space ensures that pages' rows will be updated, and it greatly increases the likelihood that a HOT update will take place. Go through the following link for more details on HOT updates: https://www.postgresql.org/docs/16/storage-hot.html

There are other useful columns such as last vacuum, autovacuum timings (last_vacuum and last_autovacuum), last_analyze and autoanalyze timings (last_analyze and last_autoanalyze) with count.

To find old tables inside PostgreSQL, we can run the following query:

SELECT
    	tabstat.schemaname AS schema_name,
tabstat.relname AS table_name,
tabstat.seq_scan AS tab_seq_scan_count,
    	tabstat.idx_Scan AS tab_index_scan_count,
    	tabstat.last_seq_scan AS tab_last_seq_scan_timestamp,
    	tabstat.last_idx_scan AS tab_last_idx_scan_timestamp,
	pg_size_pretty(pg_total_relation_size(tabstat.relid)) AS table_size
FROM
	pg_stat_all_tables AS tabstat
WHERE
tabstat.schemaname not in ('pg_catalog','information_schema','pg_toast')
ORDER BY
	tabstat.last_seq_scan ASC,
    	tabstat.last_idx_scan ASC;

This query displays a list of all tables sorted by last sequential scan timestamp (column 5) and last index scan timestamp (column 6). The order by clause can also be replaced with seq_scan and idx_scan columns to get tables list which are rarely used.

Efficiently monitoring index usage and sequential scans

To efficiently monitor index usage and sequential scans, consider the following best practices:

a. Regularly analyze statistics

Analyze the statistics provided by the pg_stat_all_tables and pg_stat_all_indexes views regularly. Identify trends, anomalies, and areas for improvement.

b. Set up automated monitoring

Use scripts or monitoring tools to automate the gathering and analysis of statistics. This enables us to continuously scan the database and receive notifications when certain thresholds are crossed or trends are found.

c. Compare with historical data

Keep track of previous sequential scans and index usage statistics in order to compare and contrast patterns over time. As a result, trends may be found and the effects of database changes can be understood.

d. Optimize index usage

Utilize the idx_scan and last_idx_scan columns in the pg_stat_all_indexes view to locate underutilized indexes. To decrease maintenance costs and enhance query performance for inserts and updates, think about deleting or restructuring these indexes.

e. Optimize sequential scans

Identify tables with a high number of sequential scans using the seq_scan and last_seq_scan column in the pg_stat_all_tables view. Evaluate the need for appropriate indexes. We can also remove or archive those tables on which are not in use by monitoring last_seq_scan and last_idx_scan columns. This can help in reducing our disk space and as a result, backups can be completed in a quick manner.

f. Analyze Query Performance

Analyze slow-running queries using PostgreSQL EXPLAIN and EXPLAIN ANALYZE commands. Pay attention to query plans, index usage, and sequential scans mentioned in the output. By identifying queries that exhibit poor index utilization or excessive sequential scans, we can pinpoint areas that require optimization.

Conclusion

Monitoring index utilization and sequential scans is crucial for preserving your PostgreSQL database's peak performance. You may learn a lot about table and index activity by using the pg_stat_all_tables and pg_stat_all_indexes views. You may optimize query performance by identifying underutilized indexes, tracking index scans against sequential scans, and analyzing this data. You can make sure that your PostgreSQL database runs effectively and provides top performance for your applications by using these methods. To keep your database operating efficiently, bear in mind to periodically check and tweak your indexes based on the monitoring findings. Happy observing!

References

Resource Feature Callout 1

Effective PostgreSQL Monitoring: Utilizing the views pg_stat_all_tables and pg_stat_all_indexes in PostgreSQL 16

Dave Page

Introduction

Understanding the views pg_stat_all_tables and pg_stat_all_indexes

EDB’s PostgreSQL 16 Contributions

Monitoring index usage

Monitoring sequential scans

Efficiently monitoring index usage and sequential scans

a. Regularly analyze statistics

b. Set up automated monitoring

c. Compare with historical data

d. Optimize index usage

e. Optimize sequential scans

f. Analyze Query Performance

Conclusion

References

Relevant Blogs

EDB celebrates National Intern Day on Thursday, July 25th

Yet Another JSON parser for PostgreSQL

Autovacuum Tuning Basics

More Blogs

Basics of Tuning Checkpoints

EDB's Jonathan Battiato and Danish Khan Join Data on Kubernetes Community as Ambassadors

EDB Chief Database Scientist Robert Haas announces new mentorship program