Effective PostgreSQL Monitoring: Utilizing the views pg_stat_all_tables and pg_stat_all_indexes in PostgreSQL 16

July 25, 2023

Introduction

Monitoring and alerting on various aspects of your PostgreSQL database is crucial for ensuring optimal performance and locating any bottlenecks. In this blog, we explore the views pg_stat_all_tables and pg_stat_all_indexes that are used to monitor index usage and table statistics. We can pinpoint areas for optimization and raise the general performance of our PostgreSQL database by effectively monitoring these indicators and taking appropriate actions.

Understanding the views pg_stat_all_tables and pg_stat_all_indexes

The pg_stat_all_tables and pg_stat_all_indexes views are system views provided by PostgreSQL that offer valuable insights into the performance and utilization of tables and indexes.

  • pg_stat_all_tables: This view provides statistics about all tables in the current database, including the number of sequential and index scans, updates, deletes, inserts, and more. It also offers information on the number of live and dead tuples along with vacuum and analyze stats.
  • pg_stat_all_indexes: This view provides detailed statistics about all indexes in the current database. It includes information about the number of index scans, last index scan timestamp and count of rows fetched by particular index.

EDB’s PostgreSQL 16 Contributions

I have contributed to record statistics on the last sequential and index scans on tables.

1: Addition of Last index scan timestamp (last_idx_scan) column inside pg_stat_all_indexes view

2: Addition of Last Sequential scan timestamp (last_seq_scan) column inside pg_stat_all_tables view

https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-MONITORING
 

The main idea behind this enhancement is to ease the identification of seldom used indexes and tables. Though idx_scan and seq_scan already give the number of times a scan has been run, these values don't contain the data about when it was run. The when could be an indicator of criticality, even if the use is infrequent. 

By checking timestamps in conjunction with the number of times a scan has been run, we can identify the tables or indexes which are not being used in production. As a result, we can delete them to save up storage, improve query performance and reduce backup times.

Monitoring index usage

Indexes play a crucial role in enhancing query performance by enabling faster data retrieval. Monitoring index usage can help identify underutilized as well as important indexes that are utilized frequently. The following columns are returned from pg_stat_all_indexes view:

 

 

Column

Type

Description

1

relid

oid

OID of the table for this index

2

indexrelid

oid

OID of this index

3

schemaname

name

Name of the schema this index is in

4

relname

name

Name of the table for this index

5

indexrelname

name

Name of this index

6

idx_scan

bigint

Number of index scans initiated on this index

7

last_idx_scan

timestamp with time zone

The time of the last scan on this index, based on the most recent transaction stop time 

8

idx_tup_read

bigint

Number of index entries returned by scans on this index

9

idx_tup_fetch

bigint

Number of live table rows fetched by simple index scans using this index

Important columns to check are as follows

  • Index scan count (idx_scan): The idx_scan column in the pg_stat_all_indexes view represents the number of times this index has been used in an index scan on the table. By monitoring this metric, we can identify which indexes are frequently used and which are rarely or never used.
  • Last index scan timestamp (last_idx_scan): The last_idx_scan column in the pg_stat_all_indexes is a new enhancement in PostgreSQL 16. It displays the timestamp of the most recent scan for a specific index. Keep in mind that the timestamp represents the transaction end time rather than the precise moment the index was utilized (which may differ significantly if it’s a long running transaction)
  • idx_tup_read is the number of index entries returned by scans on this index. Its value increases whenever an index entry is read.
  • Idx_tup_fetch is the number of live table/heap rows fetched by simple index scans using this index.

 Let's demonstrate the following example:

CREATE TABLE sales (
  id integer NOT NULL, email VARCHAR,
  location VARCHAR, item VARCHAR,
  price VARCHAR
);

INSERT INTO sales (
  id, email, location, item,
  price
)
VALUES
  (
	1, 'a@hotmail.com', 'London',
	'item1', '$50'
  ),
  (
	2, 'b@hotmail.com', 'Australia',
	'item2', '$5'
  ),
  (
	3, 'c@hotmail.com', 'Canada',
	'item3', '$12'
  ),
  (
	4, 'd@hotmail.com',
	'London', 'item4', '$2'
  ),
  (
	5, 'e@hotmail.com', 'Switzerland',
	'item5', '$150'
  ),
  (
	6, 'f@hotmail.com',
	'Canada', 'item6',
	'$49'
  );

CREATE INDEX idx_sale_id ON sales(id);

 

Let's check the row inside pg_stat_all_indexes view:

postgres=# select * from pg_stat_all_indexes where indexrelname='idx_sale_id';
-[ RECORD 1 ]-+------------------------------
relid     	| 24576
indexrelid	| 24581
schemaname	| public
relname   	| sales
indexrelname  | idx_sale_id
idx_scan  	| 0
last_idx_scan | 
idx_tup_read  | 0
idx_tup_fetch | 0

 

Now we are going to run a select query on the sales table but before that, we are disabling seqscan GUC to force the planner to use index scan.

postgres=# set enable_seqscan=off;
SET
postgres=# explain analyze select * from sales where id = 5;
QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Scan using idx_sale_id_2 on sales  (cost=0.13..8.15 rows=1 width=132) (actual time=0.057..0.059 rows=1 loops=1)
   Index Cond: (id = 5)
 Planning Time: 0.076 ms
 Execution Time: 0.077 ms
(4 rows)

 

Here, pg_stat_all_indexes view gets updated where we can see last index scan timestamp:

postgres=# select * from pg_stat_all_indexes where indexrelname='idx_sale_id';
-[ RECORD 1 ]-+------------------------------
relid     	| 24576
indexrelid	| 24581
schemaname	| public
relname   	| sales
indexrelname  | idx_sale_id
idx_scan  	| 1
last_idx_scan | 2023-07-13 07:15:49.886831-04
idx_tup_read  | 1
idx_tup_fetch | 1

 

To find unused/underutilized indexes, check the query below

SELECT
	idxstat.schemaname as schema_name,
	idxstat.relname AS table_name,
	indexrelname AS index_name,
	idxstat.idx_scan AS index_scans_count,
    	idxstat.last_idx_scan AS last_idx_scan_timestamp,
	pg_size_pretty(pg_relation_size(idxstat.indexrelid)) AS index_size
FROM
	pg_stat_all_indexes AS idxstat
JOIN
    pg_index i ON idxstat.indexrelid = i.indexrelid
WHERE
    idxstat.schemaname not in ('pg_catalog','information_schema','pg_toast')
    AND NOT i.indisunique   -- is not a UNIQUE index
ORDER BY
	Idxstat.idx_scan ASC,
	Idxstat.last_idx_scan ASC;

 

This query displays a list of all indexes sorted by scan frequency (column 4) and last scan timestamp (column 5). If some indexes have a scan frequency (index_scans_count) of zero, less than 10, or the last_idx_scan_timestamp is very old , then we should consider deleting them.

The reasons for getting rid of unused indexes:

  • Database indexes frequently consume the same or even more storage than the actual data. Additionally, the kind of dependable, quick storage we need for a database can be expensive. It also increases the size and length of physical backups.
  • Indexes slow downs DML operations for data modification. Whenever we run an INSERT or DELETE command on a table, all indexes must be changed along with the table (the "heap") . It is quite possible that updating an indexed table will cost more than updating an unindexed table.
  • HOT updates are avoided by indexes. Every UPDATE operation results in the writing of a new row version (also known as a "tuple"), which creates a new item in each index on the table, instead of allowing the data to be updated in-place.

We must perform the following checks on candidates for deletion:

  • It's possible that the planner is avoiding the index because of duplication or incorrect setting of PostgreSQL parameters such as effective_cache_size and random_page_cost, or it thinks it will be cheaper to perform a sequential scan of the table for other reasons. This requires a judgment call, as it may (or may not) be another reason to delete it.
  • What size does the index occupy? Perhaps it is not currently deserving of our attention?
  • How many times has the corresponding table been sequentially scanned? To stop consecutive scans, we might need to add more fields to the index.
  • It might not be necessary to delete a table or index that appears to be inactive. For instance, it might be an audit log table that is rarely used but necessary for compliance, or it might be an index that is only used by a crucial year-end processing function.

Monitoring sequential scans

Sequential scans occur when PostgreSQL reads data from a table sequentially, without using an index. When dealing with larger tables, a sequential scan can take a long time reading the entire table, making the operation inefficient. When PostgreSQL manages numerous concurrent sequential scans, performance suffers dramatically.

Sequential scan monitoring can assist in locating performance problems and highlighting areas for index modification. The pg_stat_all_tables view returns the columns listed below.

 

 

Column

Type

Description

1

relid

oid                      

OID of a table

2

schemaname

name                     

Name of the schema that this table is in

3

relname

name                     

Name of this table

4

seq_scan

bigint                   

Number of sequential scans initiated on this table

5

last_seq_scan

timestamp with time zone 

The time of the last sequential scan on this table, based on the most recent transaction stop time

6

seq_tup_read

bigint                   

Number of live rows fetched by sequential scans

7

idx_scan

bigint                   

Number of index scans initiated on this table

8

last_idx_scan

timestamp with time zone 

The time of the last index scan on this table, based on the most recent transaction stop time

9

idx_tup_fetch

bigint                   

Number of live rows fetched by index scans

10

n_tup_ins

bigint                   

Total number of rows inserted

11

n_tup_upd

bigint                   

Total number of rows updated. (This includes row updates counted in n_tup_hot_upd and n_tup_newpage_upd, and remaining non-HOT updates.)

12

n_tup_del

bigint                   

Total number of rows deleted

13

n_tup_hot_upd

bigint                   

Number of rows HOT updated. These are updates where no successor versions are required in indexes.

14

n_tup_newpage_upd

bigint                   

Number of rows updated where the successor version goes onto a new heap page, leaving behind an original version with a t_ctid field that points to a different heap page. These are always non-HOT updates.

15

n_live_tup

bigint                   

Estimated number of live rows

16

n_dead_tup

bigint                   

Estimated number of dead rows

17

n_mod_since_analyze

bigint                   

Estimated number of rows modified since this table was last analyzed

18

n_ins_since_vacuum

bigint                   

Estimated number of rows inserted since this table was last vacuumed

19

last_vacuum

timestamp with time zone 

Last time at which this table was manually vacuumed (not counting VACUUM FULL)

20

last_autovacuum

timestamp with time zone 

Last time at which this table was vacuumed by the autovacuum daemon

21

last_analyze

timestamp with time zone 

Last time at which this table was manually analyzed

22

last_autoanalyze

timestamp with time zone 

Last time at which this table was analyzed by the autovacuum daemon

23

vacuum_count

bigint                   

Number of times this table has been manually vacuumed (not counting VACUUM FULL)

24

autovacuum_count

bigint                   

Number of times this table has been vacuumed by the autovacuum daemon

25

analyze_count

bigint                   

Number of times this table has been manually analyzed

26

autoanalyze_count

bigint                   

Number of times this table has been analyzed by the autovacuum daemon 

Important columns to check are as follows:

  • Sequential scan count (seq_scan): The seq_scan column in the pg_stat_all_tables view represents the total count of sequential scans performed on a particular table. A high number of sequential scans may indicate that appropriate indexes are missing or underutilized.
  • Last sequential scan timestamp (last_seq_scan): The last_seq_scan column in the pg_stat_all tables is a new addition in PostgreSQL 16. It represents the last sequential scan performed on the table. This column is updated by setting the transaction stop time value (GetCurrentTransactionStopTimestamp()). 
  • Last index scan timestamp (last_idx_scan): The last_idx_scan column in the pg_stat_all_tables is the same as mentioned in pg_stat_all_indexes but it doesn’t provide information of which index was used during the latest scan.
  • Fragmented rows (n_dead_tup): It shows an approximate number of dead rows that have had their data removed and are slated to be reused by our datasource when we use a write command like DELETE or UPDATE.
  • Hot updates count (n_tup_hot_upd): We can calculate the HOT updates ratio for the most updated tables using this column data.The columns n_tup_upd and n_tup_hot_upd in this view represent the total number of regular and HOT updates for each individual table. The "good" tables are those with a high HOT rate; we should pay attention to tables with a high write count but a low or zero HOT rate. Changes to fillfactor settings are the general norm for these and allow free space to be reserved for when additional rows are added and tables are expanded. The presence of reserved space ensures that pages' rows will be updated, and it greatly increases the likelihood that a HOT update will take place. Go through the following link for more details on HOT updates: https://www.postgresql.org/docs/16/storage-hot.html

There are other useful columns such as last vacuum, autovacuum timings (last_vacuum and last_autovacuum), last_analyze and autoanalyze timings (last_analyze and last_autoanalyze) with count.

To find old tables inside PostgreSQL, we can run the following query:

SELECT
    	tabstat.schemaname AS schema_name,
tabstat.relname AS table_name,
tabstat.seq_scan AS tab_seq_scan_count,
    	tabstat.idx_Scan AS tab_index_scan_count,
    	tabstat.last_seq_scan AS tab_last_seq_scan_timestamp,
    	tabstat.last_idx_scan AS tab_last_idx_scan_timestamp,
	pg_size_pretty(pg_total_relation_size(tabstat.relid)) AS table_size
FROM
	pg_stat_all_tables AS tabstat
WHERE
tabstat.schemaname not in ('pg_catalog','information_schema','pg_toast')
ORDER BY
	tabstat.last_seq_scan ASC,
    	tabstat.last_idx_scan ASC;

 

This query displays a list of all tables sorted by last sequential scan timestamp (column 5) and last index scan timestamp (column 6). The order by clause can also be replaced with seq_scan and idx_scan columns to get tables list which are rarely used.

Efficiently monitoring index usage and sequential scans  

To efficiently monitor index usage and sequential scans, consider the following best practices:

a. Regularly analyze statistics

Analyze the statistics provided by the pg_stat_all_tables and pg_stat_all_indexes views regularly. Identify trends, anomalies, and areas for improvement.

b. Set up automated monitoring

Use scripts or monitoring tools to automate the gathering and analysis of statistics. This enables us to continuously scan the database and receive notifications when certain thresholds are crossed or trends are found.

c. Compare with historical data 

Keep track of previous sequential scans and index usage statistics in order to compare and contrast patterns over time. As a result, trends may be found and the effects of database changes can be understood.

d. Optimize index usage

Utilize the idx_scan and last_idx_scan columns in the pg_stat_all_indexes view to locate underutilized indexes. To decrease maintenance costs and enhance query performance for inserts and updates, think about deleting or restructuring these indexes.

e. Optimize sequential scans

Identify tables with a high number of sequential scans using the seq_scan and last_seq_scan column in the pg_stat_all_tables view. Evaluate the need for appropriate indexes. We can also remove or archive those tables on which are not in use by monitoring last_seq_scan and last_idx_scan columns. This can help in reducing our disk space and as a result, backups can be completed in a quick manner.

f. Analyze Query Performance

Analyze slow-running queries using PostgreSQL EXPLAIN and EXPLAIN ANALYZE commands. Pay attention to query plans, index usage, and sequential scans mentioned in the output. By identifying queries that exhibit poor index utilization or excessive sequential scans, we can pinpoint areas that require optimization.

Conclusion

Monitoring index utilization and sequential scans is crucial for preserving your PostgreSQL database's peak performance. You may learn a lot about table and index activity by using the pg_stat_all_tables and pg_stat_all_indexes views. You may optimize query performance by identifying underutilized indexes, tracking index scans against sequential scans, and analyzing this data. You can make sure that your PostgreSQL database runs effectively and provides top performance for your applications by using these methods. To keep your database operating efficiently, bear in mind to periodically check and tweak your indexes based on the monitoring findings. Happy observing!

References

Share this

Relevant Blogs

The limitations of LLMs, or why are we doing RAG?

Despite powerful capabilities with many tasks, Large Language Models (LLMs) are not know-it-alls. If you've used ChatGPT or other models, you'll have experienced how they can’t reasonably answer questions about...
June 17, 2024

PGVector as Embedding Store in PrivateGPT

EDB has a long history of open source contributions, and while we’re best known for our contributions to Postgres, that’s not the only project we contribute to. e.g Barman, CloudNativePG...
June 05, 2024

More Blogs

Finding memory leaks in Postgres C code

I spent the last week looking for a memory leak in Postgres’s WAL Sender process. I spent a few days getting more acquainted with Valgrind and gcc/clang sanitizers, but ultimately...
March 27, 2024