Index Corruption in PostgreSQL: The Hidden Cost of Your Queries
A corruption in the database base often manifests itself in the form of errors when performing a DDL or DML operation on affected relations. The errors often show up as errors reading tables due to a missing file—that is rather straightforward, of course. However, a corrupt index is not as obvious as a corrupt table. More often than not to find if indeed there is index corruption, one needs to observe the behavior (execution times) and results of the queries themselves.
What are some of the symptoms of a corrupt index you might see in a database?
A. Queries that use indexes are performing more sequential scans.
B. Some more vigilant DBAs who keep an eye on pg_stat_user_tables see the idx_scan column increasing at a much slower rate (or not changing at all) when compared to seq_scan
C. Users complain that some of the data which they are receiving is not what they expect.
In situations A and B, for the majority of the cases, you could easily be misled to the conclusion that your statistics were out of date. In Situation C, this behavior guides us more toward a corrupt index, but it’s hard to know for sure without more knowledge; this is one scenario where the “Google and go” approach often yields very little for anyone hunting for an answer to their challenge. So, we decided to convert it into a blog; for as rare as it is for your average PostgreSQL user, our goal is to help you make this elusive problem easier to pin down.
Note: Due to privacy and compliance limitations, through the blog, we are of course not using actual user data but of a sample test case to demonstrate similar results as seen.
Unexpected Query Results
Our user had a query which was a part of his regular workflow. It was a part of a job that purged older data based upon the application need. In the last few executions of the job, the user saw a major reduction in the number of rows being cleared by the job— but none of the usual suspects, like retention rules, had been changed to explain the problem. In addition, it looked like there was also a steady increase in the size of the tables on which the job was running.
Eliminating the Obvious
As no errors were logged during job execution, we started looking at DB logs for any errors/warnings. The job was working fine without any issues on one particular day, but one day it suddenly stopped working. The database is still up and running, connectivity is intact, applications are running fine. What went wrong?
We started with eliminating the obvious:
- Has there been any change in business logic/table structures?
- Has there been any change in retention rules?
- Have there been any bulk operations, like data load or truncate?
The next step was to examine the contents of the logs generated at the time of last execution of the purge job and we came across the following:
"ERROR: could not read block xxx in file"
What does the error mean?
Let’s take a quick look at what a block is and why not being able to read it caused the query to fail. In PostgreSQL, all tables and indexes are stored as a collection of pages; these pages are 8KB by default, though it can be customized during server compile. Since pages don’t vary in size once it’s defined during the compilation, these pages are all logically equivalent when we’re speaking of table pages. However, with an index, the first page is normally reserved as a metapage; the difference is that it carries control information within it. As a result, there may be different types of pages in a single index. In this case, it's this 8KB page which is referred to as a block in the error.
Hence we can conclude that the query was looking for information in a block [page] but for some reason was not able to read it. The good news? With the error pointing to a specific block, it’s a reasonable possibility that the block itself may be the only corruption in the table, rather than the entire table.
Note: If you wish to get more information on the structure of a page in PostgreSQL please refer to Database Page Layout section of the PostgreSQL documentation.
Narrowing it Down
Looking at the Postgres database logs, we found something similar to this:
2020-05-03 21:55:46 CEST : user=postgres,db=postgres,app=psql.bin,client=[local]STATEMENT: explain analyze select * from pgbench_accounts where aid >100000000 ; 2020-05-03 21:56:59 CEST : user=postgres,db=postgres,app=psql.bin,client=[local]ERROR: could not read block 274179 in file "base/13642/24643.2": read only 0 of 8192 bytes
We can find out what this file which contains the block actually belongs to by using a query like:
postgres=# select n.nspname AS schema, c.relname AS realtion from pg_class c inner join pg_namespace n on (c.relnamespace = n.oid) where c.relfilenode = '24643'; schema | relation --------+---------------------- public | idx_pgbench_accounts (1 row) postgres=#
Note: In the above query we have ignored the ‘.2’ part of file name as only the numbers prior to the ‘.’ corresponds to the first segment's file name is the same as the filenode. For more details please refer to Database File Layout in the PostgreSQL documentation.
Now we know that the block which could not be read was part of an index which was called in the query. But we still have no clue whether or not the table itself is okay.
A reliable method to check table sanity is to do a pg_dump of the table to which the index belongs, as pg_dump doesn’t use any indexes but reads the table data directly. Below is a sample of the output:
-bash-4.2$ /opt/PostgresPlus/9.5AS/bin/pg_dump -p 5446 -v -t pgbench_accounts postgres > backup_pgbench_accounts.sql >> backup_pgbench_accounts.log 2>&1 -bash-4.2$ tail backup_pgbench_accounts.log CREATE INDEX idx_pgbench_accounts ON pgbench_accounts USING btree (aid); -- Completed on 2020-05-03 22:53:49 EDT -- -- EnterpriseDB database dump complete -- -bash-4.2$
So, our data in the table is fine. All we need to do is rebuild the index to fix the problem. We are going to show how to rebuild the index in the later part of our article.
Let’s add a hypothetical but a possible twist where there were no errors in the logs. What could we do next? We have no errors, the query runs fine, but the user insists that the data returned is not correct. We could generate the query’s ‘explain analyze’ plan, which should show where it is getting its data from.
postgres=# explain analyze select * from pgbench_accounts where aid>1 and aid <10000 ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on pgbench_accounts (cost=174.61..21830.00 rows=11126 width=97) (actual time=1.775..4.469 rows=9998 loops=1) Recheck Cond: ((aid > 1) AND (aid < 10000)) Heap Blocks: exact=164 -> Bitmap Index Scan on idx_pgbench_accounts (cost=0.00..171.83 rows=11126 width=0) (actual time=1.720..1.720 rows=9998 loops=1) Index Cond: ((aid > 1) AND (aid < 10000)) Planning time: 0.242 ms Execution time: 5.286 ms (7 rows)
So, the query is using the index.
Next, step would be to force the query to hit the table directly and see if the results are any different. This can be done by setting parameter ‘enable_indexscan’ parameter to off in the psql session and executing the query again:
postgres=# set enable_indexscan='off'; SET postgres=# explain analyze select * from pgbench_accounts where aid >10000000; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on pgbench_accounts (cost=0.00..4889345.00 rows=89592207 width=97) (actual time=1859.666..15070.333 rows=90000000 loops=1) Filter: (aid > 10000000) Rows Removed by Filter: 10000000 Planning time: 0.161 ms Execution time: 18394.457 ms (12 rows) postgres=#
If there is a difference in the output in this case row count (7 with Index Scan and 12 with Sequential scan), you have every reason to believe something is wrong with your index.
Fixing the Problem
We have enough data to conclude now that there is an issue with the index and it's been corrupted. To fix this we have to rebuild the index and we have a couple of options on how to go about this, each with its own pros and cons:
A. Use ‘REINDEX name’: Using this will allow reads, but locks all writes of the index’s parent table. It also will place an exclusive lock on the index that’s benign processed; this means that even read attempts on that index will fail during this operation. The best time to use this would be a scheduled downtime or lean period of activity.
B. Use ‘REINDEX CONCURRENTLY’: This is a better option but is only available in Postgres version 12 onwards.
C. Use ‘CREATE INDEX CONCURRENTLY’ and drop old/corrupt index: If you are on a version older than 12 your best bet will be to create a new index using ‘CREATE INDEX CONCURRENTLY’ which will not block you existing operations on the table and then drop the corrupt index using the ‘DROP INDEX’ drop the old one. The best time to use this would be a lean period of activity as ‘CREATE INDEX CONCURRENTLY’ is slowed down with a large number of updates/inserts/deletes on the table when it's working.
The overall chances of any DBA never running into corruption is rare. So, if they’re likely to happen at some point, and your recovery depends on the ability to detect the problem early, what can you do? The answer isn’t just one thing, but many: Monitoring, checksums, and pg_catcheck are a few ways you can catch the problem as it occurs. Bringing your environment back to good health, however, depends on the things we’ve talked about in prior posts: PITR backups and the WAL stream, standby servers, and delayed standby servers can all help in various ways to rescue you from the problem when you do hit it. Throughout all strategies, caution is the best characteristic you can use to ensure the survival of your data… and your peace of mind!
Join Postgres Pulse Live!
We make use of the problems we solve and the conversations we have in helping people with Postgres; while this scenario was one we see less often, it’s one for which we find the solution to be particularly useful to those who do encounter the invisible cost of index corruption. Here you can find all things Postgres Pulse, including all of our blog posts and our YouTube series.
Join us on Monday, May 11th, for our next Pulse Live Session! We’ll dive into this week’s questions and quagmires around index corruption and the like, as well as take questions from anyone who participates. You can ask your questions via email at firstname.lastname@example.org, hashtag on Twitter, or live during the event right here. What are you waiting for?
This blog post was co-authored by Ajay Patel and Deepanshu Sharma.
Get Postgres Tips and Tricks
Subscribe to our newsletter to get advanced Postgres how-tos.