Postgres Pulse Insights: It’s Still Slow! - Solving the Bloated Postgres Database

Ankit Shukla April 3, 2020

 

Welcome to this week’s Postgres Pulse Insights, where we answer vital Postgres questions and provide key tactics for DBAs on how to overcome common database hurdles. Let’s jump right in and start solving the issue of the week. 

After extensive testing, you, as the skilled DBA, have tweaked the two most-significant parameters we talked about last--shared_buffers and work_mem. The DBA has saved the day, and tomorrow will be a day of rest… except, the next day begins with an all-too-familiar email:  “THE DATABASE IS SLOW. STILL!”

Instead of looking back at what you had done previously, go back to the basics again. Ask the same question: “What is slow?” The response to this may still change the approach which needs to be taken.

“What is Slow?”

  • A single query
  • Multiple queries
  • Queries to a certain relation or a group of relations

Let’s focus on queries of certain relations or a group of relations that seem to be the problem.  This is one of those scenarios which are not very obvious to the user or the DBA himself. It can easily be mistaken that all queries hitting the database are slow. In order to make the job of identifying this quickly, you should have your logging levels set up correctly; the proper balance of logging levels and the impact this can have on either performance or security requirements is a substantial topic on its own, so in this case we will focus on with other indirect methods of investigation:

Let’s start looking at the symptoms:

  1. User-initiated actions that had performed well in the past without a glitch, are now taking longer. This can be accompanied by CPU and memory usage spikes which seem to stop as soon as processing is complete.
  2. Queries which do sequential scans are now slower
  3. The queries accessing these relations seem to have a different EXPLAIN ANALYZE plan than earlier.
  4. If you check catalog tables like pg_stat_user_tables or pg_stat_all_tables, the values of last_autovacuum / last_autoanalyse and last_vacuum / last_analyse shows nothing (blank).

At this point, it’s likely that you have one of the most well-known gremlins in Postgres “THE BLOAT”.

What is bloat, exactly?  In order to fix things that aren’t working, it’s always important to have an understanding of how they behave when they do work. 

 

So what happened here?

In order to provide high concurrency for database read/write activity, Postgres makes use of a mechanism called Multi-Version Concurrency Control, or MVCC. The basic principle behind this is ‘readers never block writers and writers never block readers’. For more details about this, we recommend that you go through Bruce Momjian’s Postgres internals presentation, MVCC Unmasked

While it won’t do Bruce’s presentation true justice, here is a short synopsis:

When a block of data is being read in Postgres by one transaction, a second transaction may be in the process of updating it. In order to ensure that read transactions will not block the write, the writer applies its changes to a new row on the disk while still keeping the original row; this is done using transaction ID’s. 

So, the read transaction will have the original image which it had accessed at the beginning of its lifetime, while the second transaction finishes its update to the new image. Now, any new transaction which visits the same block will see the data which was updated by the write operation--while the old image which was seen by the prior reader will be marked obsolete.  This obsolete reference is known as a dead tuple. Simple and efficient. For deletes, the old image is made obsolete (referred to as dead tuple) and a new one is made visible for subsequent transactions.

So, in MVCC, these deletes and updates accumulate as many dead tuples.  This collection of dead tuples is what we refer to as “bloat”. Why is it bad? Well, the data on a disk is read sequentially. So, when there is a large amount of database bloat, a read operation has to go through a large number of pages on disk which do not have relevant data. This can deteriorate performance not only on sequential scans, but also in use of the index--if an index itself is bloated, then the query planner may incorrectly estimate an option and choose a less effective plan.

For many, there’s a simple answer.  Postgres has a special process known as autovacuum, which kicks in to clean up these dead tuples and mark them for reuse or return to the file system.  However, because it is automated maintenance, autovacuum is an interruptible process; this means that if any other process is accessing the page for a read or write operation, the process is paused. This pause can mean that some tables become more bloated, which can lead into the symptoms mentioned above.     

Autovacuum has its own tunable parameters, but we’re going to leave that for another day while we continue to explore how to fix the problem at hand.

Note: the description above is highly simplified for ease of understanding. Please refer to the PostgreSQL documentation to get more details. 

Now that we understand the probable cause, and its effect, we have a few options to remedy this.

 

Let's VACUUM!

For many of us, VACUUMing never sounded so exciting as it does now. While autovacuum can be paused by read/write operations, manual VACUUM doesn’t have the same deference.  A manually-initiated VACUUM has a higher priority.

As mentioned above, if you have noticed the symptoms which are listed above, then you might surmise, “Okay, I know the root cause of the problem; my database is bloated, and I need to run VACUUM immediately. But I don’t know which VACUUM.”. There are many options that can be run with VACUUM, each one with its own advantages and limitations. For a better sense of what that looks like, take a look below.

VACUUM
Plain VACUUM command with no option runs against each and every relation in the database. It helps to mark the dead (bloated) space made by DELETE and UPDATE statements to reuse it.  The physical bloated space which was occupied will not be released but just marked to reuse it so that when the next data comes in, Postgres knows where the space is available. However if empty pages at the end of tables are removed and space returned to the operating system.

For example:

VACUUM; -- Database wide VACUUM
VACUUM foo; -- VACUUM only relation foo 

VACUUM ANALYZE
This is perhaps the most common option used with VACUUM when performing maintenance tasks Postgres. When run with VACUUM, ANALYZE will collect the current statistics of the relation or all the relations present in the database and store it in a few catalog tables like pg_statistics. This helps Postgres to use updated statistics for the queries which come in, and using those statistics, the query planner chooses the best available plan. 

For example:

VACUUM analyse; -- Database wide VACUUM and ANALYSE
VACUUM analyse foo; -- VACUUM and ANALYSE only relation foo 

VACUUM FULL
When FULL option is used with VACUUM, it’s main advantages is reclaiming all bloated space to the filesystem. However, it doesn’t come without a price--its biggest drawback is that it takes an exclusive lock on the relation to removing the bloat. This means that no one can access that relation until the execution of VACUUM FULL completes. 

You must also keep in mind that when you run VACUUM FULL on a relation, you must have exactly double the space of the relation available on the disk--as it creates a copy of the relation and completely rewrites it. This should only be done when you have significant availability of performance--some examples might be taking a scheduled downtime, or a nightly lean period of activity. This should be used as sparingly as possible. 

For example:

VACUUM full; -- Database wide VACUUM with FULL option
VACUUM foo; -- VACUUM with FULL option for only relation foo 

VACUUM VERBOSE [options]
VERBOSE is the option which can be used with any of the options mentioned above; it provides you extra information on what work VACUUM did on the relation. How many tuples scanned, how many are removable, how many are non-removable since they are being accessed by the older transaction, etc. It is very useful to print the report of the VACUUM activity on a relation in detail.

For example:

VACUUM verbose analyse foo; -- VACUUM and ANALYSE with a detailed report for relation foo

Note: VACUUM on a table automatically VACUUMs the index on the table so you don’t need to be bothered to deal with them separately.

 

Save the Day!

You, the trusty DBA, were able to identify several examples of bloat.  You implemented autovacuum, and even did a VACUUM FULL overnight.  The next morning, a new email sat atop your inbox, amidst the many others:  “THANK YOU!”

We hope you have found the information in the blog useful and add to your understanding of PostgreSQL. There are a lot of other parameters and scenarios which we have left out for the sake of keeping the content simple. Please go through the VACUUMing Basics section of PostgeSQL document for more details. 

 

Join us in our next Live Session!

Did you miss our first Postgres Pulse Live session? You can check it out here

We’ll be hosting our next Postgres Pulse Live on Monday, April 6th, at 11am ET. Click here to set a calendar reminder!  

While you’re here, please feel free to send us your Postgres questions at postgrespulse@enterprisedb.com. We’ll make sure to include them in our upcoming Live sessions and Insights!

 

  

Ankit Shukla