VACUUM FULL doesn't mean "VACUUM, but better"

March 04, 2014

There's a persistent belief among some users of PostgreSQL that VACUUM and VACUUM FULL do the same thing, but that VACUUM FULL does it better.  If VACUUM is the moral equivalent of running the Dust Buster across the room a few times, VACUUM FULL must be the equivalent of hiring a professional cleaning crew to shampoo the carpets, and maybe repaint the walls as well.  Unfortunately, this mental model is not accurate.

Here's a better one.  VACUUM is the equivalent of going into each room of your house, finding all of the stuff in there that you don't need any more, and throwing it out.  This is important because, if you don't do it, your house will eventually fill up with stuff, much of which will be garbage.  As you continue to acquire new things without getting rid of any of the old ones, you're eventually going to need a larger house, or an annex.  VACUUM FULL, on the other hand, is the equivalent of taking all of the stuff from your house that you still want, loading it into a moving van, buying the smallest house that suffices to contain the contents of the moving van, and then burning down the old house.  If there's very little that you actually want to keep, and a vast amount of junk that you want to get rid of, the VACUUM FULL procedure may actually be more efficient.  But in most cases, VACUUM is the way to go: it's usually less work, and the house can still be used while it's being cleaned.  If you find yourself using the VACUUM FULL procedure with any regularity, something is wrong.

In slightly more technical terms, whenever a transaction performs an update, we add a new copy of the row.  If that transaction goes on to commit, the old row should be removed; if it aborts, the new row should be removed.  Similarly, if a transaction inserts a row but goes on to abort, the row must be removed.  If it deletes a row and goes on to commit, the row is only marked for removal, and must afterwards be removed for real.  VACUUM does all of these things.  It also shrinks the relation on disk if there happens to be free space at the end of the file, but in general if it removes a large number of dead tuples, there may be significant internal free space left behind in the file.  That's not all a bad thing, because future updates and inserts are going to need a place to put their data, and using already-allocated space is more efficient than getting new space.  But if there's an excessive amount of preallocated space, and you need to return some of it to the operating system to improve sequential scan speed, or make room for other relations to grow, then you need VACUUM FULL, which will scan the relation for tuples that are NOT dead, write them into a new file, and remove the old file.  If you're doing this regularly, then you should try to make your system's autovacuum settings more aggressive so that the problem doesn't arise; that is, you need to throw away garbage more quickly so that the house doesn't fill up and require expansion.

There's one situation in which it's particularly critical to use VACUUM rather than VACUUM FULL.  In addition to its routine garbage collection duties, VACUUM is also responsible for preventing a phenomenon called transaction ID wraparound, which is discussed in the PostgreSQL documentation under the heading Preventing Transaction ID Wraparound Failures.  Beginning in 9.3, it must also prevent the related phenomenon of multixact ID wraparound.  Normally, this is all pretty transparent, but sometimes (particularly with a poor selection of vacuum-related settings) your system will begin to emit warnings or even shut down because data loss is imminent and vacuum must be completed without further delay.  In current releases, this problem is only addressed by VACUUM without the FULL option.  VACUUM FULL will notforestall transaction ID wraparound and should not be used in circumstances where transaction ID wraparound is a risk.

PostgreSQL 9.4 will make some improvements in this area.  Beginning in 9.4, VACUUM FULL (and CLUSTER, which uses much of the same code) will forestall transaction ID wraparound.  This is a significant improvement, because right now, you might use VACUUM FULL or CLUSTER to rewrite a relation for other reasons (e.g. to remove dead tuples, or in the case of CLUSTER to improve query performance) and then discover that you've got to run a regular VACUUM on the table shortly thereafter for wraparound purposes, rewriting all the data again.  These changes will eliminate that possibility.  However, the recommended recovery strategy for severe cases of transaction ID wraparound will remain VACUUM rather than VACUUM FULL.  This is not only for reasons of efficiency, but also because VACUUM FULL consumes a transaction ID, while plain VACUUM does not.  When only a few transaction IDs remain before a wraparound event, it is best to economize, especially if the database in question contains many tables; when VACUUM FULL is applied to multiple tables, one transaction ID will be used for each.

(Historical Note: Prior to PostgreSQL 9.0, VACUUM FULL worked very differently; the information in this blog post does not apply to older versions.) 

Share this