Data in the Database vs. the File System
An age-old question is whether it is better to put data in a database or a file system. Of course, the answer is "it depends," but let's look at what it depends on.
First, Postgres stores its data in the file system, so the file system must be good for something. Postgres also stores some of its configuration files in the file system, e.g. pg_hba.conf, so it isn't just a issue that once you have a database, everything is better in a database.
It usually comes down to evaluating database positives vs. negatives — first the database storage positives:
- Do you need to view the data in multiple ways?
- Is synchronizing database data and file system data going to be difficult?
- Do you need multi-object commit synchronization?
- Do applications need a single, guaranteed-consistent view of their data?
- Is a file system API unreasonable or inaccessible to clients?
and these database storage negatives:
- Are the objects very large, meaning that using bytea or large objects is too cumbersome?
- Is the data so small that the per-row 28-byte overhead is too large?
- Is direct file system access useful?
It is often a question of object size. At the multi-megabyte size, having multiple copies of the data in the database while a query is executing is not a problem, and the auto-synchronization of data in the database is a great benefit. At the hundreds of megabytes and larger, the overhead of moving around multiple copies of the object in the database can become a performance problem. Of course, these problems only happen if you access the large data object in a query — if you don't query the bytea column or access the large object in the query, there is no overhead except storage. Updates of rows containing large objects also don't suffer a performance penalty if the update doesn't modify the large object. (The same is true for any long value, e.g. character strings, json.)
Also, if the object is not a binary blob to the database but has a structure the database can understand then storing it in the database has other advantages. For example, doing full text search or json lookups on data can be very beneficial.
Some of the same questions come up when deciding whether some of your data would be best stored in a different database technology and using foreign data wrappers to unify data access. Again, the answer is still "it depends," but hopefully this helps you judge the "depends" better.
This post originally appeared on Bruce's personal blog.