The EDB Blog
April 10, 2013

Sql has never been good at handling dynamic columns — it can easily process tables with a fixed number of columns, but dynamic column handling has always been a challenge. (Dynamic columns are often used to implement user-defined fields.) One approach has been to define columns for every possible value that might be stored in a row; this leads to sparse tables containing mostly null values. Because Postgres stores null values in a single bit, there is little overhead to this approach. However, it is hard to call these dynamic columns since every new column requires an alter table ... add column command.

Another approach is to use entity/attribute/value layout, where each value is stored in its own row. This obviously increases storage requirements, has performance challenges, and adds complexity to sql queries.

Fortunately, Postgres has various options to efficiently handle dynamic columns. The simplest solution is arrays which allow multiple values of the same data type to be stored in a single field. This assumes that the values are independent, or that their ordinal position in the array is sufficiently meaningful.

A more complex approach is to associate a key with each value. (Values internally are strings, rather than bound to a specific data type.) Hstore is a very popular key/value store for Postgres, and has index lookup support for arbitrary key/value pairs. (It does not allow hierarchical storage, i.e. key/values inside a key).

Xml and more recently json allow hierarchical storage and more naturally match client application languages. Unfortunately, none of the Postgres native index types understand the structure of json or xml data. (Json and xml documents are stored internally as single text strings.) To index json, it is necessary to create an expression index on a plv8 function (which understands json). Andrew Dunstan has a good blog entry using a plv8 function (jmember) to index specific json keys. Postgres 9.3 will add functions (blog, blog, blog, blog) to access json values directly — this will eliminate the need to use plv8 in simple cases.

For arbitrary json key lookups, it is possible to use a Postgres full text search index to filter out most non-matching rows and use a json-aware function to compare the remaining rows — this avoids the requirement of creating an index per lookup key.

Xml has similar capabilities to json — it can store hierarchical data, and can use full text search to filter matching rows. While many server-side languages understand xml, e.g. PL/Java, PL/Perl, most people use the built-in Postgres xml functions in expression indexes.

As you can see, Postgres offers a variety of options to store dynamic columns while maintaining a fixed-column table format that is easily accessed via sql — this gives you the best of both worlds when storing dynamic data.

bruce.momjian's picture

Bruce Momjian is a co-founder of the PostgreSQL Global Development Group, and has worked on PostgreSQL since 1996 as a committer and community leader. He is a frequent speaker and Postgres evangelist and travels worldwide appearing at conferences to help educate the community on the business...