In Postgres-XL, sequences are maintained at the Global Transaction Manager (GTM) to ensure that they are assigned non-conflicting values when they are incremented from multiple nodes. This adds significant overhead for a query doing thousands of INSERTs in a table with a serial column, incrementing sequence one at a time and making a network roundtrip to the GTM, for every INSERT.
Shaun Thomas in a recent blog complained about INSERTs running a magnitude slower on Postgres-XL as compared to vanilla PostgreSQL. There is already a way to improve performance for sequences, but it’s clearly not well advertised. I thought this is a good opportunity to explain the facility.
Postgres-XL provides a user-settable GUC called sequence_range. Every backend requests a block of sequence values as controlled by this GUC. Given that COPY is popularly used to bulk load data in Postgres, Postgres-XL automatically overrides this GUC during COPY operation and sets it to 1000, thus dramatically improving COPY performance. Unfortunately, for regular INSERTs, the default is 1 and unless user explicitly sets sequence_range to a reasonably higher value, INSERT performance suffers. Here is an example, using the same sample schema as used by Shaun in his blog post.
CREATE TABLE sensor_log (
sensor_log_id SERIAL PRIMARY KEY,
location VARCHAR NOT NULL,
reading BIGINT NOT NULL,
reading_date TIMESTAMP NOT NULL
) DISTRIBUTE BY HASH (sensor_log_id);
postgres=# \timing
Timing is on.
postgres=# INSERT INTO sensor_log (location, reading, reading_date) SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 12067.911 ms
postgres=# set sequence_range TO 1000;
SET
Time: 1.231 ms
postgres=# INSERT INTO sensor_log (location, reading, reading_date) SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 397.406 ms
So by appropriately setting sequence_range to 1000, performance of the INSERT query improved by nearly 30 times.
When this feature was added, the default value of sequence_range GUC was set to 1 because it can leave holes in the sequence values. But looking at the performance implications for a very common use case, we decided to increase the default to 1000 and this has now been committed to the XL9_5_STABLE branch of the repository.
Its important to note that while a high value of sequence_range will improve performance for sequences and serials, it can also leave large holes in sequence ranges since the sequence ranges are cached at a backend level. To address this issue, Postgres-XL starts with the specified CACHE parameter value used at sequence creation time and doubles it every time (limited by sequence_range) if sequences are being consumed at a very high rate.
Similar improvement can also be achieved by increasing the CACHE parameter value of the sequence so that a chunk of sequence values are cached at the backend level. Following example shows how to do that for a serial column. But the sequence_range GUC provides an easy way to override the global default and also ensures that the sequences are cached only when they are getting incremented very rapidly.
postgres=# ALTER SEQUENCE sensor_log_sensor_log_id_seq CACHE 1000; ALTER SEQUENCE
Time: 8.683 ms
postgres=# SET sequence_range TO 1;
SET
Time: 2.341 ms
postgres=# INSERT INTO sensor_log (location, reading, reading_date) SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 418.068 ms
You may choose any of these techniques to improve the performance. Though now that the default value of sequence_range is changed to 1000, not many users may see the difference in performance.