Shaun Thomas’s recent post about client-side loops as an SQL anti-pattern is well worth a read if you’re relatively new to SQL-based application development.
It’s reminded me of another SQL coding anti-pattern that I see quite a lot: the naïve read-modify-write cycle. Here I’ll explain what this common development mistake is, how to identify it, and options for how to fix it.
Imagine your code wants to look up a user’s balance, subtract 100 from it if doing so won’t make it negative, and save it.
It’s common to see this written as three steps:
```sql
SELECT balance FROM accounts WHERE user_id = 1;
-- in the application, subtract 100 from balance if it's above
-- 100; and, where ? is the new balance:
UPDATE accounts SET balance = ? WHERE user_id = 1;
```
and everything will appear to work fine to the developer. However, this code is critically wrong, and will malfunction as soon as the same user is updated by two different sessions at the same time.
Imagine two concurrent sessions, each subtracting 100 from the user’s balance, starting with an initial value of 300.
| Session 1 | Session 2 |
|---|---|
| SELECT balance FROM accounts WHERE user_id = 1; (returns 300) | |
| | SELECT balance FROM accounts WHERE user_id = 1; (also returns 300) |
| UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 – 100 = 200) | |
| | UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 – 100 = 200) |
Whoops! The balance is 200, but you took out 200 from a starting point of 300. So 100 has just vanished.
Most testing and development is done on standalone servers running single sessions, so unless you’re doing rigorous testing this sort of thing often doesn’t get noticed until production, and can be painful to debug. It’s important to know about it so you can code defensively.
Don’t transactions prevent this?
Following along
If you want to try this or any of the other examples in this article, just run the following setup code:
```sql
CREATE TABLE accounts (user_id integer primary key, balance integer not null);
INSERT INTO accounts(user_id, balance) VALUES (1, 300);
```
then run the commands in each session column in separate psql sessions by using two terminal windows**.
I often have people on Stack Overflow ask things to the tune of “Don’t transactions prevent this?”. Unfortunately, while great, transactions aren’t a magic secret sauce you can add for easy concurrency. The only way they let you completely ignore concurrency issues is to LOCK TABLE every table you might use as the first step of each transaction (and even then you have to always lock in the same order to prevent deadlocks).
In a transaction the outcome in this case is exactly the same:
| Session 1 | Session 2 |
|---|---|
| BEGIN; | BEGIN; |
| SELECT balance FROM accounts WHERE user_id = 1; (returns 300) | |
| | SELECT balance FROM accounts WHERE user_id = 1; (also returns 300) |
| UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 – 100 = 200) | |
| | UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 – 100 = 200) |
| COMMIT; | COMMIT; |
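For completeness, the heavy-handed LOCK TABLE approach mentioned above would look something like this sketch (assuming every session that touches accounts agrees to take the lock first):

```sql
BEGIN;
-- Take a whole-table lock up front. EXCLUSIVE mode blocks all other
-- writers (and other LOCK TABLE ... IN EXCLUSIVE MODE callers) until
-- this transaction commits or rolls back. If several tables are
-- involved, always lock them in the same order to avoid deadlocks.
LOCK TABLE accounts IN EXCLUSIVE MODE;

SELECT balance FROM accounts WHERE user_id = 1;
-- the application works out the new balance, then:
UPDATE accounts SET balance = 200 WHERE user_id = 1;
COMMIT;
```

It prevents the lost update, but it serialises every writer on the whole table, so it rarely scales. The options below are far more targeted.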
Solutions
Thankfully PostgreSQL (and SQL in general) has a few tools that will help you, and there are some application-side options too. Some popular solutions to this problem are:
- Avoiding the read-modify-write with a calculated update
- Row level locking with SELECT ... FOR UPDATE
- Use of SERIALIZABLE transactions
- Optimistic concurrency control, otherwise known as optimistic locking
Avoiding the read-modify-write cycle
The best solution is often to just do the work in SQL, avoiding the read-modify-write cycle entirely.
Just write:
| Session 1 | Session 2 |
|---|---|
| UPDATE accounts SET balance = balance - 100 WHERE user_id = 1; (sets balance = 200) | |
| | UPDATE accounts SET balance = balance - 100 WHERE user_id = 1; (sets balance = 100) |
This works even if the two statements are in concurrent transactions, because the first takes a lock on the row and the second waits on the lock until the first commits or rolls back. The transaction isolation documentation covers this in more detail under READ COMMITTED.
This option is only viable for simpler cases, though. It isn’t useful if the application needs to do some complex logic based on the current balance to determine if the update should proceed, for example.
This method is usually the simplest and fastest choice where it’s applicable.
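For instance, a simple guard like the “don’t go negative” rule from the original example can usually be pushed into the statement too; here’s a minimal sketch:

```sql
-- Subtract 100 only if that won't make the balance negative.
-- If the row doesn't qualify, PostgreSQL reports "UPDATE 0" and the
-- application can treat that as "insufficient funds".
UPDATE accounts
   SET balance = balance - 100
 WHERE user_id = 1
   AND balance >= 100;
```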
Note that under the non-default SERIALIZABLE isolation level a calculated update like this still prevents the error, but it does so differently. See the discussion of SERIALIZABLE below, and the documentation.
Row level locking
The simplest solution to fix an existing broken application with the fewest changes is generally to add row level locking.
Instead of SELECT balance FROM accounts WHERE user_id = 1, write SELECT balance FROM accounts WHERE user_id = 1 FOR UPDATE. This takes a row-level write lock. Any other transaction that tries to UPDATE the row or SELECT it FOR UPDATE (or FOR SHARE) will pause until the transaction that holds the lock rolls back or commits.
In the example above, under PostgreSQL’s default transaction isolation level (READ COMMITTED), the second SELECT would simply not return until the first transaction ran its UPDATE and then COMMITed. Then the second transaction would continue, but its SELECT would return 200 instead of 300, so the correct value would be produced.
| Session 1 | Session 2 |
|---|---|
| BEGIN; | BEGIN; |
| SELECT balance FROM accounts WHERE user_id = 1 FOR UPDATE; (returns 300) | |
| | SELECT balance FROM accounts WHERE user_id = 1 FOR UPDATE; (gets stuck and waits for transaction 1) |
| UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 – 100 = 200) | |
| COMMIT; | |
| | (second transaction’s SELECT returns 200) |
| | UPDATE accounts SET balance = 100 WHERE user_id = 1; (200 – 100 = 100) |
| | COMMIT; |
The explicit locking chapter in the documentation discusses this in more detail.
Note that this only works if the read-modify-write cycle is contained within a single transaction, as locks only exist for the lifetime of a transaction.
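Putting it together, the corrected cycle looks something like this sketch, where each session runs the same code and ? stands for whatever new balance the application computes:

```sql
BEGIN;
-- Lock the row: nobody else can UPDATE it, or SELECT it FOR UPDATE /
-- FOR SHARE, until this transaction commits or rolls back.
SELECT balance FROM accounts WHERE user_id = 1 FOR UPDATE;
-- the application inspects the balance and decides what to do, then:
UPDATE accounts SET balance = ? WHERE user_id = 1;
COMMIT;
```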
SERIALIZABLE transactions
If the read-modify-write cycle is always contained within a single transaction and if you’re running PostgreSQL 9.1 or newer, you can use a SERIALIZABLE transaction instead of explicit SELECT ... FOR UPDATE.
In this case both SELECTs would proceed normally, as would both UPDATEs. The first transaction would COMMIT fine. However, when you went to COMMIT the second transaction it would instead abort with a serialization error. You would then re-run the failed transaction from the beginning*:
| Session 1 | Session 2 |
|---|---|
| BEGIN ISOLATION LEVEL SERIALIZABLE; | BEGIN ISOLATION LEVEL SERIALIZABLE; |
| SELECT balance FROM accounts WHERE user_id = 1; (returns 300) | |
| | SELECT balance FROM accounts WHERE user_id = 1; (also returns 300) |
| UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 – 100 = 200) | |
| | UPDATE accounts SET balance = 200 WHERE user_id = 1; (gets stuck on session 1’s lock and doesn’t proceed) |
| COMMIT; (succeeds, setting balance = 200) | |
| | (UPDATE continues, but sees that the row has been changed and aborts with a could not serialize access due to concurrent update error) |
| | COMMIT; (converted into a forced ROLLBACK, leaving balance unchanged) |
SERIALIZABLE isolation can force applications to repeat a lot of work if a big transaction aborts or if conflicts are common. It’s very useful for complex cases where attempting to use row locking might just cause deadlocks, though.
The documentation on concurrency control and transaction isolation is the best resource for information on serializable isolation. If you’re not used to thinking about concurrency issues be prepared to read it a few times and be sure to try out the examples.
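As a rough sketch of the retry discipline (exactly how you detect the failure depends on your client driver; PostgreSQL reports serialization failures with SQLSTATE 40001):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE user_id = 1;
-- the application computes the new balance, then:
UPDATE accounts SET balance = 200 WHERE user_id = 1;
COMMIT;
-- If any statement above (often the UPDATE or the COMMIT) fails with
-- SQLSTATE 40001, "could not serialize access ...", issue a ROLLBACK
-- and re-run the whole transaction from the BEGIN.
```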
Optimistic concurrency control
Optimistic concurrency control (or “optimistic locking”) is usually implemented as an application-side method for handling concurrency, often by object relational mapping tools like Hibernate.
In this scheme, all tables have a version column or last-updated timestamp, and all updates have an extra WHERE clause entry that checks to make sure the version column hasn’t changed since the row was read. The application checks to see if any rows were affected by the UPDATE and if none were affected, treats it as an error and aborts the transaction.
For this demo you need to add a new column:
```sql
ALTER TABLE accounts ADD COLUMN version integer NOT NULL DEFAULT 1;
```
Then the example above becomes:
| Session 1 | Session 2 |
|---|---|
| BEGIN; | BEGIN; |
| SELECT balance, version FROM accounts WHERE user_id = 1; (returns balance=300, version=1) | |
| | SELECT balance, version FROM accounts WHERE user_id = 1; (also returns balance=300, version=1) |
| COMMIT; | COMMIT; |
| BEGIN; | BEGIN; |
| UPDATE accounts SET balance = 200, version = 2 WHERE user_id = 1 AND version = 1; (300 – 100 = 200. Succeeds, reporting 1 row changed.) | |
| | UPDATE accounts SET balance = 200, version = 2 WHERE user_id = 1 AND version = 1; (300 – 100 = 200. Blocks on session 1’s lock.) |
| COMMIT; | |
| | (UPDATE returns, matching zero rows because the row now has version = 2, not the version = 1 the WHERE clause requires) |
| | ROLLBACK; (the application sees that no rows were affected and treats that as an error) |
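Stripped down, the heart of the pattern is an UPDATE whose WHERE clause re-checks the version the application originally read, followed by a check of the affected-row count; a sketch:

```sql
-- Earlier the application read (balance = 300, version = 1).
-- Write back the new balance only if nobody else changed the row since.
UPDATE accounts
   SET balance = 200,
       version = version + 1
 WHERE user_id = 1
   AND version = 1;   -- the version we read earlier
-- "UPDATE 1": our write won; commit as usual.
-- "UPDATE 0": someone else got there first; roll back, re-read the row,
-- and retry (or report the conflict to the user).
```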
Because it’s fiddly to code, optimistic concurrency control is usually used via an ORM or query building tool.
Unlike SERIALIZABLE isolation, it works even in autocommit mode or if the statements are in separate transactions. For this reason it’s often a good choice for web applications that might have very long user “think time” pauses or where clients might just vanish mid-session, as it doesn’t need long-running transactions that can cause performance problems.
Optimistic concurrency control can co-exist with traditional locking-based approaches if you use triggers to enforce the optimistic locking rules. Nonetheless it’s generally used instead of other explicit concurrency control methods.
Which to use?
The appropriate choice depends on what you’re doing, your scalability requirements, how easy it is to write retry loops to re-run failed transactions, etc.
There is no right answer for everybody. If there were, there’d only be one way to do it, not several.
There is, however, a definitively wrong way, and that’s to ignore concurrency issues and expect the database to just take care of them for you.
Future topics
There’s lots more to watch out for.
- “Testing” connections to avoid adding error handling and retry code
- Home brew queueing implementations
- Sending email / calling web services etc from the DB
- The many ways to get upsert (insert-or-update) or insert-if-not-exists wrong
- … and plenty more
I’ll discuss some of them in later weeks.
* The topic of transaction re-tries brings me to an anti-pattern I want to discuss later: testing connections and writing cannot-afford-to-fail transactions.
** Or if you’re really keen you can turn them into automated tests using PostgreSQL’s isolationtester, in src/test/isolation