You might have seen the support added for security_barrier views in PostgreSQL 9.2. I’ve been looking into that code with an eye to adding automatic update support for them as part of progressing row-level security work for the AXLE project, and I thought I’d take the chance to explain how they work.
Robert already explained why they’re useful and what they protect against. (It turns out it’s also discussed in what’s new in 9.2). Now I want to go into how they work and discuss how security_barrier views interact with automatically updatable views.
Normal views
A normal simple view is expanded in a macro-like fashion as a subquery which is then usually optimized away by pulling its predicate up and appending it to the quals of the containing query. That might make more sense with an example.Given table:
CREATE TABLE t AS SELECT n, 'secret'||n AS secret FROM generate_series(1,20) n;
and view:
CREATE VIEW t_odd AS SELECT n, secret FROM t WHERE n % 2 = 1;
a query like:
SELECT * FROM t_odd WHERE n < 4
is view-expanded inside the query rewriter into a parse tree representation of a query like:
SELECT * FROM (SELECT * FROM t WHERE n % 2 = 1) t_odd WHERE n < 4
which the optimizer then flattens into a single pass query by eliminating the subquery and appending the WHERE clause terms to the outer query, producing:
SELECT * FROM t t_odd WHERE (n % 2 = 1) AND (n < 4)
Even though you can’t see the intermediate queries directly and they never exist as real SQL, you can observe this process by enabling debug_print_parse = on, debug_print_rewritten = on and debug_print_plan = on in postgresql.conf. I won’t reproduce the parse and plan trees here as they’re quite large and easy to generate based on the examples above.
The problem with using views for security
You might think that granting somebody access to the view without granting them access to the underlying table would stop them seeing even-numbered rows. Initially it looks like that’s true:
regress=> SELECT * FROM t_odd WHERE n < 4;
n | secret
---+---------
1 | secret1
3 | secret3
(2 rows)
but when you look at the plan you might see a potential problem:
regress=> EXPLAIN SELECT * FROM t_odd WHERE n < 4;
QUERY PLAN
---------------------------------------------------
Seq Scan on t (cost=0.00..31.53 rows=2 width=36)
Filter: ((n < 4) AND ((n % 2) = 1))
(2 rows)
The view subquery has been optimized away, with the view’s qualifiers appended directly to the outer query.
In SQL, AND and OR aren’t ordered. The optimizer/executor are free to run whichever branch they think is more likely to give them a quick answer and possibly let them avoid running the other branches. So if the planner thinks that n < 4 is much faster than n % 2 = 1 it’ll evaluate that first. Seems harmless, right? Try:
regress=> CREATE OR REPLACE FUNCTION f_leak(text) RETURNS boolean AS $$
BEGIN
RAISE NOTICE 'Secret is: %',$1;
RETURN true;
END;
$$ COST 1 LANGUAGE plpgsql;
regress=> SELECT * FROM t_odd WHERE f_leak(secret) AND n < 4;
NOTICE: Secret is: secret1
NOTICE: Secret is: secret2
NOTICE: Secret is: secret3
NOTICE: Secret is: secret4
NOTICE: Secret is: secret5
NOTICE: Secret is: secret6
NOTICE: Secret is: secret7
NOTICE: Secret is: secret8
NOTICE: Secret is: secret9
NOTICE: Secret is: secret10
NOTICE: Secret is: secret11
NOTICE: Secret is: secret12
NOTICE: Secret is: secret13
NOTICE: Secret is: secret14
NOTICE: Secret is: secret15
NOTICE: Secret is: secret16
NOTICE: Secret is: secret17
NOTICE: Secret is: secret18
NOTICE: Secret is: secret19
NOTICE: Secret is: secret20
n | secret
---+---------
1 | secret1
3 | secret3
(2 rows)
regress=> EXPLAIN SELECT * FROM t_odd WHERE f_leak(secret) AND n < 4;
QUERY PLAN
----------------------------------------------------------
Seq Scan on t (cost=0.00..34.60 rows=1 width=36)
Filter: (f_leak(secret) AND (n < 4) AND ((n % 2) = 1))
(2 rows)
Whoops! As you can see, the user-supplied predicate function was considered to be cheaper to run than the other tests, so it was passed every row before the view’s predicate had excluded it. A malicious function could use the same trick to copy the row.
security_barrier views
security_barrier views fix that by forcing the qualifiers on the view to be executed first, before any user-supplied qualifiers run. Instead of expanding the view and appending any view qualifiers to the outer query, they replace the reference to the view with a subquery. This subquery has the security_barrier flag set on its range-table entry, which tells the optimizer it shouldn’t flatten the subquery or push outer query conditions down into it like it would for a normal subquery.
So with a security barrier view:
CREATE VIEW t_odd_sb WITH (security_barrier) AS SELECT n, secret FROM t WHERE n % 2 = 1;
we get:
regress=> SELECT * FROM t_odd_sb WHERE f_leak(secret) AND n < 4;
NOTICE: Secret is: secret1
NOTICE: Secret is: secret3
n | secret
---+---------
1 | secret1
3 | secret3
(2 rows)
regress=> EXPLAIN SELECT * FROM t_odd_sb WHERE f_leak(secret) AND n < 4;
QUERY PLAN
---------------------------------------------------------------
Subquery Scan on t_odd_sb (cost=0.00..31.55 rows=1 width=36)
Filter: f_leak(t_odd_sb.secret)
-> Seq Scan on t (cost=0.00..31.53 rows=2 width=36)
Filter: ((n < 4) AND ((n % 2) = 1))
(4 rows)
The query plan should tell you what’s happening, though it doesn’t show the security barrier attribute in the explain output. The nested subquery forces a scan on t with the view qualifier, then the user-supplied function runs on the result of the subquery.
But. Wait a sec. Why is the user-supplied predicate n < 4 also inside the subquery? Isn’t that a potential security hole? If n < 4 is pushed down, why isn’t f_leak(secret)?
LEAKPROOF operators and functions
The explanation for that is that the < operator is marked LEAKPROOF. This attribute indicates that an operator or function is trusted not to leak information, so it can be safely pushed down through security_barrier views. For obvious reasons you can’t set LEAKPROOF as an ordinary user:
regress=> ALTER FUNCTION f_leak(text) LEAKPROOF;
ERROR: only superuser can define a leakproof function
and the superuser can already do whatever they want, so they don’t need to resort to playing tricks with functions leaking information to get past a security barrier view.
Why can’t you update security_barrier views
Simple views in PostgreSQL 9.3 are automatically updatable, but security_barrier views aren’t considered “simple”. That’s because updating views relies on being able to flatten the view subquery away, turning the update into a simple update on a table. The whole point of security_barrier views is to prevent that flattening. UPDATE cannot currently operate on a subquery directly, so PostgreSQL will reject any attempt to update a security_barrier view:
regress=> UPDATE t_odd SET secret = 'secret_haha'||n;
UPDATE 10
regress=> UPDATE t_odd_sb SET secret = 'secret_haha'||n;
ERROR: cannot update view "t_odd_sb"
DETAIL: Security-barrier views are not automatically updatable.
HINT: To enable updating the view, provide an INSTEAD OF UPDATE trigger or an unconditional ON UPDATE DO INSTEAD rule.
It is this limitation that I’m interested in lifting as part of work to progress row-level security for the AXLE project. Kohei KaiGai has done some great work with row-level security, and features like security_barrier and LEAKPROOF have largely arisen out of his work toward adding row level security to PostgreSQL. The next challenge is how to deal with updates on a security barrier securely and in a way that will be maintainable into the future.
Why subqueries?
You might be wondering why we have to use subqueries for this. I did. The short version is that we don’t have to, but if we don’t use subqueries we instead have to create new order-sensitive variants of the AND and OR operators and teach the optimizer that it can’t move conditions across them. Since views are already expanded as subqueries, it’s much less complicated to just flag subqueries as fences that block pull-up / push-down.
There’s already a short-circuiting ordered operation in PostgreSQL though – CASE. The problem with using CASE that no operations may be moved across the boundary of a CASE, even LEAKPROOF ones. Nor can the optimizer make index use decisions based on expressions inside a CASE term. So if we used CASE as I asked about on -hackers we could never use an index to satisfy a user-supplied qualifier.
In the code
security_barrier support was added in 0e4611c0234d89e288a53351f775c59522baed7c. It was enhanced with leakproof support in cd30728fb2ed7c367d545fc14ab850b5fa2a4850. Credits appear in the commit notes. Thanks to everybody involved.
Front page feature image is Security Barrier by Craig A. Rodway, on Flikr
The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633