How to create and refresh data for materialized views in PostgreSQL

Linux x86-64 (RHEL 8) 13 Agent

Richard Yen Principal Support Engineer

SUMMARY: This article introduces MATERIALIZED VIEW in PostgreSQL and discusses its functionality with examples. Topics covered include:

1. VIEW v. MATERIALIZED VIEW

2. Refreshing a MATERIALIZED VIEW

3. MATERIALIZED VIEW and indexes

 

Most relational database systems provide the functionality to create a VIEW, which basically acts like a shortcut or macro. Consider the following query:

postgres=# SELECT * FROM roster;

 id |       name       |      birthdate      

----+------------------+---------------------

  1 | John Smith       | 1980-05-03 00:00:00

  2 | Jane Brown       | 1970-04-24 00:00:00

  1 | Jimmy Peters     | 1990-01-14 00:00:00

  2 | Janice Patterson | 1993-08-27 00:00:00

  3 | Joe Washington   | 1992-11-12 00:00:00

(5 rows)

If we peek under the hood, we discover that “roster” is a VIEW that combines two tables together:




postgres=# \d+ roster

                                       View "public.roster"

  Column   |            Type             | Collation | Nullable | Default | Storage  | Description 

-----------+-----------------------------+-----------+----------+---------+----------+-------------

 id        | integer                     |           |          |         | plain    | 

 name      | text                        |           |          |         | extended | 

 birthdate | timestamp without time zone |           |          |         | plain    | 

View definition:

 SELECT teacher.id,

    teacher.name,

    teacher.birthdate

   FROM teacher

UNION ALL

 SELECT student.id,

    student.name,

    student.birthdate

   FROM student;

 

VIEW v. MATERIALIZED VIEW

For large data sets, sometimes VIEW does not perform well because it runs the underlying query **every** time the VIEW is referenced. Take, for example, a view created on the pgbench dataset (scale 100, after ~150,000 transactions):

postgres=# CREATE OR REPLACE VIEW account_balances AS

SELECT a.* FROM pgbench_branches b

JOIN pgbench_tellers t ON b.bid=t.bid

JOIN pgbench_accounts a ON a.bid=b.bid

WHERE abalance > 100;

CREATE VIEW

postgres=# explain analyze select * from account_balances ;

                                                                                                

QUERY PLAN                                                                          

--------------------------------------------------------------------------------------------------------------------------------------------------------------

 Nested Loop  (cost=4.61..26.04 rows=10 width=97) (actual time=831.184..21182.906 rows=563000 loops=1)

   ->  Hash Join  (cost=4.46..24.32 rows=10 width=101) (actual time=831.103..4844.011 rows=563000 loops=1)

         Hash Cond: (t.bid = a.bid)

         ->  Seq Scan on pgbench_tellers t  (cost=0.00..16.00 rows=1000 width=4) (actual time=0.023..7.406 rows=1000 loops=1)

         ->  Hash  (cost=4.45..4.45 rows=1 width=97) (actual time=831.042..831.048 rows=56300 loops=1)

               Buckets: 65536 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 7605kB

               ->  Index Scan using pga_abalance_idx on pgbench_accounts a  (cost=0.43..4.45 rows=1 width=97) (actual time=0.034..439.754 rows=56300 loops=1)

                     Index Cond: (abalance > 100)

   ->  Index Only Scan using pgbench_branches_pkey on pgbench_branches b  (cost=0.14..0.17 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=563000)

         Index Cond: (bid = t.bid)

         Heap Fetches: 563000

 Planning time: 0.526 ms

 Execution time: 24911.494 ms

(13 rows)

 

As you can see, it took over 24 seconds to gather the accounts with balances greater than 100.

PostgreSQL provides the ability to instead create a MATERIALIZED VIEW, so that the results of the underlying query can be stored for later reference:

postgres=# CREATE MATERIALIZED VIEW mv_account_balances AS

SELECT a.* FROM pgbench_branches b

JOIN pgbench_tellers t ON b.bid=t.bid

JOIN pgbench_accounts a ON a.bid=b.bid

WHERE abalance > 100;

SELECT 563000

postgres=# explain analyze select * from mv_account_balances;

                                                           QUERY PLAN                                                           

--------------------------------------------------------------------------------------------------------------------------------

 Seq Scan on mv_account_balances  (cost=0.00..11168.30 rows=193830 width=352) (actual time=0.141..3784.241 rows=563000 loops=1)

 Planning time: 0.072 ms

 Execution time: 7473.437 ms

(3 rows)

 

As you can see, a MATERIALIZED VIEW produces the result in just over 7 seconds (as opposed to 24 seconds), because it stores a snapshot of the data for users to work with.

 

Refreshing a MATERIALIZED VIEW

Should the data set be changed, or should the MATERIALIZED VIEW need a copy of the latest data, the MATERIALIZED VIEW can be refreshed:

postgres=# select count(*) from pgbench_branches b join pgbench_tellers t on b.bid=t.bid join pgbench_accounts a on a.bid=b.bid where abalance > 4500;

 count 

-------

 57610

(1 row)



— Some updates



postgres=# select count(*) from pgbench_branches b join pgbench_tellers t on b.bid=t.bid join pgbench_accounts a on a.bid=b.bid where abalance > 4500;

 count  

--------

 173400

(1 row)



postgres=# select count(*) FROM mv_account_balances WHERE abalance > 4500;

 count 

-------

 57610

(1 row)



postgres=# refresh materialized view mv_account_balances ;

REFRESH MATERIALIZED VIEW

postgres=# select count(*) FROM mv_account_balances WHERE abalance > 4500;

 count  

--------

 173400

(1 row)

 

MATERIALIZED VIEW and indexes

Indexes can also be created against a MATERIALIZED VIEW to make queries even faster:

postgres=# EXPLAIN ANALYZE SELECT * FROM mv_account_balances WHERE abalance > 4500;

                                                         QUERY PLAN                                                         

----------------------------------------------------------------------------------------------------------------------------

 Seq Scan on mv_account_balances  (cost=0.00..16729.38 rows=62109 width=97) (actual time=0.383..464.850 rows=57610 loops=1)

   Filter: (abalance > 4500)

   Rows Removed by Filter: 505390

 Planning time: 0.118 ms

 Execution time: 845.478 ms

(5 rows)



postgres=# CREATE INDEX mv_balances_idx ON mv_account_balances(abalance);

CREATE INDEX

postgres=# EXPLAIN ANALYZE SELECT * FROM mv_account_balances        WHERE abalance > 4500;

                                                              QUERY PLAN                                                               

---------------------------------------------------------------------------------------------------------------------------------------

 Bitmap Heap Scan on mv_account_balances  (cost=1096.13..11054.68 rows=58284 width=97) (actual time=4.719..389.851 rows=57610 loops=1)

   Recheck Cond: (abalance > 4500)

   Heap Blocks: exact=1947

   ->  Bitmap Index Scan on mv_balances_idx  (cost=0.00..1081.56 rows=58284 width=0) (actual time=4.471..4.478 rows=57610 loops=1)

         Index Cond: (abalance > 4500)

 Planning time: 0.412 ms

 Execution time: 766.624 ms

(7 rows)

 

Conclusion

As we can see, MATERIALIZED VIEW provides some additional features that VIEW lacks, namely in providing a consistent snapshot of data for users to work with and giving users the ability to index the underlying snapshot. However, MATERIALIZED VIEW is not for everyone—some users may wish to have the most up-to-date data with every call. Users should employ each type of VIEW in accordance to their needs.

 

Richard YenPrincipal Support Engineer