Why relational databases are in your future
Prof. Dr. Michael Stonebraker
Around the time Ted Codd and Chris Date were laying the foundations of relational database theory, Prof. Dr. Michael Stonebraker also found inspiration in this new way of handling data. At roughly the same time, Larry Ellison used relational database theory to build version 2.0 of his solution for the Oracle project, which the CIA was seeking help with.
Michael Ralph Stonebraker is a computer scientist specializing in database systems. His research and products, developed through a series of academic prototypes and commercial startups, are central to many relational databases. He is also the founder of many database companies, including Ingres Corporation, Illustra, Paradigm4, StreamBase Systems, Tamr, Vertica and VoltDB, and he served as chief technical officer of Informix. For his contributions to database research, Stonebraker received the 2014 Turing Award, often described as "the Nobel Prize for computing."
Stonebraker's career can be broadly divided into two phases: his time at the University of California, Berkeley, when he focused on relational database management systems such as Ingres and Postgres, and his time at the Massachusetts Institute of Technology (MIT), where he developed more novel data management techniques such as C-Store, H-Store and SciDB. Stonebraker is currently a Professor Emeritus at UC Berkeley and an adjunct professor at MIT's Computer Science and Artificial Intelligence Laboratory. He is also known as an editor of the book Readings in Database Systems.
I am extremely proud and grateful to have had Dr. Stonebraker speak at Postgres Build 2021. This post, written for your enjoyment, is based on that presentation.
Relational database theory
A relational database is a digital database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is called a relational database management system (RDBMS). Many relational database systems use SQL (Structured Query Language) for querying and maintaining the database.
To be brutally honest, very few database management systems in the world are true relational database management systems. The reason is simple: the theory of relational database systems is so strict that a fully conforming system would be impractical for many purposes.
That being said, mainstream general-purpose relational database systems like IBM's DB2, PostgreSQL, and Oracle are among the best practical implementations of this theory. The cool thing about PostgreSQL is that it is the only community-run open source project that truly belongs in this category!
The Great Debate
Step back in time to May 1974.
On stage are two other Turing Award winners: Ted Codd and Charles Bachman.
The topic of the discussion: which is better, the network database (the navigational CODASYL model, related to hierarchical and document databases) or the relational database?
The debate is recounted in Chris Date's book, "Fifty Years of Relational, and Other Database Writings."
The Great Debate is too involved to cover in depth here, but please feel free to order yourself a copy of Date's book to read up on it. Suffice it to say that The Great Debate, which took place back in 1974, mind you, came down on the side of relational databases over network databases.
Let's translate this result to today: use Postgres! In Postgres, you can natively combine relational workloads with a document approach (using JSONB, for example) on the same data, fully transparently.
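To make "relational and document on the same data" concrete, here is a minimal sketch. It uses Python's bundled SQLite as a stand-in for Postgres so that it runs anywhere, and it assumes an SQLite build with the JSON functions enabled (the default in recent releases). The `customers` table, its columns, and the `gold_customers_in` helper are hypothetical. In Postgres the `profile` column would be declared `JSONB` and the filter would read `profile->>'tier' = 'gold'`.

```python
import json
import sqlite3

# SQLite stands in for Postgres; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
           id      INTEGER PRIMARY KEY,
           name    TEXT NOT NULL,   -- classic relational columns
           country TEXT NOT NULL,
           profile TEXT NOT NULL    -- schemaless JSON document (JSONB in Postgres)
       )"""
)
rows = [
    ("Ada", "NL", {"tier": "gold", "tags": ["early-adopter"]}),
    ("Ben", "NL", {"tier": "silver"}),
    ("Cleo", "BE", {"tier": "gold"}),
]
conn.executemany(
    "INSERT INTO customers (name, country, profile) VALUES (?, ?, ?)",
    [(name, country, json.dumps(profile)) for name, country, profile in rows],
)

def gold_customers_in(conn, country):
    """One query filters on a relational column and a JSON field at the same time."""
    cur = conn.execute(
        """SELECT name FROM customers
           WHERE country = ?
             AND json_extract(profile, '$.tier') = 'gold'
           ORDER BY name""",
        (country,),
    )
    return [name for (name,) in cur]

print(gold_customers_in(conn, "NL"))  # ['Ada']
```

The design point is that the document part needs no separate store: the same SQL statement, transaction, and indexes cover both shapes of data.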
The main purpose of this writeup is to highlight two points that Dr. Stonebraker made during his presentation.
During the opening keynote at Postgres Build 2021, Bruce Momjian raised a concern about what the world might look like for Postgres in a possible "post-relational era": if data management and data science changed so much that relational database theory was no longer necessary, how would Postgres cope?
When asked, Dr. Stonebraker was clear: there will be no post-relational era. I tend to believe him. My two basic reasons are:
- With today's explosion of data, much of the marketing messaging focuses on special-purpose data management solutions. That is true, and I do not doubt it. What hardly anyone says, or is even aware of, is that to make sense of everything these new solutions handle, you still need a system like Postgres at the center, tying everything together.
- The relational database approach has been pronounced dead several times. We all remember the rise of Big Data and of Hadoop with its MapReduce technology. This was supposed to replace relational database technology for good! Today, Big Data has settled into its rightful place at the table, playing the role described in the previous point. Why? Because to make sense of data and analyze it, you need SQL, and SQL needs structured, related data.
Business logic in the database
The second point is, if possible, even closer to my heart.
Thou shalt put thy business logic in the database, close to the data.
Dr. Stonebraker approached this topic from a performance point of view. A cursor-based approach is simply "too expensive." Fetching, let's say, 10,000 records, copying them into the application layer outside the database, calculating one or two results there, and then discarding the rest of the data is like setting fire to a roll of $100 bills to light a single 10-cent candle. It makes no sense.
Creating an API layer of stored procedures, triggers, and what have you, which computes your one or two results inside the database and transports only those to your application layer, makes infinitely more sense.
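The difference between the two approaches can be sketched in a few lines. This is a minimal illustration, not Dr. Stonebraker's own example: Python's bundled SQLite stands in for Postgres, and the `orders` table and both helper functions are hypothetical. In Postgres, the in-database version would typically live in a stored function that the application calls.

```python
import sqlite3

# SQLite stands in for Postgres; the "orders" table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"c{i % 100}", float(i % 50)) for i in range(10_000)],
)

def total_in_app(conn, customer):
    """Anti-pattern: ship all 10,000 rows to the application, keep one number."""
    rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
    return sum(amount for cust, amount in rows if cust == customer)

def total_in_db(conn, customer):
    """Push the work to the data: the database filters and sums, one value comes back."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return total

print(total_in_app(conn, "c7"), total_in_db(conn, "c7"))  # 700.0 700.0
```

Both functions return the same answer; the difference is that the first moves every row across the database boundary, while the second moves a single value.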
Additionally, in today's world, where data quality and data security are of utmost importance, it is vital that data integrity and validation rules live literally next to the data. The proposed API-layer approach lets you secure this new gold that data has become as fully as possible.
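A minimal sketch of validation rules living next to the data, again using Python's bundled SQLite as a runnable stand-in for Postgres. The `accounts` and `audit_log` tables, the deliberately crude e-mail pattern, and the `try_insert` helper are all hypothetical. Postgres supports the same `CHECK` and `UNIQUE` constraints; its triggers are written as a separate trigger function (for example in PL/pgSQL), so the trigger syntax there differs.

```python
import sqlite3

# SQLite stands in for Postgres; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%'),  -- crude format rule
        balance REAL NOT NULL CHECK (balance >= 0)                 -- no negative balances
    );
    CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

    -- Every balance change is recorded by the database itself, whatever client made it.
    CREATE TRIGGER log_balance_change AFTER UPDATE OF balance ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
    """
)

def try_insert(conn, email, balance):
    """Return True if the database accepted the row, False if a rule rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO accounts (email, balance) VALUES (?, ?)", (email, balance)
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(conn, "alice@example.com", 100.0))  # True
print(try_insert(conn, "bob@example.com", -5.0))     # False: CHECK (balance >= 0)
print(try_insert(conn, "not-an-email", 10.0))        # False: CHECK on email

with conn:
    conn.execute("UPDATE accounts SET balance = 80 WHERE email = 'alice@example.com'")
print(conn.execute("SELECT * FROM audit_log").fetchall())  # [(1, 100.0, 80.0)]
```

Because the rules are enforced by the database, no application, script, or ad hoc client can bypass them.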
Finally, when using a system like Postgres in a FaaS (Function as a Service) approach, it is important to implement data management and data governance through a procedural API layer. It is the only way to build and leverage this strategic architecture in a truly loosely coupled manner.
This must-see presentation by Mike Stonebraker strongly confirms that a number of ideas we always felt made sense actually do.
Relational databases are here to stay, and you can unlock their full potential as a data governance platform by structuring the way they yield results.