Feb 23, 2021
In my most recent blog post, I provided a review of a book soon to be published, entitled “PostgreSQL Query Optimization” by Henrietta Dombrovskaya, Boris Novikov, and Anna Bailliekova.
I highly recommend this book. It includes many great concepts about the PostgreSQL query optimizer, presented alongside an excellent set of recipes for optimizing queries.
In this post, I’ll cover one of the chapters I found especially interesting. It is about a programming methodology called NORM.
NORM is short for “No ORM”—which is short for “No Object Relational Mapping” technology.
I will start by saying that I have only done minimal application development using this methodology and have done no performance testing. However, the concept has me very excited and I have absolute faith in the experience of the book authors who have used this design pattern in production.
I also admit that there are many ORMs that I have never used.
I am hoping to do a decent amount of development over the next several months using NORM. If things go as I hope and as I expect, I will present my findings in a future blog.
In summary, when reading one of the chapters in the book, I had one of my epiphanies about application development that seems to come to me every decade.
New mindset, New results
The authors introduced a concept called NORM, which is short for “No Object Relational Mapping.”
So, let’s start by introducing why Object Relational Mapping technologies, such as Hibernate, exist for Java, or why SQLAlchemy ORM exists for Python.
It comes down to two important truths:
- Relational Databases are here to stay.
- Object Oriented Programming languages are here to stay.
These two technologies have fundamentally different means of representing application data.
Relational databases are here to stay
My evidence for this? Please see DB-Engines database rankings. The results will vary a bit, depending on when you click on this URL. However, as of this writing in February 2021, four of the top five (including the top four) databases are relational databases.
- SQL Server
Incidentally, what database gets DB-Engines database of the year based on year over year growth? PostgreSQL! Not Mongo, not Hadoop, not Casandra nor any of the other 250+ plus databases on this list. While we all know PostgreSQL is a lot of things, its foremost use case is as an awesome relational database. Although later in this article you will see one element of why it is more than that.
I spent a long portion of my career working towards proving that point #1 was not true. During the early to mid-90s, I was part of the Object Database industry. We believed that the application development world would converge on a database that stored data the way applications represented and manipulated data. This premise was my passion, and making it happen was my undying mission.
But alas, it did not happen. Object Databases found a small home for certain datasets and certain use cases, but they never became the first choice for business application persistence.
This shortcoming of my mission was due to the fact that business data is best represented in a relational model and third party tools know how to work with this model. Personnel records, payments, orders, and order items fit very nicely into tables and columns. SQL is a great database independent language for examining relational data.
In addition, tools like Crystal Reports and Tableau know how to present and analyze relational data. The buzz around NoSQL databases has diminished in recent years. Yes, there are use cases for it, however the majority of most new database deployments are done on relational databases. In addition, most new applications are developed against existing relational databases.
Object Oriented Programming Languages are here to stay
Now, let's flip over to the other side of the equation. Object Oriented Programming Languages started to take hold in the late 80s with the introduction of C++ and, to a lesser extent, SmallTalk. Object Oriented programming was the new math. All developers were desperate to learn it. To overcome the shortcomings of C++, Java followed. In the hype surrounding Java, conferences were held, books were written, T-Shirts were made, universities changed their curriculums to Java.
Anybody not using OO methodologies, and who were attempting the development of new business applications by any other design pattern, were shunned as heretics of the new revolution.
For further evidence for this, please see the Tiobe index. This is a very popular index for tracking the success, adoption and overall market perception of programming languages. Again, the results will vary depending on when you access the link. As of February 2021, here are the top four programming languages:
Four of the top five programming languages at the time of this writing are object oriented programming languages. I admit, I would have a more compelling argument if the number one ranked language was not C. For many years, the number one language was Java. I haven’t studied why that has changed.
The Great Impedance Mismatch
So, Object Oriented programming is here to stay and so are relational databases. The truth is, they “think” of things differently. It fundamentally comes down to three different worlds:
- Navigational Access (Object Oriented paradigm) vs. Associative Access (Relational Database paradigm).
- Declarative programming languages (Relational Database Access via SQL) vs. Imperative programming access (Object Oriented Program).
- A hierarchical object model object vs. a flat table store.
Navigational access means: Get me what I want by following a reference or a pointer. More specifically, an “Order Object” has a reference to it’s “Client Object." An “Order Object” also has a reference to its collection of “Order Items.” The programmer “navigates” between objects following pointers and references.
Associative Access is the concept of retrieving an item or group of items that meet a certain criteria. For example, give me the order “associated” with this client id. Give me the order Items “associated” with this order.
Here is an example I used back in the day when I was selling Object Databases. Say you are standing in front of the Vietnam War Memorial in Washington DC. Somebody comes up to you and says, “I understand your friend died in the Vietnam War.” Can you tell me where his name is listed? Navigation access… you point to his name. Associative access... I give you his name and you use the alphabetical order listing to find it.
On top of Navigational vs. Associative access, Object Oriented programs are developed imperatively, relational databases are accessed declaratively. In simple terms, OO programs do what you tell them to do (imperative). SQL gets the results you want and it will decide how to do it (declarative).
To illustrate the Object Oriented model vs. the relational model more concretely: when an application has an Order Object to store in the database, it is not as simple as just adding one row to the Order table,—because the OrderItems belong in a separate table. In a more complex scenario, an application Order Object may have sub-classes like ExpressOrder or International Order expressed through inheritance relationships. The inheritance model needs to be “flattened” into tables.
This combination of issues creates what is referred to as “The Great Object Relational Impedance Mismatch.” OO programmers are stuck in one world DBAs and database developers stuck in another. This can become the source of two major business issues:
- Contention between the teams.
- Business applications that are either hard to maintain or perform horribly as a result of one model overpowering the other model.
In my opinion, the most prevalent formula for enabling both of these is the use of ORM technology. An ORM enables programmers to turn the interface with the database into a set of classes. Oftentimes, an ORM tool is pointed at an existing database and out comes a set of classes that the programmer can use to develop his business application.
This works well for a small data set but when the database schema begins to change, performance and maintainability issues start to creep in. Users start complaining about performance issues, getting features added to the application takes longer than expected and depending on organizational dynamics, the database team and the application development team wind up blaming each other.
Here some are the contributing factors to this issue:
- Usually is not always a 1 to 1 match between the application classes and the database schema.
- When there is an application object that does not have an exact match with the database table, the object graph is loaded lazily. More specifically, the entire object is materialized when accessed. Stated another way, ORMs only load the portions of the object graph as they are accessed. So for a complex object graph many small queries are issued. This is referred to as “the N+1 problem”. I have been told that some ORMs give you ways to avoid this problem but it does require that the programmer know how and when to employ them.
- There are times when the SQL generated by the ORM is sub-optimal and very hard to debug. Even if you can debug it, getting the fix into the ORM can be very difficult.
The concept is brilliant in it’s simplicity. It might need a cooler name though. A subject of a future blog I hope.
Effectively, there is a contract defined between the database and the application. The contract is expressed between these two layers as a JSON document. The application will call a function with these parameters and you will send me back a JSON object graph that gives me exactly what I am looking for in one trip. So the model for retrieving an object looks like this:
And the workflow for storing an object would look like this:
Note, the data is not stored in JSON format. It is broken down into its native table format via the PL/SQL functions. This enables us to keep the full power of SQL with things like constraints, specific data types and foreign keys. It also allows third party reporting tools to access the data with standard SQL. However, if you want to store the data as JSON object instead, and give up the previously described benefits of using the relational model Postgres has excellent support for native JSON data types.
This approach was not possible during the rise of Object Oriented languages. JSON did not exist in any adoptable form for either database or applications. However, now most object oriented programming languages have excellent support for serializing and de-serializing objects to and from JSON. In addition, the support for JSON is Postgres is absolutely outstanding. Perhaps better pure JSON database. Postgres was built from the beginning to be extensible for new Data types and this has allowed the community to thoroughly embrace JSON. Postgres has such an advantage over other databases which are pure document stores. All these databases wind up attempting to layer SQL on top of whatever underlying model exists. It is very difficult for this approach to be successful.
So why am I so excited about NORM?
- It allows the application and the database to develop an exchange contract, at the same time allowing both to harness the full power of their model. The database developer is free to design the database the way it should harness the full power of the relational model. The application developer / architect is able to use his full O-O skill set and design patterns. All the two teams need to do is work together to identify a contract in the form of a JSON document.
- All though I haven’t had the chance to observe them first hand, I am optimistic about the potential performance benefits.
- Now your data access methods are in a single location and can be versioned and tuned as your data evolves over time.
- You are not dependent on the ORM to generate the correct SQL.
In having this blog reviewed, a few people commented that they were unconvinced. Specifically, they stated that JSON conversion is slow on both the application side and the database side. However, those who have used NORM state that the primary benefit is that they can significantly reduce the number of round trips to the database from the application. In their cases, they allow both the application developer and the database developer to use the full power of their platform. In addition, they significantly reduced the number of round trips to the database by developing a JSON document contract for the exchange between the application and the database.
If you are interested in further information about NORM please check out the github project here, and watch our on-demand webinar to learn how to power JSON with PostgreSQL!