Standalone NoSQL database solutions are easy to use, fast to deploy, and make getting an application up and running a piece of cake. That’s the sweet part though. What comes later, however, is a bellyache of solution management challenges, risks to data integrity and a loss of control in the data environment.
Developers have been fast to adopt standalone NoSQL solutions because they are very easy to integrate with a Web 2.0 development style. If you’re working in Python or node.js, and, for example, you develop against MongoDB, it’s quite easy to start storing your data in the database. You don’t need to know anything about SQL; you don’t need to know anything about tables, foreign keys, etc. All you really need to do is enter a command such as “Add x to collection,” and you’re done. Developers using standalone NoSQL solutions can also get started without waiting for a database administrator (DBA) to create a data model. The data model is developed as the application matures.
No SQL Can Mean No Controls
For developers, working in a schema-less environment is very agile and sprint-friendly. But because it’s schema-less, and because it’s so easy to set up without a DBA and without having to know much about tables in SQL, applications built with standalone NoSQL solutions inevitably create data silos. That is because NoSQL databases cannot perform the data processing and integrity management that is typically done by a relational database. Instead, this must be handled in the application. That makes developing the application more complex and pushes data logic into the business logic.
In addition, developers building processing into individual applications can lead to a breakdown in corporate standards, further contributing to data silos. For example, if your corporate data standard is that a last name can only have 50 characters, you can’t enforce that at a database level with a NoSQL-only solution. You can write it all over the application, you can put all the checks in, but you can’t add the necessary constraints and information to the database itself, which is where it belongs. The lack of integration with corporate data standards is an issue that appears later on in the lifecycle once the application starts becoming successful. It’s not something you see during the proof of concept or early development stages. Gartner has warned that this is going to be a real problem, and that “By 2017, 50% of data stored in NoSQL DBMSs will be damaging to the business due to a lack of applied information governance policies and programs.”¹
Standalone NoSQL solutions can also create technology silos, because you’re bringing in new database technology. For a developer, that may not seem like a big deal. But for DBA,s the new application must be backed up, archived, replicated, tuned, maintained, etc. The result is a lot of hidden costs and potential risks, all because the initial decision was driven entirely by developer preferences.
Applications in Context
The data models of NoSQL-only solutions often include the data access path, which basically means that the data model is tuned towards one single use case. This means that the data model isn’t capable of supporting other use cases, leaving you with no path forward.
For example, if you have a successful solution that registers visitors to a conference, at some point in time somebody will say, “We’re going to ship them a welcome package. Oh, and by the way, the names and addresses all need to meet the corporate standards because otherwise the orders can’t flow into the warehouse.” This is the point where the application now begins existing in context. And if you have no easy way to integrate it, you have a big problem.
Every application I’ve ever worked on always at some point had to link to postal code tables, ISO country codes, building materials, pricing files, or other data standards and structures. This isn’t easy when there are important applications utilizing a standalone NoSQL solution, which don’t support foreign keys or relational models.
In a document-oriented approach, it is compelling to hardcode one data access path, such as what the customer ordered, when they ordered it, and how much they ordered. However, things can become problematic when the use case changes and a different data access path is required. It won’t take long before someone wants to use this data for another purpose. Someone might want to find out who ordered a certain product in a specific month, perhaps due to a recall or for market research purposes. But because the use case was encoded in the data access path and is an inherent part of the data model, you can’t support the new use case.
Not-Only-SQL: What Postgres Brings to the Table
Postgres offers you a “Not-only-SQL” solution, providing a platform that allows structured and unstructured data to harmoniously co-exist. Inside Postgres, JSON and ANSI SQL are a natural fit and very well integrated. It’s the same language and the same ACID-compliant transaction framework. If you have a JSON document that represents a contact, and you store some other properties as SQL data elements, and you commit that contract with these SQL data elements, it is all one transaction. The document will get stored and the properties will get stored at the same time. You never have to worry about whether your database got updated but your document didn’t get stored, or vice versa. And this applies to your backup, replication, recovery, and everything else the DBA does.
Both SQL-only and NoSQL-only are too limited; Postgres gives you the best of both worlds. When you have data that has high variability, you can use JSON. When you have structures and relationships, you can use SQL. This gives you the ability to design for re-use, and you don’t have to hard-code your use case into the data model. You can have a DBMS platform that integrates well with your IT environment, backs up efficiently, can be easily monitored, and does SNMP and all the other things that are necessary for success.
Marc Linster is Senior Vice President, Products and Services, at EnterpriseDB. ¹ Dec 3, 2013 Research Note, “Does Your NoSQL DBMS Result in Information Governance Debt?” by Nick Heudecker and Ted Friedman.