Open source and purely commercial databases have coexisted for a long time, but the open-source model continues to gain momentum. The reasoning in favor of open source is strong but varied, depending on the license model a given project follows. You could be leaning toward something with more diversity in innovation; you may be seeking more democracy in code influence; or you might simply be looking for more control over your costs. My colleague Marc Linster, SVP of Product Development at EDB, talks about those models, values, and challenges in his recent publication on Computer Weekly.
PostgreSQL itself has been available for nearly twenty-four years and is regarded as a highly reliable and stable product. However, even once you've made the decision to use PostgreSQL, it's key to understand that the freedom of open source is not freedom from responsibility. Some of those responsibilities are clear, while others may only be discovered the hard way. Let's look at one example we experienced in EDB Support.
PostgreSQL Support: A Customer Case Review
Last year, we received a case from a customer who was facing intermittent problems in their production environment. They’d been a customer of EDB for several months and hadn’t reached out to our Support teams yet, so we were coming in cold.
So, what sort of problems? At times, they would see application performance go from “normal” to “bad” and back again. Sometimes this would take five minutes, sometimes it would take five hours. The time of day wasn’t consistent, nor was the number of users.
The Usual Suspects in PostgreSQL Database Problems
- Are you using a PostgreSQL monitoring tool?
- What do the database log files show?
- How are the PostgreSQL configuration files set?
Though the customer was in production, they had very limited monitoring in the database environment. They did have log files, fortunately, so that was a start. They also sent us copies of their postgresql.conf and pg_hba.conf.
We didn’t see much that would cause this sort of scenario. The usual parameters seemed to be set appropriately, both for the parameters that serve live connections and for those serving maintenance operations.
The customer had reached out in a state of urgency and hadn’t yet provided version information for Postgres itself. We asked them to run the postgres command with the -V flag.
$ postgres -V
postgres: command not found
Hmm. Well, that does happen sometimes, most commonly because the binary isn’t on the system PATH, but there’s more than one way to get the answer.
$ locate bin/postgres
/usr/lib/postgresql/9.4/bin/postgres
9.4 itself was released in 2014, but there are few common scenarios in which that alone would produce this sort of behavior. So, we dug a little deeper by taking a step back.
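As an aside, when a server is up and accepting connections, its version can also be confirmed over SQL, which sidesteps PATH problems entirely. A minimal sketch, assuming the psql client is installed and a local cluster is reachable (connection flags will vary by environment):

```shell
# Sketch: confirm the server version via SQL, independent of the PATH.
# Assumes psql is installed and a local cluster accepts connections;
# guarded so it degrades gracefully where that isn't the case.
if command -v psql >/dev/null 2>&1; then
    psql -d postgres -At -c 'SHOW server_version;' 2>/dev/null \
        || echo "psql found, but no reachable server"
else
    echo "psql not found; locate the binary on disk instead"
fi
```

Either route gets you to the same answer; the SQL route also tells you the version the running server was actually built from, not just whichever binary happens to be first on disk.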
Is it just the database?
This is a running theme, and critical to solving problems in technology: when faced with a problem, consider the simplest answers first. Is this really a database problem, or is the database just an unfortunate citizen of an environment which is experiencing problems either at a host or network level?
“Are you monitoring the environment itself?” They were using a SolarWinds implementation. They saw no significant problems there, at least for the most part:
- The network averaged about 35% of capacity, and peak load only saw about 85% use
- Other hosts in the same environment were reporting normal behavior
- The application hosts looked to be in good health: memory consumption was under 25%, storage was at 53%, and the NICs weren’t overloaded for transmit or receive.
- Interestingly, the applications affected only behaved this way when they were directing traffic to and from their PostgreSQL databases.
- Cached information continued to perform well, even during the times when the database traffic was poor.
- The CPU load on the database host was, during these times, close to 53%.
- Memory consumption during these events was consistently at 100%.
9.4 is an older major version, but it was also worth knowing which maintenance release of 9.4 they were running. Defects are fixed constantly in Postgres, and it was possible they were hitting any one of a number of rare-but-real memory conditions that existed in early 9.4 releases and were fixed in later maintenance releases.
Back to the Command Line
Now that we knew the path, we could run the same command from earlier using the binary’s full path.
$ /usr/lib/postgresql/9.4/bin/postgres -V
postgres (PostgreSQL) 9.4.16
9.4.16 is one of the mid-stream releases of 9.4, which greatly narrowed down the known code problems in play, so we went back to the first “rough edge” in the case.
How was this installed?
PATH issues aren’t unheard of, but the most common reason the postgres -V command fails is the way the software was installed.
This, in fact, was the culprit here: the customer’s original DBA had chosen to compile PostgreSQL themselves, rather than using the usual yum, tarball, or binary options. They sourced their extensions independently, rather than using the extensions distributed with the source code. There was no meaningful documentation detailing the versions, the configuration, or any changes that may have been made to the database code itself. While the last point was rather unlikely, the rest meant we had a patchwork of code and extensions with no real frame of reference for reliability.
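For reference, binaries from a source build typically land in a prefix outside the packaging defaults, which is exactly why a bare postgres -V fails. A quick, session-local fix is to prepend that directory to the PATH; a sketch, using the prefix discovered above:

```shell
# Sketch: make a source-built installation's binaries visible to this shell.
# The prefix below is the one found via `locate` earlier; adjust as needed.
PG_BIN="/usr/lib/postgresql/9.4/bin"
export PATH="$PG_BIN:$PATH"

# Now `postgres -V`, `psql`, and friends resolve by name, if the binaries
# really live there; guarded so this snippet is safe to run anywhere.
command -v postgres >/dev/null 2>&1 && postgres -V \
    || echo "postgres still not on PATH; check the contents of $PG_BIN"
```

To make the change stick, the export line would go in a shell profile; that convenience is part of what the packaged installers give you for free.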
Ultimately, we asked them to perform a dump and restore to a freshly created installation of PostgreSQL from the community’s yum repositories. This allowed them to escape not only the questionable source compilation but also the extension challenges and one other piece we hadn’t yet mentioned: upgrades. They did install 9.4, but the yum install applied the most recent maintenance version of 9.4 (which, at the time, was 9.4.20). Upgrades were then simplified, and the codebase was safer, with no gaps in extension currency.
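In outline, that migration can be as simple as a full-cluster dump piped into the new instance. The commands below are a sketch only: host names, users, and file paths are placeholders rather than the customer's actual environment, and any such move should be rehearsed against a non-production copy first.

```shell
# Sketch: migrate to a freshly installed, community-packaged cluster.
# 1. Dump everything (roles, tablespaces, all databases) from the old cluster.
pg_dumpall -h old-db-host -U postgres -f /backups/full_cluster.sql

# 2. Replay the dump into the new cluster installed from the yum repositories.
psql -h new-db-host -U postgres -d postgres -f /backups/full_cluster.sql
```

A dump and restore also rebuilds every table and index from scratch, which is part of why it is such an effective way to take lingering questions about on-disk integrity off the table.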
Once they were up and running on the new instance, the symptom no longer occurred. There were multiple theories: they identified crosstab() usage that leaned heavily on an older build of the tablefunc extension, and there were a small number of rare defects fixed in the maintenance release they upgraded to. However, with no smoking gun and an unreliable deployment, the fastest way to a stable environment was to take the integrity of the database itself out of the question.
Postgres Support Isn't Just Insurance
There are many organizations that are self-sustaining in their use of PostgreSQL, and some even contribute to the community, either in the form of the database itself or in extensions and other tools. If you have the resources and the discipline to share the responsibility of the database integrity beyond just configuration, you can absolutely be successful without the assistance of another vendor.
It all depends on your goals. Many organizations prefer to adopt the excellence and experience of a vendor whose business is Postgres, rather than to build their own wealth of skill and experience. This may allow your development teams and IT organizations to focus solely on your product and your customer goals. When you seek a vendor’s help, their #1 goal should be your success. It’s the only metric that matters to you.
The recipe for success shouldn’t vary from one credible vendor to the next:
- Help you achieve a responsible implementation with high availability and performance
- Ensure that the software you’re using is predictably safe and enterprise-strong
- Support you not only when things go wrong, but when questions arise
This last one is perhaps the biggest component. The right vendor will support your questions with advice and guidance, which helps avoid situations like the one described above. “Can I compile my own source?” is a question that has both answers and recommended choices.
In the case I’d outlined above, things had gone horribly wrong from the start; they just didn’t know it for a long while. The customer took on the responsibility of compiling their own code, but didn’t maintain a body of knowledge about how they did so or how to maintain it. Without a vendor there to guide them, they’d unknowingly taken risks to their stability, performance, and integrity. However, it took only a single interaction with our technical support team to guide them to a healthy deployment.
While I’m partial to our delivery (as are our customers, who have named Technical Support one of the strongest reasons they engage with us), EDB isn’t the only vendor available. Others exist, though their scale and focus may be very different from EDB’s global presence, and the decision to engage a vendor is the most crucial action for you to take in your journey with PostgreSQL.
Join Postgres Pulse Live!
Our goal is to continue to leverage our experiences with our customers to bring meaningful insights to all PostgreSQL users, both in the problems themselves and in the approach we take to solving them. Looking for more help just like this? Here you can find all things Postgres Pulse, including all of our blog posts and our YouTube series.
Join us on Monday, June 15th, for our next Pulse Live Session! We’ll have a special guest, Jan Karremans, joining us as the first in a series of session guests, and we’ll go through the usual questions and opinions related to this week’s topic. Remember, we’re open to any questions you may have around PostgreSQL at all. You can ask your questions via email at email@example.com, hashtag on Twitter, or during the Postgres Pulse Live session right here.
Jamie Watt is Vice President of Global Support Services. In this role, Jamie leads Technical Support, Customer Care, RemoteDBA Services, and EDB’s online community platform of PostgresRocks.Com. Jamie brings over twenty years of experience in the industry, with organizations ranging from startups to the F500 and G2000, in roles ranging from system administration and support to data analytics and process design.
Prior to EnterpriseDB, Jamie’s most recent role was in leadership with Dell EMC, as an original member of the Equallogic acquisition; there, he was instrumental in the development and delivery of a best-of-breed global Support program for Dell’s storage portfolio, as well as a key driver in UX/UI evolution for mass-scale Services tools, all with the single focus of driving a better customer experience.