The morning begins with a new email, with this subject line at the top of the inbox of you, the trusty DBA. The database isn’t down, though, otherwise you’d have known about it hours ago, and from more than this single email.
It's up—just not for everyone.
One quick glance at pgadmin tells you the database is actively serving connections--so it’s certainly not down for everyone. So what gives? You talk it over with the sender, Kerri. Kerri used to be part of the HRMS application platform, and you know each other reasonably well. Recently, Kerri joined a new team developing an integration engine for the company’s services billing platform. So, what are the symptoms?
- Ping requests from the team’s Proof of Concept environment to the database are honored.
- All other applications and users are able to access the same database, “team_resources”.
- Last week, the team was able to access the database just fine.
- Today, all database connections for all team members using the Proof of Concept application are now refused.
So, what do we know? We know that for all other applications, things seem to be working just fine. We know that the environment works overall, at least for Layer 3, since ping is ICMP, one of the IP-layer protocols. We know that this was working, only last week. Finally, we know it’s not working anymore.
What changed in the database environment? The user, application, or database?
What changed? The key to finding the problem lies in not just the question, but the troubleshooting itself. You first look for changes. Were drivers updated or reconfigured? Were connection strings changed? “No, nothing changed,” says Kerri.
You know the problem is reproducible and consistent--it did work, it doesn’t work, and there is no variation yet. So you ask Kerri to establish a connection to the database.
You can see in the logs on the application server that the connection is attempted to the database server via 192.168.1.14/5432, but refused.
You look in the pg_log, and and find…
FATAL: no pg_hba.conf entry for host "192.168.1.2", user "kerrismith", database "team_resources", SSL off
Hmm. Well, that’s interesting. According to this, Kerri doesn’t exist. So, either Kerri is a figment of your imagination, or we have a clue. You know Kerri exists, because your fellow DBA’s, Janna and Karl, were mentioning the same email over coffee this morning. For the sake of posterity, you’ll assume Janna and Karl aren’t part of a larger hallucination either.
You now have a clue to work with. Your next step is to take a look in the pg_hba.conf. What do you see there?
local all trust host team_resources 192.168.1.4/32 ldap ldapserver=ldap_us.acme.com ldapprefix="cn=", ldapsuffix=",ou=hr, dc=acme, dc=com host team_hourly_costs 0.0.0.0/0 ldap ldapserver=ldap_us.acme.com ldapprefix="cn=", ldapsuffix=", ou=hr, ou=finance, dc=acme, dc=com
LDAP Configuration & Relation with Postgres
Ah ha! A difference. You have LDAP configured for the teams, and for this particular master, you have one overlapping Organizational Unit, or “ou”, but one which is only able to access the database “team_hourly_costs.” (“ou” is Organizational Unit in LDIF-speak, some more on that here.)
Does this matter?
You send an IM to Kerri, for two reasons:
You want to know if she’s aware if anything was changed about her LDAP credentials lately.
You remember that the last symptom was that all team members in the new platform are now seeing this problem--and based on the pg_log, you only see one type of LDAP error, refusing the user kerrismith. This may mean that the connection is hardcoded to use only Kerri’s credentials, which is never a good security practice.
You then decide to make this pretty simple. You ask Kerri to access the ldap server itself from the host application, so she can test it against her own credentials:
ldapsearch -x -LLL ldap_us.acme.com -D email@example.com -W -b "OU=finance,DC=acme,DC=com" -s sub "(objectClass=user)" givenName
Kerri’s confirmed it—it works! So you know she has LDAP working right, and you know that something changed. You confirm with HR and your InfoSec teams that the database “team_resources” is acceptable to be visible to her new group in Finance, and you make one change to the pg_hba entry for the table:
local all trust host team_resources 192.168.1.4/32 ldap ldapserver=ldap_us.acme.com ldapprefix="cn=", ldapsuffix=",ou=hr, ou=finance, dc=acme, dc=com host team_hourly_costs 0.0.0.0/0 ldap ldapserver=ldap_us.acme.com ldapprefix="cn=", ldapsuffix=", ou=hr, ou=finance, dc=acme, dc=com
You test once more, and this time, the connection succeeds, and the queries are executed successfully. It looks like there was a lagging change to her Organizational Unit in LDAP which gave a false sense of success--when she was removed from the “hr” ou in a scheduled job overnight, the connections began to fail.
Fixed the Postgres & LDAP Authentication Problem
Now that you’ve found the likely culprit, you’ve asked Kerri to change the connection string to make the application leverage each user’s credentials, and test once again. Suddenly, you see another three successful connections, but this time, with different usernames. Another one in the books, job well done. The InfoSec team was also thankful for catching the delayed authentication changes and is working with the LDAP team to fix the workflow from a cronjob to a push mechanism for all user transfers.
In one day, you’ve managed to help one internal customer and fix four holes. Well done!
There are many, many caveats to this in the real world; LDAP can be authenticated multiple ways--with StartTLS, with LDAPS starting in PostgreSQL 11, or at the very least, with SSL. This, of course, implies that you’re using LDAP, rather than PAM, which we’ll talk about in a later article. You can also check out an article written by Richard Yen on another specific use case, for LDAP and StartTLS, right here.
Join Postgres Pulse Live
Our example excluded all of the caveats just listed, because these are lengthy subjects in their own right, but this example was based on a real problem we solved--and one you might have faced. Join us on Monday, April 13th, for our next Pulse Live Session! We’ll continue to talk through some of the nuances mentioned above--and we encourage you to bring any questions you have to the conversation. Feel free to send them to firstname.lastname@example.org ahead of time.
Jamie Watt is Vice President of Global Support Services. In this role, Jamie leads Technical Support, Customer Care, RemoteDBA Services, and EDB’s online community platform of PostgresRocks.Com. Jamie brings over twenty years of experience in the industry, with organizations ranging from startups to the F500 and G2000, in roles ranging from system administration and support to data analytics and process design.
Previous to EnterpriseDB, Jamie’s most recent role was in leadership with Dell EMC, as an original member of the Equallogic acquisition; there, he was instrumental in development and delivery of a best-of-breed global Support program for Dell’s storage portfolio, as well as a key driver in UX/UI evolution for mass-scale Services tools - all with the single focus of driving better customer experience.