The adoption of Postgres within existing database infrastructures is rising, posing new questions and challenges for even veteran database professionals. At EnterpriseDB, we’re seeking to share stories from PostgreSQL users on their experiences, surprises and challenges.
Locately is a full-service shopper insights firm offering services in the US and Canada. Locately aims to replace the traditional in-person shopper intercept with a new big data solution that leverages the power of mobile, sending shoppers in-the-moment mobile surveys to understand what they do and why. Every day, the company needs to track over a million location points from participants’ phones in real time, and deliver tens of thousands of surveys. Needless to say, they need a reliable, enterprise-class database with geospatial capabilities.
The following is a conversation with Locately senior developer Josh Kupershmidt and software engineer Matt Klein about the experiences they’ve had with Postgres and why they chose Postgres to support all of their operations.
Does Postgres support all of your operations?
Josh: Yes. Location tracking, analytics, working out who is visiting which points of interest, and actions such as delivering a survey or doing something on the backend based on that information are all supported by Postgres and PostGIS. We get well over a million location points a day from our participants’ phones, and that number is going up all the time. And we’re delivering tens of thousands of surveys every day.
Why did you choose Postgres?
Matt: The original killer app was PostGIS, a very powerful spatially enabled database extension. The open-source aspect was important, too. We were a small, up-by-the-bootstraps startup, and we found PostGIS to be far and away the leader in the open-source geospatial database realm. I think that still holds true today.
What other extensions are you using?
Josh: We use PgBouncer to multiplex connections and keep active connection counts low on all these databases. We use pgBadger for log file analysis when we’re investigating performance, pg_repack for bloat management, and some of the contrib extensions, like pg_buffercache.
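To give a sense of what the PgBouncer piece looks like, here is a minimal, illustrative pgbouncer.ini. The host, database name, and pool sizes are hypothetical, not Locately’s actual configuration:

```ini
[databases]
; route application connections to the primary; values here are illustrative
locately = host=10.0.0.5 port=5432 dbname=locately

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling lets many client connections share a small number
; of server connections, keeping active connection counts on Postgres low
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```

Applications connect to port 6432 instead of Postgres directly, and PgBouncer hands out real server connections only for the duration of each transaction.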
PostGIS uses the flexibility of Postgres to define all sorts of custom types and operators, and being able to hook into all these data types is important. When you’re working with spatial data you might be talking about a polygon representing a store or a line string representing the path someone took from point A to point B. PostGIS introduces a lot of data types that let you model this data effectively. Otherwise, you’d have a really hard time storing that information in a database, and an even harder time querying it efficiently.
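As a sketch of what this looks like in practice (the table and column names are hypothetical, not Locately’s schema), PostGIS lets you declare geometry columns and query them with spatial operators:

```sql
-- hypothetical schema: store footprints as polygons, in lon/lat (SRID 4326)
CREATE TABLE store (
    id      serial PRIMARY KEY,
    name    text,
    outline geometry(Polygon, 4326)
);

-- a GiST index makes the containment test below efficient at scale
CREATE INDEX store_outline_gist ON store USING gist (outline);

-- which store, if any, contains this shopper's current location?
SELECT s.name
FROM store s
WHERE ST_Contains(s.outline,
                  ST_SetSRID(ST_MakePoint(-71.06, 42.36), 4326));
```

Line strings work the same way: a path from point A to point B is a `geometry(LineString, 4326)` column, queryable with the same family of operators.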
What would you say are the 3 Postgres capabilities most important to Locately?
Matt: The first would be geospatial for sure. You can imagine how important it is to a location analytics survey-triggering engine to have the ability to know quickly and efficiently in real time what stores or establishments someone is near when those stores or establishments are places that we care about or our customers care about. One of the most important aspects that we use Postgres for is the ability to efficiently store and query on locations.
Second would probably be backup. Postgres has many different options for backup and disaster recovery, including hot standbys, snapshots, and pg_dump, and we actually use all of these methods in different ways. It’s great to have so many different methods of backup and disaster recovery. Part of the benefit of this flexibility is that developers can run an environment on their local machines that’s extremely similar to what we run in production. If you’ve ever tried to get Oracle running on a MacBook, it’s hours of pain, not to mention the licensing issues. All of that is a non-issue with Postgres. We can spin up a snapshot of our production database directly in the cloud for an individual developer to use as their own primary database for an hour and then blow it away. I think all of those things would be an enormous headache to get right on some proprietary databases, if it were possible at all.
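The options Matt lists map onto standard Postgres tooling. As an illustrative sketch (the hostnames and paths are hypothetical, and these commands need a running cluster to do anything):

```shell
# logical backup of one database, restorable with pg_restore
pg_dump --format=custom --file=locately.dump locately

# physical base backup of the whole cluster, usable to seed a hot standby
pg_basebackup --host=primary.example.com \
              --pgdata=/var/lib/postgresql/standby \
              --wal-method=stream --progress
```

A pg_dump is portable and selective; a base backup (or a storage-level snapshot) captures everything and is the usual starting point for a standby or a cloned developer database.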
And third is reliability. We need to make sure our data is safe. When we store something, Postgres is validating that it’s the right type. When we commit something, we know that it’s stored on disk and it’s safe. If there’s a crash or a failover we know what’s going to happen.
Can you share some dimensions around concurrent users, availability requirements, disaster recovery requirements, etc.?
Josh: Availability-wise, this is an API that mobile devices are constantly checking in with, so keeping our primary and standby databases online at all times is very important to us. We depend heavily on Postgres to record all this incoming data and to perform data analytics; if our databases are down, our APIs are down and we’re dead in the water. So we do hot standby replication: we have a primary and usually two standbys up and running at all times, with failover in case something bad happens to the primary. We’re also able to offload a large chunk of our read-only work onto the standbys so there isn’t so much load on the primary database.
What are you using for replication?
Matt: We use the built-in binary replication that Postgres provides, in its asynchronous streaming form. We have one primary database where all the writes happen, and multiple read-only standby databases that can be promoted to primary if the primary fails. Those standbys are also available to service read-only queries. From a scalability perspective, we can add as many standbys as we need via hot standby replication, so that if something happens to the primary, we have little or no system downtime.
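A streaming-replication pair like the one Matt describes comes down to a handful of settings. The following is a minimal sketch with hypothetical hostnames; exact file locations vary by Postgres version (on Postgres 12 and later the standby settings live in postgresql.conf alongside a standby.signal file, while older versions used a separate recovery.conf):

```ini
# postgresql.conf on the primary
wal_level = replica        # enough WAL detail for physical standbys
max_wal_senders = 5        # one per standby, plus headroom for backups
hot_standby = on           # let standbys serve read-only queries

# standby configuration (postgresql.conf on 12+, recovery.conf before that)
primary_conninfo = 'host=primary.example.com user=replicator'
```

With `synchronous_standby_names` left empty, replication is asynchronous: the primary commits without waiting for standbys, which is the trade-off Matt describes.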
If you were talking to a colleague, is there anything they would want to know that we haven’t talked about when it comes to Postgres?
Josh: The community is something we really like. That’s another huge plus that Postgres has that I don’t think the other guys really match. It’s an enormous ecosystem of extensions and of people building tools to make your life as a DBA easier. It’s very healthy and active, with a strong focus on a lot of the things that are important to us, like reliability and dependability. A common theme at PGCon this year was making sure that no more hot standby bugs creep into the latest releases unnoticed, as well as how to properly handle disaster recovery scenarios if something does go wrong. Everyone in the community seems to be thinking about these issues and how to address them.
Were there any challenges or surprises or gotchas that came up?
Matt: We are hosted on Amazon Web Services and we use disk volumes through Elastic Block Store, and there have definitely been some challenges along the way in building a robust, scriptable environment. We were pretty early adopters of AWS and EBS. In the early days of EBS there were more reliability problems and more performance variation, and there were some really ugly hardware problems. But EBS has gotten quite reliable in the last year and a half or so; Provisioned IOPS volumes went a long way. I would say the challenge was getting good Postgres performance and reliability on the hardware we were using.
Josh: To a certain extent, working around AWS flakiness is one problem, but understanding the performance characteristics of Postgres on top of these volumes is a whole other issue. So we spent time and effort making sure we use the resources Amazon makes available to us in the way that’s most efficient for Postgres. We depend on EBS-optimized instances talking to Provisioned IOPS volumes for the main data directory, we have separate volumes for the WAL (xlog) directory, and we can also use instance storage for transient data; it’s not as reliable, but it has lower latency than EBS. All of these factors have to be taken into account.
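The separation Josh describes can be expressed as a storage layout plus a couple of settings. This is a hypothetical sketch, not Locately’s actual configuration:

```ini
# volume layout (illustrative mount points)
#   /pg_data  -> EBS Provisioned IOPS volume: main data directory
#   /pg_wal   -> separate EBS volume: WAL/xlog (symlinked or set at initdb)
#   /pg_temp  -> ephemeral instance storage: fast, but lost on instance stop

# postgresql.conf: direct sorts and temporary tables to instance storage,
# assuming a tablespace named 'scratch' was created with
#   CREATE TABLESPACE scratch LOCATION '/pg_temp';
temp_tablespaces = 'scratch'
```

Keeping WAL on its own volume isolates its sequential write stream from random data-file I/O, and pointing only transient data at instance storage means nothing durable is lost if that storage disappears.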
Keith Alsheimer is Chief Marketing Officer at EnterpriseDB.