Contributed by Douglas Hunley
Before you begin a Postgres deployment, it is crucial to think about your database strategy sooner rather than later. Neglecting to do so can result in misallocated resources, limited ability to change, or worse, critical data loss as a result of an outage or disaster.
By taking the time to think about some key database variables and making sure you’ve answered some key questions, you can reduce risk, spend less time troubleshooting post-production, lower long-term maintenance costs, and instill confidence by using a well-designed deployment plan. All this adds up to you being a superstar DBA.
This two-part blog series addresses the top ten questions every DBA needs to ask and answer before embarking on a Postgres rollout. The first five are addressed here in part 1.
1) Acquiring Hardware – Where should I put my money?
Two of the most significant factors to consider for a database platform are how much memory the system can hold, and how effective of an I/O subsystem you can build. A lot of systems today tout how many cores they have and how fast the CPU is. And while that’s certainly helpful for the database, it shouldn’t be the deciding factor; it doesn’t do you any good to have 24 cores for any one given Postgres query that your application submits. If you’re only running one query, the remaining 23 cores are sitting idle, because Postgres doesn't currently do parallel query. Buying more RAM for the system would be more cost-effective.
2) How should I partition the I/O?
Once you have determined what kind of I/O system you’re going to buy, whether it’s a Fusion-io box or some other solid state drive or whatever the case, you can partition the I/O pattern that Postgres is going to generate by splitting up the various parts of your database onto that I/O. As an aside, this is one of the thingsthat we at EnterpriseDB typically do as part of our Architectural Health Check service.
Once you have the hardware, you want to think about utilizing table spaces in your database. Hybrid storage solutions that have combined SSDs and HDDs work really well, but they don’t “understand” the concept of a database on top of them. It’s actually much easier to use the tablespace and forcibly allocate something full-time to an SSD or full-time to a hard drive. This is also true for remote storage and things like NFS, GlusterFS, FUSE, etc. Essentially, you want to avoid any storage system that doesn’t create a local device node on your system.
3) What does my app want?
It is very easy for DBAs to design a database solution that is optimal from their point of view but is not optimal for the application. So it’s vital to understand what the application wants. How does the application behave? Is it a few long-running transactions crunching lots of data? Or does the application constantly open small connections and fire off quick little queries, then close the connection? Does it do more reads or more writes? Is it an even mix? All of these things should inform your decision as a DBA in regards to how you should tune the system upon deployment.
4) Can your application handle what your database offers?
You can build a database system that’s highly available and redundant, and has transparent failover, but does the application know how to properly recover? There’s actually no such thing as an instant failover. There’s always going to be some period of time, maybe less than a second, where your database is temporarily unavailable. Can your application handle that? Will it need to be restarted, or will it pick right up and continue as if nothing happened? Can the application re-request something from the database if the answer isn’t up to date? Be mindful you don’t design a solution at the database level that offers “amazing” replication features if the application can’t make use of them.
5) High Availability or High Recoverability?
When most people say “availability,” they really mean “recoverability.” However, they are still two distinct things. High Availability (HA) simply means that the application doesn’t go down, or if it does go down, it goes down for as little amount of time as possible (EDB tools likes EDB Failover Manager or xDB Replication Server with multi-master replication can help provide this). With High Recovery systems, it’s not as critical if the application is down for a moment or two. But it is more critical that every transaction that was committed to the database stays committed and is recoverable. In this scenario, you’ll want to think about things like warm standby, WAL archiving, and Point-in-time Recovery (PITR). Depending on what’s needed, some organizations will want to do a combination of the two.
Doug Hunley is a Certified System Engineer and Certified Postgres Trainer at EnterpriseDB.