When PostgreSQL 9.0 shipped a few months ago, it included several new replication features. It’s obvious that you can use these features to build clusters of servers for both high availability and read query scaling purposes. What hasn’t been so obvious is how to manage that cluster easily. Getting a number of nodes installed and synchronized with their master isn’t that difficult. But while the basic functions necessary to monitor multiple nodes and help make decisions like “which node do I promote if the master fails?” were included in 9.0, the way they expose this information is based on internal server units. There are a few common complaints that always seem to show up once you actually consider putting one of these clusters into a production environment:
- How do I handle adding new nodes easily to expand capacity later?
- Can I monitor replication lag in time units?
- When the master fails, how do I find the right node to promote, then switch all of the other standby servers to follow it?
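To make the "internal server units" point concrete: in 9.0, a standby reports its position through functions such as `pg_last_xlog_receive_location()`, which return text like `0/3000148` (two hexadecimal numbers). Here is a minimal sketch of how a script might compare those values to find the most advanced standby. The node names, the sample locations, and both helper functions are made up for illustration; this is not how repmgr itself is implemented.

```python
from typing import Dict, Tuple


def parse_wal_location(text: str) -> Tuple[int, int]:
    """Split a location such as '0/3000148' into (log id, offset).

    Both halves are hexadecimal, so they must be parsed before comparing;
    plain string comparison would sort '0/FFFFFF' after '0/1000000'.
    """
    log_id, offset = text.strip().split("/")
    return int(log_id, 16), int(offset, 16)


def pick_promotion_candidate(received: Dict[str, str]) -> str:
    """Return the name of the standby that has received the most WAL."""
    if not received:
        raise ValueError("no standbys reported a location")
    return max(received, key=lambda name: parse_wal_location(received[name]))


# Hypothetical values, as if collected from each standby after a master failure.
standbys = {
    "node2": "0/3000148",
    "node3": "0/30001F0",
    "node4": "0/2FFFFE8",
}
candidate = pick_promotion_candidate(standbys)
print(candidate)
```

Comparing the parsed pairs as tuples only tells you which node is furthest ahead. It says nothing about how far behind the others are in seconds, which is the harder question.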
Solving these problems inside the database itself isn’t necessarily the right way to proceed. PostgreSQL draws a clear distinction: some things belong in the core server code, while others don’t. Where an in-database approach really starts to break down is the node fail-over case. Handling that properly requires coordinating actions on multiple nodes at once. If you look at this problem long enough, you’ll realize that what works best here is a background daemon that aids in monitoring and node state changes. It can stay in the background all of the time, and respond to requests from other nodes to coordinate multi-node actions. Since by definition part of that daemon’s job will require operating when there is no working database on the server yet, integrating that job into the PostgreSQL core would be difficult. Also, people deploying databases tend to be risk-averse about the database code itself, preferring to deploy older, tested releases rather than the latest one. Given the roughly yearly release cycle of the core PostgreSQL code, an external program can evolve much more quickly. Since such a program would operate through the standard user APIs to the database, changes to it shouldn’t put your data at risk the way touching the core server code does.
With all this in mind, before PostgreSQL 9.0 was even released, 2ndQuadrant started an internal project to handle all of these tasks, which we’ve now released as a program named repmgr. It provides simpler user interfaces for the basic setup of a multi-node cluster. It records the data needed to monitor lag in business-appropriate ways. And even complicated node transitions can be handled from a single system, with its always-running repmgrd daemon process handling communication with the others.
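As a rough illustration of what "lag in business-appropriate ways" can mean, here is one way to turn recorded positions into a time figure: keep periodic samples of where the master was and when, then find the earliest sample the standby has not yet replayed. The function name, the sample layout, and the values below are all assumptions for the sake of the sketch, not a description of the data repmgr actually records.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Tuple

Location = Tuple[int, int]  # (log id, offset), already parsed from hex text


def estimate_time_lag(
    samples: List[Tuple[Location, datetime]],
    replayed: Location,
    now: datetime,
) -> timedelta:
    """Estimate how long the standby has been missing data.

    `samples` must be sorted by location: each entry is a master position
    and the time it was observed. The result is the age of the earliest
    sampled position the standby has not replayed, or zero when the
    standby has replayed everything that was sampled.
    """
    locations = [location for location, _ in samples]
    index = bisect_right(locations, replayed)  # first sample beyond replayed
    if index == len(samples):
        return timedelta(0)
    _, first_missing_at = samples[index]
    return now - first_missing_at


# Hypothetical samples taken on the master every 30 seconds.
start = datetime(2011, 1, 10, 12, 0, 0)
samples = [
    ((0, 0x3000000), start),
    ((0, 0x3000148), start + timedelta(seconds=30)),
    ((0, 0x30001F0), start + timedelta(seconds=60)),
]
lag = estimate_time_lag(samples, (0, 0x3000148), start + timedelta(seconds=90))
print(lag.total_seconds())
```

The resolution of an estimate like this is limited by how often the samples are taken, which is one reason an always-running process is a natural fit for collecting them.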
repmgr has been in testing internally at 2ndQuadrant and at customer beta sites for months, and the first external release of the code came out in December. We’ve just been waiting for some broader early testing before doing the sort of promotion you’re reading right now. That’s all been going well so far, and work is moving toward a 1.01 release later this month that clears up the main issues found by our early adopters. You can find documentation inside the source code repository where active development is happening, which is currently my GitHub repository at http://github.com/2ndQuadrant/repmgr
We also have a Google Groups area you can use as a support forum or a mailing list, as you prefer, for discussing the software.
In addition to it being external code, the other controversial aspect of the repmgr release has been that it’s licensed under the GPL v3, rather than the BSD license used for PostgreSQL itself. We’ve gotten criticism that we’re trying to emulate the mixed commercial/open license scheme seen in other databases such as MySQL, a model reviled by much of the PostgreSQL community. This is completely backwards from the reality.
The terms under which repmgr was developed required that we release it as free software, and that it remain such. There is no special proprietary version we charge for. The code that’s shared on GitHub is a snapshot of our whole development repo going back to the first commit, warts and all; the only thing we’re not doing is releasing some internal development branches until they work. There are no commercial restrictions on using the program. The only restriction being enforced by the use of the GPL here is that we expect code changes made to the program to be shared with the world. We want this to be free software in the spirit that phrase is used by the Free Software Foundation: if you find the program useful, and decide to enhance it, you should share those enhancements with the world. In fact, the way we are handling copyright issues around the code was modeled carefully on the FSF requirements for submission to the GNU software chain. The main purpose of the copyright assignment we’ve asked contributors to make is not so we can have our own special private build. It’s to make sure that someone else hasn’t added non-free software to our project. We don’t want contributions we merge to end up limiting the ability of others to rely upon repmgr for their projects, by making it less free software for having accepted them.
repmgr is being actively developed by many members of 2ndQuadrant, and represents a major community project we intend to keep advancing. It seems appropriate that a project already being put to use replicating databases across national boundaries was developed that way, too. The initial design concepts and architecture came from Simon Riggs in the UK. The bulk of the coding so far was done by Jaime Casanova in Ecuador. Robert J. Noles and I, here in the US, did the initial documentation and testing. Our second user of repmgr for a production customer, Gabriele Bartolini in Italy, has been sending back a steady stream of bug fixes and feature improvements due to arrive in the next release. If you work with PostgreSQL, you should recognize some of the names on that list. Ask yourself which sounds more likely: that all of us who have staked our careers on the success of a free PostgreSQL have simultaneously turned away from that philosophy because of our devious corporate interests; or that we’re releasing a free software tool we intend to build an open community around.
You can get repmgr in the morning and be building multi-node clusters with it by the end of the day. We hope you do just that, and consider joining the user and development community we’re building around it. While not everyone may agree with every decision we’ve made about repmgr, don’t forget the most important thing: everyone who successfully deploys replicated PostgreSQL is another person we’ve helped save from wasting money on Oracle RAC. And isn’t that what’s really important?