May 25, 2016
There is an important difference between the open source PostgreSQL community and many other open source communities – it is not controlled by a single company that can dictate whether others can contribute. Instead, the PostgreSQL open source community is independent of any single company. It’s not uncommon for companies to claim to support open source while really only allowing outsiders to contribute to software the company originally created, and then only if the company approves of the contribution.
It’s important to make these distinctions in order to understand the relationship between EnterpriseDB® (EDB™) and the open source PostgreSQL community. PostgreSQL is a truly independent community, but EDB strongly supports the work of the PostgreSQL community. Many EDB staff members devote substantial time to working on PostgreSQL community initiatives or on related projects like pgAdmin and pgpool-II, and have done so for years. This means that during work hours, staff write and review code that is committed to the core PostgreSQL project, with the blessing of EDB management.
In this blog post I'd like to outline some of our recent contributions to the PostgreSQL ecosystem. Like any other company that supports PostgreSQL, EDB can, and does, make technical contributions to the PostgreSQL community in a variety of ways: code; patch review, testing, and commit; external tools; and packaging. (Of course, in addition to these technical areas, EDB can and does support the community financially, through event sponsorships, for example, but that is a topic for another blog post.)
PostgreSQL 9.6 Contributions
In the PostgreSQL 9.6 development cycle, EDB made several large contributions of code. Parallel query was developed primarily by EDB staff: Noah Misch helped with early design work while he was at EDB; Amit Kapila did most of the work on parallel sequential scan; and much of the infrastructure – as well as the support for parallel joins – was written by me. Other EDB staff also contributed code, testing, and bug fixes. In addition, contributions from other members of the PostgreSQL community – such as the important work David Rowley of 2ndQuadrant did on parallel aggregate – were reviewed and committed by EDB staff.
EDB staff were the primary developers of two other very significant features in PostgreSQL 9.6. Thomas Munro developed support for a new consistency level, synchronous_commit=remote_apply. Together with the new support for multiple synchronous standbys in PostgreSQL 9.6, which was developed by Masahiko Sawada of The Nippon Telegraph and Telephone Corporation (NTT) and various others, it makes it possible, for the first time, to build robust read-scaling clusters with PostgreSQL. In prior releases, it was possible to guarantee that a commit acknowledged to the client had been replicated synchronously to a single standby, but that guaranteed only that the Write-Ahead Log (WAL) for that transaction was on the standby's disk, not that it had actually been replayed and made visible to queries on the standby.
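The difference between these commit levels can be sketched as a toy Python model. It assumes synchronous_standby_names is configured, models only what the primary waits for from the standbys currently chosen as synchronous (not local WAL flushing), and uses hypothetical names (StandbyProgress, commit_can_be_acknowledged) that are not PostgreSQL APIs.

```python
from enum import IntEnum


class StandbyProgress(IntEnum):
    """How far a standby has processed a transaction's commit record (toy model)."""
    NONE = 0     # not yet received
    WRITTEN = 1  # written to the standby's operating system, not yet flushed
    FLUSHED = 2  # flushed to the standby's disk
    APPLIED = 3  # replayed, so the change is visible to queries on the standby


# Minimum progress each synchronous_commit level waits for on the synchronous
# standbys; 'off' and 'local' do not wait for standbys at all.
REQUIRED_PROGRESS = {
    "off": StandbyProgress.NONE,
    "local": StandbyProgress.NONE,
    "remote_write": StandbyProgress.WRITTEN,
    "on": StandbyProgress.FLUSHED,
    "remote_apply": StandbyProgress.APPLIED,
}


def commit_can_be_acknowledged(level, sync_standbys):
    """Hypothetical helper: True once every synchronous standby has reached the
    stage that `level` requires. `sync_standbys` maps standby name -> progress."""
    required = REQUIRED_PROGRESS[level]
    return all(progress >= required for progress in sync_standbys.values())


standbys = {"s1": StandbyProgress.FLUSHED, "s2": StandbyProgress.APPLIED}
print(commit_can_be_acknowledged("on", standbys))            # True
print(commit_can_be_acknowledged("remote_apply", standbys))  # False: s1 has not replayed yet
```

Under on, the commit in this example is acknowledged even though s1 has not replayed it, so a read-only query sent to s1 could still miss the change; under remote_apply, acknowledgment waits until every synchronous standby has replayed the commit.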
Separately, Kevin Grittner of EDB developed support for a new "snapshot too old" feature, the first PostgreSQL feature that makes it possible to control table and index bloat without needing to kill entire database sessions. Instead, only queries that attempt to access already-removed data are affected, and only the query, not the entire session, is terminated.
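The feature is enabled through the old_snapshot_threshold setting; its effect on sessions can be illustrated with a toy Python model. As a simplification, the model assumes a query fails only when its snapshot is older than the threshold and it reads a page changed after the snapshot was taken (for example, by vacuum removing old row versions). Page, check_page, and Session are hypothetical names, and times are plain numbers in seconds.

```python
from dataclasses import dataclass, field


class SnapshotTooOld(Exception):
    """Error raised to the one query whose snapshot can no longer be trusted."""


@dataclass
class Page:
    modified_at: float  # time of the last change to this page (toy stand-in)


def check_page(snapshot_taken_at, now, page, threshold):
    # Only a snapshot older than the threshold is at risk, and only when it
    # touches a page that changed after the snapshot was taken.
    if now - snapshot_taken_at > threshold and page.modified_at > snapshot_taken_at:
        raise SnapshotTooOld("snapshot too old")


@dataclass
class Session:
    failed_queries: list = field(default_factory=list)

    def run_query(self, name, snapshot_taken_at, now, pages, threshold):
        """Scan `pages`; on SnapshotTooOld, fail only this query and keep the session."""
        try:
            for page in pages:
                check_page(snapshot_taken_at, now, page, threshold)
            return "ok"
        except SnapshotTooOld:
            self.failed_queries.append(name)
            return "error"


session = Session()
pages = [Page(modified_at=10.0), Page(modified_at=500.0)]
# Long-running report: snapshot taken at t=0, page read at t=700, threshold 60s.
print(session.run_query("old_report", 0.0, 700.0, pages, 60.0))  # error
# A fresh query in the same session still succeeds.
print(session.run_query("new_query", 650.0, 700.0, pages, 60.0))  # ok
```

The point of the sketch is the last line: the session that ran the stale report is still usable afterwards.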
Collaborative Development Efforts
In addition to the features written primarily by EDB staff, EDB also collaborated with other PostgreSQL companies on a number of major development efforts. Performance and scalability continue to be priorities within the PostgreSQL community, and contributors from several companies, including Citus Data, Postgres Pro, and EDB, were active in that area during the PostgreSQL 9.6 development cycle.
- Andres Freund (of Citus Data) and Alexander Korotkov (of Postgres Pro) contributed a patch to allow pinning and unpinning of buffers to happen using atomic operations rather than spinlocks.
- Dilip Kumar (of EDB) contributed a patch to improve the performance of concurrent insertion into rapidly growing tables.
- Amit Kapila (of EDB) and I contributed a patch to improve scalability on write-heavy workloads by ameliorating contention over a key lock called ProcArrayLock.
- Aleksander Alekseev (of Postgres Pro) contributed a patch to improve the scaling of shared-memory hash tables used by PostgreSQL for various purposes.
- Peter Geoghegan (of Heroku) contributed a patch to significantly speed up large sorts.
- Amit Kapila and Andres Freund worked together to determine that an increase in the number of buffers used to store commit-status data could improve scalability without regressing performance in other cases.
EDB supported every one of these patches, not just the ones we wrote, with review, testing, and benchmarking. EDB believes that a faster PostgreSQL is good for everyone, and we want to collaborate with other companies – and individual contributors – who share that goal.
EDB has been working with NTT for many years to enhance PostgreSQL and Postgres-XC, particularly in areas related to horizontal scalability. In this release cycle, however, EDB significantly increased the amount of time our staff spent working with NTT to improve postgres_fdw. Ashutosh Bapat and Rushabh Lathia both spent significant time on this project. postgres_fdw can now push sorts, joins, UPDATEs, and DELETEs entirely to the remote server, dramatically improving performance on many queries. Many important capabilities remain to be added, and EDB is committed to working with NTT and other interested parties in the PostgreSQL community to bring those capabilities to PostgreSQL. Because this is such an important area for PostgreSQL, we also expect that other companies may make completely independent contributions in the area of horizontal scalability, and we're very happy to see that happen.
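Why join pushdown helps can be sketched with a toy Python model that counts the rows crossing the network for a join between two foreign tables. It assumes an inner join on the first column, no filters, and a network cost proportional to rows transferred; the function names are hypothetical and do not reflect postgres_fdw internals.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]


def hash_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Inner join on the first column using an in-memory hash table."""
    index: Dict[int, List[str]] = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]


def join_locally(remote_a: List[Row], remote_b: List[Row]):
    """No pushdown: both tables are fetched in full, then joined on the local server."""
    rows_transferred = len(remote_a) + len(remote_b)
    return hash_join(remote_a, remote_b), rows_transferred


def join_remotely(remote_a: List[Row], remote_b: List[Row]):
    """Pushdown: the remote server runs the join and ships back only the result."""
    result = hash_join(remote_a, remote_b)
    return result, len(result)


# 1000 orders spread over 500 customers (two each); 10 of those customers are VIPs.
orders = [(i % 500, f"order-{i}") for i in range(1000)]
vip_customers = [(c, "vip") for c in range(0, 500, 50)]
local_result, local_rows = join_locally(orders, vip_customers)
remote_result, remote_rows = join_remotely(orders, vip_customers)
print(local_rows, remote_rows)  # 1010 20
```

Both strategies produce the same rows; the gain comes from shipping 20 result rows instead of 1,010 input rows. Pushdown pays off most when, as here, the join result is much smaller than its inputs.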
EDB has participated in other projects as well. For example, during the PostgreSQL 9.6 development cycle, major work on allowing PostgreSQL to scale to larger database sizes than ever before was led by Masahiko Sawada of NTT, with some help from me. This work optimizes vacuums that are performed "for wraparound", so that the system does not need to repeatedly scan the same data blocks. And PostgreSQL 9.6 now offers much better reporting of waits in pg_stat_activity, an effort in which several EDB staff participated along with various others, including Ildus Kurbangaliev and Alexander Korotkov (both of Postgres Pro) and Andres Freund.
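The effect of not rescanning the same blocks can be sketched with a toy Python model. It assumes, as a simplification, that each page carries an all-frozen marker (in 9.6 this is tracked in the visibility map) and that a wraparound vacuum freezes every page it reads; Page and anti_wraparound_vacuum are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    all_frozen: bool = False  # toy stand-in for an "all tuples frozen" marker


def anti_wraparound_vacuum(pages: List[Page], skip_all_frozen: bool) -> int:
    """Return how many pages this vacuum had to read.

    Without the marker, every page is read on every wraparound vacuum; with it,
    pages already known to be fully frozen are skipped.
    """
    scanned = 0
    for page in pages:
        if skip_all_frozen and page.all_frozen:
            continue
        scanned += 1
        page.all_frozen = True  # toy assumption: every tuple on the page is frozen
    return scanned


table = [Page() for _ in range(1000)]
print(anti_wraparound_vacuum(table, skip_all_frozen=True))  # 1000: first pass freezes everything
for page in table[:10]:
    page.all_frozen = False  # 10 pages modified since the last vacuum
print(anti_wraparound_vacuum(table, skip_all_frozen=True))  # 10
```

On a large, mostly static table, the second pass touches only the pages that changed, which is what lets wraparound vacuums keep up as databases grow.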
This isn't an exhaustive list. But it underscores the point that EDB’s participation in the PostgreSQL development process is deep and wide, involving many members of the company’s staff not only in the writing of code but also in reviewing, testing, benchmarking, and committing.
Related PostgreSQL Projects
Aside from our work on the core of PostgreSQL, EDB makes significant contributions to several other PostgreSQL-related projects. EDB employs two pgAdmin committers, Dave Page and Ashesh Vashi, and has funded most of the development on pgAdmin 4, a new Python-based web UI with a clean, modern look. EDB also employs Muhammad Usama, a pgpool-II contributor and committer.
EDB has also released several tools as open source. Recent examples include Foreign Data Wrappers (FDWs) such as hdfs_fdw, mysql_fdw, and mongo_fdw, which are maintained by EDB staff member Ibrar Ahmed. hdfs_fdw was originally written by Ibrar, mysql_fdw was originally written by Dave Page (also of EDB), and mongo_fdw was originally written by Citus Data and has since been enhanced and extended by EDB staff. In addition, EDB recently released a catalog sanity checker called pg_catcheck, which helps find catalog corruption in PostgreSQL databases so that the damage can be repaired to the point where undamaged data can be extracted with pg_dump.
EDB was founded to help organizations be successful with PostgreSQL, and it developed the enterprise-class performance, security, and manageability enhancements that many organizations required in order to deploy open source. Clearly, EDB has derived a tremendous amount of benefit over the years from its association with the PostgreSQL community, but we believe that the PostgreSQL community also benefits from our participation and from the features that EDB contributes. By continuing to enhance PostgreSQL and to support related tools and projects, EDB and the EDB staff who contribute code to PostgreSQL hope to make PostgreSQL's long-time slogan, “the world's most advanced open source database,” even more true, for the benefit of our customers and the whole PostgreSQL community.
Robert Haas is Vice President and Chief Database Architect at EDB.