Josh Berkus

Job Title: 
PostgreSQL Core Team Member
Release Date: 
Mon, 2011-03-21 14:35
Description: 

Josh Berkus discusses working in the context of the international PostgreSQL community, perspectives on Sun's acquisition of MySQL, PostgreSQL's 8.4 release, and the influx of SaaS-based applications.

 

Bob Zurek: Welcome to Database Radio, a series of podcasts from EnterpriseDB featuring business and technology experts who will share their views and perspectives on a variety of topics and solutions related to the database tools and open source market. Hi, I'm Bob Zurek, CTO of EnterpriseDB and host of Database Radio, and today it's a real pleasure to welcome Josh Berkus, who is one of the core team members of the world-spanning open source database project PostgreSQL. He's quite a database visionary, having been well-versed in the mechanics of database technology, especially in open source, and has been involved in lots of open source projects going back to 1998. It's a real pleasure to have you on Database Radio today.

Josh Berkus: Thanks a lot for setting this up! I always enjoy an opportunity to talk, as anyone who knows me can attest.

BZ: I think it would be important for our audience to have you give us a little perspective on how you got involved with the project and what it's like to be working in the context of this great PostgreSQL community.

JB: Back in 1998, I was actually a SQL Server consultant and SQL query and reporting expert. I had gotten really frustrated with the unreliability of SQL Server and the underlying Windows platform, and was actually looking for something better. I found Linux to be more reliable than Windows, and, on the basis of having tried Linux and liked it as a server platform, I started looking for an open source database. As a SQL expert, MySQL was not suitable for my needs. Then I found PostgreSQL, which was actually very fascinating to me because it was an attempt by a bunch of database experts to put together what we regarded as the ideal SQL database.

BZ: What are a few things that make PostgreSQL a real standout?

JB: It's really a commitment to excellence, to real thoroughness, both in the amount of syntax and features that we implement, in implementing them correctly in a standards-compliant way, in making sure stuff is bug-free and is secure, and having a real attitude that bugs and issues and stupid work-arounds are not things to be casually tolerated but things to be fixed.

BZ: Some of the buzz in the industry right now is a lot of discussion around the latest release of MySQL 5.1 and one of the co-inventors claiming it has a lot of bugs ...

JB: It's not a release, honestly. It's the same beta that they have been periodically updating for the last year and a half, and they just decided to arbitrarily call it generally available because it suits Sun's business purposes to make this month the "general availability" month. But it's no higher quality than the last beta was, and it's certainly nothing that we would even consider an alpha in the PostgreSQL world.

BZ: You were working at Sun, right, when the MySQL acquisition happened? You were known as the PostgreSQL guru inside Sun, and had strong visionary leadership there. What was that like?

JB: A lot of people make the assumption that I left Sun because of MySQL, and that's not actually true. As a matter of fact, the acquisition of MySQL improved a number of things for PostgreSQL at Sun. It just didn't improve enough of the long-standing issues, primarily a complete lack of resource commitment. Sun has a real history of announcing initiatives and then only funding them halfway, and, as any business person should know, if you only fund something halfway, you've more or less wasted your money because you've funded it enough for it to fail. My time at Sun was actually spent trying to increase that resource commitment to a level where a serious Sun-PostgreSQL initiative would have been viable and would have been a real challenge to take a bigger chunk out of the Oracle and DB2 market. But we never actually reached that level of commitment. After the acquisition of MySQL, Sun's stock went into a death spiral, and it became clear to me that Sun would never have the resources to back a PostgreSQL initiative properly. Actually, I've known a lot of the MySQL people for years, and have worked with them and found them very interesting to work with, and was actually very engaged with them in trying to fix some of MySQL's chronic organizational issues, particularly around bug fixing. But I wasn't really interested in staying at Sun in order to work on MySQL; if I had wanted to work on MySQL, there had been earlier opportunities to do so. It was a very interesting rollercoaster ride, though, because the acquisition was done very quickly, which meant that it startled a lot of people, including a lot of MySQL and Sun employees. Getting everything integrated was a real challenge, and that was interesting to be part of. One of the things that I think is a shame about having left is that I've lost a lot of my contact with a lot of MySQL people, who are actually great people and very interesting to hang out with.

BZ: One of the things I'd like to explore is what you're excited about regarding the 8.4 release of PostgreSQL.

JB: There's actually a lot of stuff to be excited about with the 8.4 release. One of the funny things about PostgreSQL is that, after release, people start asking me, "Okay, what's going to be in the next version?" When I'm on the talk circuit, I tell people about some of the things I know people are working on. But what I tell people about at the beginning of the development cycle is almost never what we actually end up with in the new version because some features turn out to be much harder than they look, and developers lose their interest in some things, and other things suddenly get funded in the middle of the year, allowing people to work on them full time to actually push them through. 8.4 was definitely a case of that. I'll mention that one of the choices that every project has to make is going to be time, quality, and specification. What we've chosen with PostgreSQL is to focus on time and quality; that is, we're going to do a release every year, and that release is going to be very high quality. But what that means is that features that we were really interested in often get cut because of quality, or because of not being ready on time, and other features that were ready earlier than expected end up being in the database. That's in contrast to MySQL, which has chosen to pursue a specification-based approach, where they have a list of target features and they take as long as they need to take in order to meet that list of target features, which has also unfortunately resulted in some quality problems. For 8.4, one of the most exciting things is that it looks distinctly possible that the log-based replication that we had discussed as an overall target project for version 8.5 might actually be available in 8.4. This includes both asynchronous and synchronous log-based simple replication, which is really what a lot of users want, for just a simple fail-over setup that is relatively bulletproof. 
Given the number of people out there who have been using MySQL's simple replication for rudimentary horizontal scalability, I'm really excited about what this will hopefully lead to for people using PostgreSQL in the same way. A second feature that's really interesting is common table expressions and recursive queries, which are very linked features. We're not the first people to have this (Oracle and DB2 have this ahead of us), but it's part of the SQL standard, and you can have a query that calls another query as a reference in it. The most common reason to use that is to do what are known as recursive queries, where you say, "From parent node x, I want every child that has these certain characteristics." Recently on the hackers' list, somebody showed that you could make a much more interesting use of this, where they showed that you could use the recursive queries to generate a fractal graph in a single SQL query, which is a lot of fun as a concept and cool as a demo. Those are two off the top of my head.
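
To make the recursive query idea concrete ("From parent node x, I want every child that has these certain characteristics"), here is a minimal sketch. It is not from the interview: the `node` table, its columns, and the `active_descendants` helper are hypothetical names chosen for illustration, and the query runs against an in-memory SQLite database through Python's standard `sqlite3` module only so that the example is self-contained. The `WITH RECURSIVE` form is standard SQL, so a similar query should be expressible in PostgreSQL 8.4 and later, although parameter placeholders and types would differ.

```python
import sqlite3

# Hypothetical parent/child table; all names here are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),
        name      TEXT NOT NULL,
        active    INTEGER NOT NULL
    );
    INSERT INTO node (id, parent_id, name, active) VALUES
        (1, NULL, 'root',       1),
        (2, 1,    'child-a',    1),
        (3, 1,    'child-b',    0),
        (4, 2,    'grandchild', 1),
        (5, NULL, 'other-root', 1);
""")

# The common table expression "descendants" refers to itself:
# the first SELECT seeds it with the starting node, and the second
# SELECT repeatedly joins in the children of rows already found.
QUERY = """
WITH RECURSIVE descendants(id, name, active, depth) AS (
    SELECT id, name, active, 0 FROM node WHERE id = ?
    UNION ALL
    SELECT n.id, n.name, n.active, d.depth + 1
    FROM node AS n
    JOIN descendants AS d ON n.parent_id = d.id
)
SELECT name, depth
FROM descendants
WHERE depth > 0 AND active = 1   -- "children with certain characteristics"
ORDER BY depth, name;
"""

def active_descendants(conn, parent_id):
    """Return (name, depth) for every active node below parent_id."""
    return conn.execute(QUERY, (parent_id,)).fetchall()

rows = active_descendants(conn, 1)
for name, depth in rows:
    print(depth, name)
```

The filter on `active` is applied after the recursion finishes, so an inactive node is skipped in the output but its subtree is still traversed; moving the condition into the recursive `SELECT` would prune whole branches instead.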

BZ: Beyond 8.4, where do you see PostgreSQL going over the next ten years?

JB: You have to look at where the database market is going in general. What you're really seeing after about a decade of consolidation is that the database market is moving back into diversification where, increasingly, as companies now have the technical resources, it is now feasible for them to run multiple database platforms within their enterprise. And there is a big incentive in running specialty databases when their use is called for, or databases that have different characteristics. For example, they may run PostgreSQL for their main financial server, they may run SQLite for a desktop application, or they may run PostGIS for a geographical application. Every decision you make in database design is a trade-off. Anything you decide to do forces you to not do something else. PostgreSQL has a very complex query parser and planner; as a result, we are able to handle large, complicated queries and execute them very quickly and accurately. MySQL, on the other hand, has a very simple query parser and planner, and as a result can't handle those large, complicated queries, but is able to execute very simple queries with slightly less overhead than PostgreSQL. You have to make those sorts of trade-offs, and, if a company has a particular application it needs to run, there may be a most appropriate database for that application, and that database might not even be a SQL relational database. It might be an object database, or a scalable big table database, or a document database. And I think we're going to see increasing diversification in this realm. What are people going to want PostgreSQL for? PostgreSQL is really moving in the direction of being not just a database, but an entire development platform. It's now actually technically possible to put all of the layers of your application except presentation inside the database with PostgreSQL and gain certain functionality by doing so.
It's not an unreasonable perspective; Oracle is increasingly moving into the middleware stack for the same reasons. If you have this complex, high-end database that is capable of sustaining complex functionality and complex business logic, it becomes increasingly attractive, both to you and to your users, to take over a larger portion of the application stack with that database platform. And I really see that as where PostgreSQL is going. Because we are the customizable database, we are moving into increasingly more exotic areas of application functionality than Oracle is. In terms of what's going to happen with the traditional SQL database, it's a little hard to tell. Oracle is moving to cement their current market share by taking over a lot of business applications and owning them, thus preventing people from moving off of Oracle. Whether or not they'll be successful depends on how well they manage those business applications.

BZ: There are a lot of great new SaaS-based applications coming around the corner, somewhat disruptive applications to the legacy Oracles, Siebels, and PeopleSofts.

JB: Yes, and obviously a lot of the SaaS applications are increasingly focusing on using open source technologies. For them, the cost of the software technology and the risk of losing control over it are really important. Certainly you could use Oracle or Microsoft or IBM to power a SaaS application, but the drawback to that is that sooner or later those companies are going to end up competing with you, and then all of a sudden you're in the bad situation that SAP is in, for example, where your chief competitor is also your inalienable technology provider and has the ability to mess with your ability to deliver new products. Working at Sun, I saw that going on. For SaaS providers, using an open source database for the database component of their technology stack makes a lot of sense, particularly if they're using some of these cloud deployment platforms. It's very hard to use proprietary licensed software because you lose your flexibility of deployment; you can't just add six new servers to your cluster without paying additional licensing costs on top of the increased hosting costs. People have seen what's happened with a lot of the application development companies of the '90s, and how they were cannibalized by Oracle and Microsoft, and want to avoid giving a vendor that much control over them.

BZ: Thank you very much for sharing your time today, and we want to wish you all the best. If anyone is interested in contacting you, is there any place they should go to get a hold of you?

JB: For now, just email me at josh@postgresql.org. I have a blog up on IT Toolbox; it hasn't been updated recently, but you can certainly find me through that.

BZ: This is Bob Zurek for Database Radio. Thanks again for joining us today, and we wish you all the best.