Developing on many versions of PostgreSQL at once

Craig Ringer

August 19, 2019

Developing on many versions of PostgreSQL at once

In a developer’s ideal world there’s only one stable and one development version of the product. Everyone updates immediately to the next stable version and nobody cares about anything older.

As far as I can tell this has never actually happened, ever, in the history of anything.

Working on many versions

PostgreSQL is a mature and stable codebase. Users often run very old versions and they are usually reluctant to perform major upgrades frequently. Any upgrade involves some risk – anyone who tells you otherwise probably also has a great deal for you on some prime real estate or a bridge. Upgrades also have costs in terms of testing, abort planning, changes in surrounding infrastructure, staff time and more.

So in the real world I routinely need to make changes on the five major versions of PostgreSQL that’re supported by the community plus the two back-versions supported by 2ndQuadrant *and* the 2ndQPostgres versions we use to deliver new functionality to customers who can’t wait for the community development process and official PostgreSQL releases. I have 14 PostgreSQL work trees open right now… and the five supported community releases installed from official packages too.

Back in the days of CVS or Subversion this would test anyone’s sanity to the breaking point. It’s still a challenge and it requires a fair bit of care and organisation, but proper use of git and some surrounding tooling makes it much more manageable. I’d like to share how I do it.

Managed virtual PostgreSQL environments

Anyone who has worked with dynamic interpreted languages like Python recently is probably familiar with the concept of a sandboxed language environment separate from their platform libraries and packages – like Python’s virtualenv, Ruby’s RVM, etc etc.

There’s are a couple of tools for PostgreSQL developers along the same lines:

I should probably use one. I use a simple collection of bash aliases instead, but the principle is similar: I have commands that use the postgres sources in the current working directory to launch a child bash with a suitable PATH, LD_LIBRARY_PATH, modified prompt etc. I have other commands that check if my worktree is clean, fetch the latest sources, clean the worktree, re-run configure then do a parallel make and install. You could do it however you want; use Docker instead of the shell environment or whatever. But you need something like this to stay sane.

This means that at any time I can tell what I’m working with, I can have multiple environments running at the same time, and I can easily interact with all of them at the same time.

ccache

If you don’t use ccache yet, start. I’ve had a couple of issues that turned out to be caused by ccache caching, but the time saved by using it is so huge that it’s an utterly vital part of my workflow.

If you’re ever suspicious of ccache issues you can just ccache -C to nuke your cache. I’ve only ever had this fix a weird problem twice. Ever.

Git worktrees

I frequently need to have many different patches on the go at once and usually need to apply the same patch to many different PostgreSQL versions. This can result in an explosion of branches and checkouts and a lot of chaos. It’s also really slow and disk-space hungry as all those PostgreSQL source and build trees add up fast.

Even worse, when you have multiple git checkouts and need to share things between them you may find yourself doing ad-hoc fetches between them or even using patches to manually swap around changes. You’ve probably also played with reference repositories and discovered their limitations for ongoing work, then tried to use bare worktrees and discovered how fragile they are. You likely get lost trying to figure out which branches were in which checkouts, searching for lost commits, and wondering which stash in which repository contains that bugfix you wrote last week. It can become unmanageable fast.

Integrating git worktrees into a managed postgres environment helps a lot. You effectively maintain *one* local repository for all your work – one .git directory. You never work directly in the default worktree, the one created when you do a plain git clone. For this reason I use a `git clone –bare` as the starting point.

Then for each branch you need to work on, be it a stable version branch, a feature-dev branch, a hotfix branch or whatever else, you add a new worktree to your repositoiry. This isn’t a clone, the worktree shares the same git-directory with a common set of remotes, heads, objects, and the like. Every git command like git log, git commit etc uses the common repository, but uses the current directory’s worktree as the checkout and works on the branch associated with the current worktree.

This is a lifesaver. Need to deliver a quick hotfix on top of 9.6.15 but you’re in the middle of a big patch you’re half-way through a git add -p on? Working on a git bisect to track down an issue? You don’t have to git stash your work in progress, reset your bisect or merge, or anything else. Instead you

git worktree add ../pg96-ticket42222-hotfix REL9_6_15
cd ../pg96-ticket42222-hotfix
git checkout -b pg96-ticket42222-hotfix

You now have a worktree named pg96-ticket42222-hotfix with the corresponding branch pg96-ticket42222-hotfix forked off the tag REL9_6_15 and checked on it it. You can reset it, rebase it, bisect it, commit to it, whatever, and it won’t affect your other worktrees and their branches.

Best of all your visibility into the state of all your in-flight work is greatly improved. You can diff, merge and cherry-pick between work on different worktrees or rebase one onto another. You can search the git logs across all of them. You can list them with git worktree list.

A git fetch or git pull makes new branches, tags, commits visible on all the worktrees that share the same repository so you don’t have to fetch upstream changes repeatedly on different repos or mess around with using explicit reference repositories.

The branches of different worktrees can have different upstream branches on different remotes, so you can coordinate work between multiple remote repositories. They don’t even have to be for the same product or share any common history, though it’s not often you’ll want to mix (say) branches from PgJDBC and branches from PostgreSQL in one repo.

If you rm -r the worktree nothing cares very much, though you should preferably git worktree remove instead, and you’ll want to git worktree prune at some point if you forgot.

I adopted a uniform scheme for my install paths and some wrapper commands. So I know that if I’m in worktree $PGDEVSRC/ticket4222 then the corresponding install will be in $PGDEVBIN/ticket4222. To refresh I just run pgrebuild My shell prompt warns me if my CWD is a worktree that doesn’t match the directory of the first psql on the current PATH to help prevent mistakes too.

(I really should teach pg_virtualenv about SCM and worktrees).

Worktree setup

Here’s roughly how I set things up.

# Collect up everything:
git clone --bare https://git.postgresql.org/git/postgresql.git postgresql-reference
cd postgresql-reference
git remote add myCompanyPostgresRepo1 https://some/url/
git remote add myCompanyPostgresRepo2 https://some/other/url/
git remote add myCompanyCIServerPostgresRepo https://some.ci.server/git/
git fetch -a

# add base worktrees for community stable branches
for branch in REL9_4_STABLE REL9_5_STABLE REL9_6_STABLE REL10_STABLE REL11_STABLE master
do
    git worktree add ../pg-community-${branch} ${branch}
done

# add worktrees for company git heads. Git figures out that the names refer to a
# different remote unless the names clash in which case you have to qualify the
# name with the remote.
#
# My shell prompt complains when these worktrees are dirty, as I should only
# ever work in temp worktrees.
#
for branch in SOME_PG_PATCHES OTHER_PG_PATCHES
do
    git worktree add ../pg-company-${branch} ${branch}
done

# Start work on a patch - I have shell alias helpers for this
cd ../pg-community-master
git branch --track community-master-FEATUREFOO master
git worktree add ../pg-community-master-FEATUREFOO community-master-FEATUREFOO
cd ../pg-community-master-FEATUREFOO

Given the number of people developing on PostgreSQL it might make sense to start sharing some tooling around this stuff. But for now hopefully my description of my workflow was useful.

In this Article

Working on many versions

Resource Feature Callout 1

Developing on many versions of PostgreSQL at once

Craig Ringer

Working on many versions

Managed virtual PostgreSQL environments

ccache

Git worktrees

Worktree setup

More Blogs

Cloud Neutral Postgres Databases with Kubernetes and CloudNativePG

Scaling Breakthrough: EPAS 17 Scales Twice as Well, Thanks PostgreSQL!

How CloudNativePG Manages Physical Replication Slots for PostgreSQL in Kubernetes