Docker Hub carries Docker images for PostgreSQL, based on Debian Stretch or Alpine Linux. These are not official PostgreSQL Development Group (PGDG) images from postgresql.org; they’re maintained in the Docker Library on GitHub. But as Docker adoption grows, these images are going to be more and more people’s first exposure to PostgreSQL.
I tried these images out as I needed some convenient and simple Docker images for PostgreSQL in a continuous integration job. It wasn’t entirely fuss-free, but overall the images worked well. I’d like to share some tips.
Ephemeral by default
WARNING: by default these images make it very easy to lose your data forever. Personally I think that’s a significant defect: they should refuse to run without either a mounted volume for the data or an explicit env var passed to “docker run”. It’s discussed under “Caveats” in the image docs, but IMO not nearly prominently enough.
If you want persistent storage you really need to supply a Docker volume to use when you first run the new container:
docker run -v my_pgdata:/var/lib/postgresql/data postgres:11
Docker will auto-create the volume my_pgdata if it doesn’t exist. Or you can use a host path for a bind mount instead; see the postgres docker image documentation under “Where to Store Data”.
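If you’re not sure where a named volume actually lives on the host, Docker can tell you. A quick check, using the my_pgdata volume from above:
docker volume inspect -f '{{ .Mountpoint }}' my_pgdata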
The Docker postgres images, by default, create a PostgreSQL instance under /var/lib/postgresql/data, which is in the container’s private storage unless mapped to a volume. When the container exits you lose access to the container storage. The data directory is not immediately deleted, but it will be lost forever if you clean up exited docker containers with docker rm, docker system prune, etc.
To recover access to the container’s data once the container exits but before such permanent removal has occurred, find the container id with docker ps -a; you might find docker ps -f ancestor=postgres:11 -a useful. Once you have the container ID you can docker start the container to start it again, perhaps to pg_dump its data; it’ll start with any port bindings preserved. Alternately, you can docker cp the data directory in /var/lib/postgresql/data out of the stopped container, copy it into a persistent Docker volume, and attach that to a new instance of the container.
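Here’s a minimal sketch of that recovery path, assuming the dead container was started from postgres:11 and you want the data to end up in a new named volume called my_pgdata (the volume and directory names are just examples):
# find the most recent stopped container created from the image
OLD_ID=$(docker ps -a -f ancestor=postgres:11 -q | head -n 1)
# copy the data directory out of the stopped container
docker cp "$OLD_ID":/var/lib/postgresql/data ./recovered_pgdata
# copy it into a fresh named volume using a throwaway helper container
docker run --rm -v my_pgdata:/target -v "$(pwd)/recovered_pgdata":/source:ro debian:stretch cp -a /source/. /target/
# start a new container against the recovered data
docker run -v my_pgdata:/var/lib/postgresql/data postgres:11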
Data safety
WARNING: Docker’s default storage configuration on most installs may perform poorly and isn’t necessarily crash-safe. The Docker documentation says as much. If you intend to use anything like this in production, make sure you use one of the production-supported Docker storage backends.
This is particularly important for PostgreSQL because of issues relating to how Linux handles failure of flush (fsync) requests to full thin-provisioned volumes. Avoid putting a PostgreSQL data directory on dm-thin / lvmthin backed storage. You can check your storage backend with:
docker info|grep -i '^Storage Driver:'
Locale, encoding and collation
The Debian-based images default to the en_US.UTF-8 locale. If you want a different locale, see the image documentation under “Initialization scripts”.
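One approach for the Debian images (a sketch; the image documentation covers locale customisation in more detail, and de_DE.UTF-8 here is just an arbitrary stand-in for the locale you want) is a thin derived image that generates the locale before initdb runs:
mkdir pglocale
cat > pglocale/Dockerfile <<'__END__'
FROM postgres:11
# generate the extra locale and make it the default used by initdb
RUN localedef -i de_DE -c -f UTF-8 -A /usr/share/locale/locale.alias de_DE.UTF-8
ENV LANG de_DE.utf8
__END__
docker build -t pglocale:latest pglocale
docker run -v my_pgdata:/var/lib/postgresql/data pglocale:latest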
Root by default
Docker likes to run everything as root, and the postgres images are no exception. However, when run as root the entrypoint will switch to the internal postgres user (uid 999 in these images) to run postgres.
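You can check who the server processes actually run as once a container is up; for example, for a container named my_postgres (the name is just an example), the UID column here should show the postgres user rather than root:
docker top my_postgres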
If you try to run with a different --user and a Docker-managed volume, you will usually get an error like:
fixing permissions on existing directory /var/lib/postgresql/data ... initdb: could not change permissions of directory "/var/lib/postgresql/data": Operation not permitted
and will need to use the workaround given in the documentation under “arbitrary users”.
This issue won’t arise with a datadir bind-mounted from the host, so long as the user has the appropriate permissions on it, e.g.:
cd $HOME
mkdir pgdata
chmod 0700 pgdata
docker run --user "$(id -u)":"$(id -g)" -v "$(pwd)/pgdata":/var/lib/postgresql/data postgres:11
Connections between containers
The docs cover connecting to the container from the Docker host well enough, using published ports. But often you’ll want to use the postgres container from another container, and that’s rather less simple, especially as the container lacks a convenient way to set up pg_hba.conf.
For inter-container connections you can either create a user-defined network and connect between containers by name (see “Container link with user defined network” below), or use the default bridge network and wire up addresses yourself. It’s not necessary to override the default pg_hba.conf since it defaults to wide open. For example, to start the server on the default bridge network, with an IP arbitrarily assigned by Docker and a datadir bind-mounted into the container from the workdir:
mkdir resources pgdata
chmod 0700 resources pgdata
echo 'secretpassword' > resources/pgpassword
# launch postgres container to run in the background
#
# note that we do NOT need to "--publish" any ports here
#
PGCONTAINER_ID=$(docker run \
--network bridge \
--name postgres \
--volume $(pwd)/pgdata:/var/lib/postgresql/data:rw \
--volume $(pwd)/resources:/resources:ro \
--env POSTGRES_INITDB_ARGS="--auth-host=md5 --auth-local=peer" \
--env POSTGRES_USER="postgres" \
--env POSTGRES_PASSWORD_FILE="/resources/pgpassword" \
--user $(id -u):$(id -g) \
--detach \
postgres:11 \
)
# Did it start up ok?
docker logs $PGCONTAINER_ID
# there is no port-mapping to the host configured here
docker port postgres
Then, with the bridge network, you can get the postgres container’s IP address and connect to it from your other container, or use the --add-host option to docker run to map it to a hostname in the container’s /etc/hosts, as in the following example:
# How to re-find the container-id later:
PGCONTAINER_ID=$(docker ps --filter name=postgres -q)
# Container address for mapping
pgcontainer_ip=$(docker inspect -f {{.NetworkSettings.IPAddress}} $PGCONTAINER_ID)
echo "postgres is running on $pgcontainer_ip on default docker bridge"
# Make a simple image to test the container link
mkdir pgtest
cat > pgtest/Dockerfile <<'__END__'
FROM debian:stretch
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get -y -qq update && apt-get -y -qq install postgresql-client
CMD psql "host=postgres user=postgres password=secretpassword" -c "SELECT 'successfully queried postgres container';"
__END__
docker build -t pgtest:latest pgtest
# Run the test image. It runs as root for simplicity; if you just use --user
# then libpq will fail because it doesn't like running as a user that doesn't
# exist inside the container (see Troubleshooting below).
docker run \
--name throwaway \
--network bridge \
--add-host postgres:${pgcontainer_ip} \
pgtest:latest
This should emit:
?column?
-----------------------------------------
successfully queried postgres container
(1 row)
Later you can docker stop postgres. Or you can docker kill postgres; docker rm postgres and then docker run a new copy of the container with the same data directory later. Either is fine, though note that the POSTGRES_* initialization options you pass to docker run have no effect on subsequent runs against an already-initialized data volume, since initdb only runs when the data directory is empty.
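For example, to restart against the same bind-mounted pgdata directory used above (a sketch; adjust names and options to taste):
docker stop postgres
docker rm postgres
# initdb is skipped this time because the data directory is already initialized
docker run --detach --name postgres \
--network bridge \
--user $(id -u):$(id -g) \
--volume $(pwd)/pgdata:/var/lib/postgresql/data:rw \
postgres:11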
Don’t forget to docker rm the container(s) you created for testing, then docker volume rm the volume(s) you created for testing, if you don’t want them anymore, e.g.
docker kill postgres
docker rm postgres throwaway
docker rmi pgtest:latest
Container link with user defined network
Instead of using the default bridge network (--network bridge) you can docker network create mynetwork and use --network mynetwork with both containers. If you do so you don’t have to look up the postgres container’s IP address manually and --add-host it: Docker automatically resolves container names and container IDs with its managed DNS on a user-defined network.
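A sketch of the same demo on a user-defined network (assumes the earlier postgres container has been removed, since the name is reused):
docker network create mynetwork
docker run --detach --name postgres \
--network mynetwork \
--env POSTGRES_PASSWORD=secretpassword \
postgres:11
# "host=postgres" in the pgtest image now resolves via Docker's DNS; no --add-host needed
docker run --rm --network mynetwork pgtest:latest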
Troubleshooting
"no route to host"
: make sure the postgres container is still running. Check your iptables rules."local user with ID 1001 does not exist"
:libpq
needs the current user-id to correspond to a properly defined user name, i.e.id -n
must report a valid result. Verify thatid -un
reports a username not an error likeid: cannot find name for user ID 1001
.
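For the second error, one workaround sketch, along the lines of the image docs’ “arbitrary users” notes, is to make the host’s account database visible inside the container so your uid maps to a name:
docker run --rm \
--network bridge \
--add-host postgres:${pgcontainer_ip} \
--user "$(id -u)":"$(id -g)" \
--volume /etc/passwd:/etc/passwd:ro \
pgtest:latest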
Basic configuration only
The container supports limited setup via a few env vars like --env POSTGRES_PASSWORD=foobar at docker run time, but not much more than that. The entrypoint script adds a wildcard pg_hba.conf entry that accepts connections from all addresses for all users and DBs. This entry is set to trust (for all users) if no POSTGRES_PASSWORD is set, otherwise it’s set to md5.
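You can see exactly what got appended by looking at the end of the generated file; for example, for a container named my_postgres:
docker exec my_postgres tail -n 5 /var/lib/postgresql/data/pg_hba.conf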
WARNING: This means that if you don’t specify a POSTGRES_PASSWORD, setting a password for the postgres user later will have no effect, because connections are trusted regardless. Similarly, passwords will be ignored for any other users you create. You really should specify POSTGRES_PASSWORD at docker run time.
WARNING: If you use POSTGRES_PASSWORD the password will be visible in docker inspect output for the container, so consider using a temporary one and changing it once the container is launched. Or use POSTGRES_PASSWORD_FILE to point to a file you bind-mount into the container or add with a derived image.
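For instance, once the container is up you can rotate the bootstrap password from the host (assuming a container named my_postgres):
docker exec -u postgres my_postgres psql -c "ALTER USER postgres PASSWORD 'a-better-secret';"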
It’d be really nice if the entrypoint script provided a way to supply configuration files to append or replace in a simple manner. For example, it should probably enable directory includes for postgresql.conf so you could just drop your own snippets in there. It should ideally support generating a simple pg_hba.conf or, for more complex needs, copying one from a path specified as an env var, probably pointing to a read-only host bind mount. It could copy SSL files etc. in a similar manner. But for now, you’ll need a setup hook script and/or derived container as discussed below.
Configuration hook scripts
If you want to provide your own non-trivial pg_hba.conf, append some entries to postgresql.conf, add a pg_ident.conf, drop in SSL certificates and keys, etc., there’s a hook for that. Provide a setup script that the existing entrypoint runs after initdb and after postgres has been started. To avoid creating a derived container you can read-only bind-mount the script directory into the container. The same approach works for things like SSL certificate files.
For example, I have a docker-entrypoint-initdb.d/001-setup.sh script. I bind-mount this directory into /docker-entrypoint-initdb.d/ in the container at docker run time, and I also bind-mount a directory containing SSL certificates the script needs to copy into the data directory. When the container first starts, the script copies everything into place.
I chose to put my pg_hba.conf and postgresql.conf changes inline in the setup script, but you could just as well ADD them to the container and copy or append them from your script. Here’s an abbreviated version of mine, which relies on the SSL cert, key and root cert being made available to the container, here via a read-only bind mount (you could instead bundle them into a derived image with docker build):
#!/bin/sh
#
# First run setup script, run after initdb + pg start by Docker
# entrypoint script.
#
set -e -u
if [ "$(id -un)" != "postgres" ]; then
echo "Expected to run as user 'postgres' but got user id '$(id -un)' instead"
exit 1
fi
# replace pg_hba.conf entirely.
cat > "$PGDATA/pg_hba.conf" <<'__END__'
# note that the container runs on a private network so we don't do
# address filtering here. If you wanted you could detect the subnet
# and add it instead, or pass a --env to Docker to substitute, but
# it's pretty pointless to do so.
#
#
# TYPE DATABASE USER ADDRESS METHOD
local all all peer
host all all 0.0.0.0/0 md5
host all all ::0/0 md5
__END__
# Append to postgresql.conf
cat >> "$PGDATA/postgresql.conf" <<'__END__'
# For logical rep protocol tests
wal_level = 'logical'
max_replication_slots = 8
max_wal_senders = 8
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
ssl_ca_file = 'root.crt'
__END__
# create regression test user
psql -c "create user test with password 'test'"
psql -c "create database test with owner 'test';"
# install SSL certs
for f in root.crt server.crt server.key; do
cp "/docker-entrypoint-initdb-resources/$f" "${PGDATA}"
chmod 0600 "${PGDATA}/$f"
chown postgres:postgres "${PGDATA}/$f"
done
I can then invoke this without needing to docker build any derived image, using something like:
docker run \
--detach \
--name=my_postgres \
--env POSTGRES_PASSWORD=mysecretpassword \
--volume $(pwd)/docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d:ro \
--volume $(pwd)/docker-entrypoint-initdb-resources:/docker-entrypoint-initdb-resources:ro \
--volume my_postgres_data:/var/lib/postgresql/data:rw \
postgres:11
… where the current directory contains:
docker-entrypoint-initdb.d/
001-setup.sh
docker-entrypoint-initdb-resources/
root.crt
server.crt
server.key
You could use a similar model for copying over a new pg_hba.conf etc. on first run. Note that after the initial docker run these files have no effect; only the copies on the Docker volume my_postgres_data (mounted as /var/lib/postgresql/data within the container) matter.
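If you need to change server settings after that first run, edit the copies on the volume and restart the container, or use ALTER SYSTEM from psql; for example (assuming a container named my_postgres and a setting picked arbitrarily):
docker exec -u postgres my_postgres psql -c "ALTER SYSTEM SET log_min_duration_statement = '250ms';"
docker restart my_postgres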
Use a consistent --user
If you run the container multiple times against the same volume but with a different uid each time, the postgres docker images may get upset and confused at you; the entrypoint does not do any permissions fixups. Use a consistent uid, or just live with Docker’s bizarre indifference to using root for everything. The container will still switch to an unprivileged user to run postgres.
Beware of libc collations when moving data directories around
This isn’t a Docker-specific issue but becomes more prominent with the portability of Docker containers.
Don’t try to run a data directory created with the Alpine-based images (which use musl libc) on a Debian-based image, and avoid mounting and running a volume used by these Debian-based containers on some other OS or other Debian version. The reason is that the GNU C library occasionally updates its collation (sort order) definitions for some languages. When that happens, the on-disk structure of PostgreSQL’s indexes no longer matches the ordering produced at runtime by the C library, so PostgreSQL may stop looking for data prematurely or exhibit a variety of other surprising behaviours relating to index searches and maintenance.
Stick to one container image.
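If you think a data directory may already have crossed libc versions, the amcheck contrib extension can at least verify btree index ordering. A sketch, assuming a container named my_postgres, the test database created by the setup script above, and a hypothetical index named some_index:
docker exec -u postgres my_postgres psql -d test -c "CREATE EXTENSION IF NOT EXISTS amcheck;"
docker exec -u postgres my_postgres psql -d test -c "SELECT bt_index_check('some_index');"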