Using a Virtual Environment with pl/python3 in PostgreSQL

May 30, 2022

It's common in the Python world to use a virtual environment for each project you work on. A virtual environment is used to avoid "pollution" of the system installation of Python; it typically contains links to or copies of the Python interpreter, links to or copies of required core libraries, scripts to activate the environment, and most importantly, the packages the project requires.

When working with a virtual environment, it's activated using a command such as source /path/to/venv/bin/activate, which modifies the system and Python paths so that commands like python and pip from the virtual environment's bin directory are used, and packages are installed into and run from the virtual environment rather than the system environment.

PostgreSQL includes the pl/python3 procedural language which allows you to write powerful stored procedures and functions in Python, making use of the huge range of libraries available to the language from within your database. I wrote a blog a while back when I was doing some work with Tensorflow which describes how to get started with pl/python3.

The problem with this is that PostgreSQL will use the system Python environment by default. That might be fine if you're happy to use Python modules available as RPM or DEB packages on your favourite Linux distribution, but what if you're on Windows or macOS where there isn't a package manager that provides Python modules, or you prefer to avoid installing modules in the system Python installation?

In that case, there are a couple of options available.

Server Wide Configuration

A server-wide configuration typically involves changing the way that PostgreSQL is started, such that the entire server runs with the virtual environment configured as needed. On Linux, this would normally mean modifying the systemd unit file that defines how PostgreSQL starts up. For example, when using the RPM installation of PostgreSQL 14 from yum.postgresql.org, we can see the unit file at /usr/lib/systemd/system/postgresql-14.service. This contains a line that looks like this:

ExecStart=/usr/pgsql-14/bin/postmaster -D ${PGDATA}

which defines the command that is run to launch PostgreSQL. We can change this to activate the virtual environment first, e.g.

ExecStart=/bin/bash -c "source /path/to/venv/bin/activate && /usr/pgsql-14/bin/postmaster -D ${PGDATA}"

Note that editing the unit file directly is not recommended; instead, use systemctl edit postgresql-14 or create a file with a .conf suffix under /etc/systemd/system/postgresql-14.service.d/ and override the ExecStart setting there. See the systemd.unit(5) manual page for more information.

Once the change has been made, run the following commands to update systemd and restart PostgreSQL:

$ sudo systemctl daemon-reload
$ sudo systemctl restart postgresql-14

Per-session configuration

On a development machine which may be used for a number of different projects it may be preferable to be able to activate a virtual environment as required, much as you would do when using Python outside of PostgreSQL. This is also possible, but does require one pre-requisite to keep things simple; the system installation of Python must include the virtualenv package, and the virtual environment must be created using it. This is because the virtualenv package will include a Python script for activating the Python environment in the environment you create, which the standard Python venv module does not include. Install virtualenv using a vendor supplied RPM or DEB package if possible, otherwise, run the following command to install it using pip:

$ sudo pip install virtualenv

Make sure you do not have a virtual environment activated when you run the command, otherwise it will be installed into that virtual environment instead!

The virtual environment for our project can now be created:

$ virtualenv /path/to/project/venv

Once that's done, the environment can be activated on the command line and any required packages installed, for example:

$ source /path/to/project/venv/bin/activate
$ pip install numpy pandas

In order to use the virtual environment within a PostgreSQL session, it must be activated much as you would do on the command line. To do that, we'll create a function in PostgreSQL (having first created the plpython3u extension if required):

ml=# CREATE EXTENSION plpython3u;
CREATE EXTENSION
ml=# CREATE OR REPLACE FUNCTION activate_python_venv(venv text)
  RETURNS void AS
$BODY$
    import os
    import sys

    if sys.platform in ('win32', 'win64', 'cygwin'):
        activate_this = os.path.join(venv, 'Scripts', 'activate_this.py')
    else:
        activate_this = os.path.join(venv, 'bin', 'activate_this.py')

    exec(open(activate_this).read(), dict(__file__=activate_this))
$BODY$
LANGUAGE plpython3u VOLATILE;
CREATE FUNCTION

We'll need to create (and execute) this function in any database in which we want to activate a virtual environment. To activate the virtual environment, simply call the function and pass it the path:

ml=# SELECT activate_python_venv('/path/to/project/venv');
SELECT

We can then run a simple test to ensure everything is working as expected:

ml=# DO LANGUAGE plpython3u $$
    import sys
    plpy.notice('pl/python3 Path: {}'.format(sys.path[0]))                                                                                                                                                           $$;                                                                                                                                                                                                              NOTICE:  pl/python3 Path: /path/to/project/venv/lib/python3.10/site-packages
DO

This is an anonymous block or function which executes the pl/python3 code between the $$ markers. In this case it shows us that the first element of Python's search path is the site-packages directory within our virtual environment.

Conclusion

By default, pl/python3 functions and procedures in PostgreSQL will use the system Python environment. In order to avoid pollution of that environment and to separate requirements for different projects we can utilise virtual environments, either on a server-wide or per-session basis.

Share this