Encrypting columns in PGD v6.3.1
Column encryption protects sensitive fields in a table by converting them into ciphertext, a scrambled and unreadable form, while leaving the rest of the record in its original, human-readable form. Because encryption happens at the SQL level during INSERT or UPDATE, only ciphertext ever reaches the Write-Ahead Log (WAL) and gets replicated across PGD nodes, keeping sensitive data protected at rest, in backups, and in transit.
In a distributed cluster, column encryption also provides the following benefits:
- Node independence: Applications can provide the key to any node for consistent decryption.
- Reduced attack surface: Keys never enter the WAL in plaintext. Only ciphertext is replicated.
- Memory isolation: Keys are cleared when the session closes.
PGD supports column-level encryption through the following:
- pgcrypto: A community Postgres extension, not shipped as an EDB package, but supported when used with PostgreSQL, EDB Postgres Extended (PGE), and EDB Postgres Advanced Server (EPAS).
- DBMS_CRYPTO: An Oracle-compatibility package available exclusively on EPAS.
Both are fully compatible with PGD replication. When using pgcrypto or DBMS_CRYPTO, data is transformed using cryptographic functions before being written to the heap. pgcrypto stores the resulting ciphertext as bytea, and DBMS_CRYPTO uses RAW, BLOB, or CLOB, all of which PGD can replicate natively.
Comparing encryption methods
Column encryption is one of several approaches to protecting data in PGD. Each operates at a different level of the stack and guards against different threats.
Transparent Data Encryption (TDE) is an optional feature of PGE and EPAS that encrypts data at the storage level, making database files on disk unreadable without the encryption key. TDE operates transparently alongside PGD replication, but it offers no protection against a DBA who has access to the running database, who can still query all data in plaintext.
Data redaction is a feature of EPAS that masks sensitive values in query results based on policy. The underlying data on disk is unchanged, so it must be paired with TDE or column encryption to provide meaningful protection.
Column encryption protects at the field level, which is more granular than either of the above. A DBA without the session-level decryption key sees only ciphertext, even in a live query. The trade-off is that the application must manage keys and handle encryption and decryption explicitly.
| Technique | Protection level | Key managed by | Protects against | Visible to DBAs |
|---|---|---|---|---|
| TDE | Block/storage-level | Database/KMIP | Physical disk theft or OS-level access | Yes |
| Data redaction | Presentation-level | Database policy | Unauthorized app users | Yes |
| Column encryption | Field-level | Application | Compromised DBAs or system admins | No (ciphertext only) |
Planning for encryption in a distributed cluster
Encrypting columns in PGD introduces considerations that don't arise in a single-node setup. Before implementing column encryption, make sure you've accounted for indexing, CPU overhead, and the effect on storage and replication.
Handling indexing and search on encrypted columns
Both pgcrypto and DBMS_CRYPTO use probabilistic (randomized) encryption by default, meaning encrypting the same plaintext multiple times produces a different ciphertext each time.
pgcrypto achieves this behavior via pgp_sym_encrypt(), and DBMS_CRYPTO does this via AES in CBC mode.
Randomized encryption has two key implications:
- Index incompatibility: A standard B-tree index on a ciphertext column can't support plaintext equality lookups.
- Query failure: The following common pattern returns zero results even if the record exists, because each call to
pgp_sym_encrypt()produces different ciphertext that will never match what's stored in the column. No error is raised, making the failure easy to mistake for the record not existing:
SELECT * FROM customers WHERE email = pgp_sym_encrypt('alice@example.com', 'your_key');
With pgcrypto specifically, if the column type is text rather than bytea, Postgres raises a type mismatch error instead:
ERROR: operator does not exist: text = bytea
To search encrypted columns by exact value, use one of these strategies:
- Deterministic encryption: Configure the encryption algorithm to always produce the same ciphertext for a given plaintext.
- Blind index (recommended): Store an additional column containing a one-way cryptographic hash of the plaintext, using a function like
hmac()from pgcrypto, and index that column instead. Queries hash the search value and compare against the index rather than the encrypted column.
For implementation details on both approaches, see the pgcrypto documentation or the DBMS_CRYPTO reference.
Warning
Both approaches reveal equality patterns: an attacker who can observe the stored values can infer when two rows contain the same data, and on low-variety fields like status flags or country codes, can deduce the plaintext entirely.
Accounting for CPU overhead
Encryption and decryption are computationally expensive operations that run on the database server. Ensure your nodes have sufficient CPU headroom. You can significantly reduce this impact by using algorithms that leverage hardware acceleration, such as AES on processors that support the AES-NI instruction set.
Planning for storage and WAL growth
Ciphertext is larger than the original plaintext, which increases table size, WAL volume, and the replication bandwidth required for PGD to synchronize nodes. Plan capacity accordingly.
Encrypting columns using pgcrypto
pgcrypto is a community Postgres extension that provides cryptographic functions including pgp_sym_encrypt() and pgp_sym_decrypt().
It works with PostgreSQL, PGE, and EPAS.
Views handle transparent decryption and triggers handle automatic encryption on write, so applications can read and write plaintext without calling the cryptographic functions directly.
Create the extension:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
Create the
customerstable with the encrypted column defined asbytea:CREATE TABLE customers ( customer_id uuid DEFAULT uuidv7() PRIMARY KEY, username TEXT NOT NULL, email BYTEA NOT NULL );
Create a view for transparent decryption:
-- Note: requires SET app.pgc_key = 'your_key' to be run in the session first CREATE OR REPLACE VIEW customers_view AS SELECT customer_id, username, pgp_sym_decrypt(email, current_setting('app.pgc_key')) AS email FROM customers;
Create a trigger to encrypt data automatically before it's written:
CREATE OR REPLACE FUNCTION encrypt_customer_pii_trigger() RETURNS TRIGGER AS $$ BEGIN IF current_setting('app.pgc_key', true) IS NULL OR current_setting('app.pgc_key') = '' THEN RAISE EXCEPTION 'Encryption key not set. Please run SET app.pgc_key = ...'; END IF; IF (TG_OP = 'INSERT' OR NEW.email != OLD.email) THEN NEW.email := pgp_sym_encrypt( convert_from(NEW.email, 'utf8'), current_setting('app.pgc_key'), 'cipher-algo=aes256' ); END IF; RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER trg_encrypt_customer_pii BEFORE INSERT OR UPDATE ON customers FOR EACH ROW EXECUTE FUNCTION encrypt_customer_pii_trigger();
With the view and trigger in place, set the session key and interact with the table normally:
-- 1. Initialize session key SET app.pgc_key = 'your_secret_key'; -- 2. Trigger handles encryption automatically INSERT INTO customers (username, email) VALUES ('dolores', 'dolores@example.com');
Querying the raw customers table reveals that the email column stores ciphertext, while customers_view returns the decrypted plaintext:
SELECT customer_id, username, email FROM customers;
customer_id | username | email -------------------------------------+----------+------------------------------------------------ 01966e42-1af7-7a2a-b4b9-9cf8e0d3c2a | dolores | \xc30d04070302e4a7f23b8c12d956aa... (1 row)
SELECT customer_id, username, email FROM customers_view;
customer_id | username | email -------------------------------------+----------+--------------------- 01966e42-1af7-7a2a-b4b9-9cf8e0d3c2a | dolores | dolores@example.com (1 row)
Encrypting columns using DBMS_CRYPTO
DBMS_CRYPTO is an Oracle-compatible package available exclusively in EPAS.
It provides encryption and decryption functions that mirror the Oracle DBMS_CRYPTO API.
As with pgcrypto, a view handles transparent decryption and a trigger handles automatic encryption on write.
Create the
customerstable with the encrypted column defined asRAW:CREATE TABLE customers ( customer_id uuid DEFAULT uuidv7() PRIMARY KEY, username TEXT NOT NULL, email RAW(128) );
Create a view for transparent decryption:
CREATE OR REPLACE VIEW customers_view AS SELECT customer_id, username, UTL_I18N.RAW_TO_CHAR( DBMS_CRYPTO.DECRYPT( src => email, typ => 258 + 256 + 4096, -- AES256 + CBC + PKCS7 key => UTL_I18N.STRING_TO_RAW(current_setting('app.pgc_key'), 'AL32UTF8') ), 'AL32UTF8' ) AS email FROM customers;
Create a trigger to encrypt data automatically before it's written:
CREATE OR REPLACE FUNCTION epas_encrypt_customer_pii_trigger() RETURNS TRIGGER AS $$ DECLARE l_typ INTEGER := 258 + 256 + 4096; -- AES256 + CBC + PKCS7 l_key RAW(32) := UTL_I18N.STRING_TO_RAW(current_setting('app.pgc_key'), 'AL32UTF8'); BEGIN IF (TG_OP = 'INSERT' OR NEW.email IS DISTINCT FROM OLD.email) THEN NEW.email := DBMS_CRYPTO.ENCRYPT( src => UTL_I18N.STRING_TO_RAW(convert_from(NEW.email, 'utf8'), 'AL32UTF8'), typ => l_typ, key => l_key ); END IF; RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER trg_epas_encrypt_customer_pii BEFORE INSERT OR UPDATE ON customers FOR EACH ROW EXECUTE FUNCTION epas_encrypt_customer_pii_trigger();
With the view and trigger in place, set the session key and interact with the table normally:
-- 1. Initialize session key SET app.pgc_key = 'my_secure_32_byte_key_1234'; -- 2. Trigger handles encryption automatically INSERT INTO customers (username, email) VALUES ('dolores', 'dolores@example.com');
Querying the raw customers table reveals that the email column stores ciphertext as a RAW hex string, while customers_view returns the decrypted plaintext:
SELECT customer_id, username, email FROM customers;
customer_id | username | email -------------------------------------+----------+------------------------------------------------ 01966e42-1af7-7a2a-b4b9-9cf8e0d3c2a | dolores | A3F2C8D14E7B0591... (1 row)
SELECT customer_id, username, email FROM customers_view;
customer_id | username | email -------------------------------------+----------+--------------------- 01966e42-1af7-7a2a-b4b9-9cf8e0d3c2a | dolores | dolores@example.com (1 row)
Managing encryption keys securely
PGD replicates data but not keys, which means all nodes must have access to the same key material. Follow these practices to manage keys safely:
Avoid hardcoding keys: Never include keys in SQL scripts or
CREATE VIEWdefinitions, as they're stored inpg_catalogand are visible to anyone with access to system catalogs.Use a session-level configuration parameter to pass the key: Setting it at the session level means the key exists only in memory and is cleared when the session closes. The parameter name can be anything in the
app.*namespace. For example,app.pgc_key:SET app.pgc_key = 'your_ultra_secure_aes_256_key_here';
Important
Configuration parameters are local to the specific node and session where the command was executed. Because PGD operates as a multi-node cluster, your application must ensure the
SETcommand is issued to the specific node handling the current connection. If your application switches nodes (for example, via Connection Manager), the parameter must be re-set on the new node before attempting to read or write encrypted data.Use an external key management service for production deployments: Services such as HashiCorp Vault or AWS KMS let the application fetch the key at runtime and provide it to PGD for the duration of the session, so keys are never persisted to local disk on any PGD node.