One of the great things about PostgreSQL's jsonb type is the flexibility it gives you — you can store whatever structure you need without defining columns up front. But that flexibility comes with a trade-off: there's nothing stopping bad data from getting in. You can slap a CHECK constraint on a jsonb column, but writing validation logic in SQL or PL/pgSQL for anything beyond the trivial gets ugly fast.
I've been working on a PostgreSQL extension called json_schema_validate that solves this problem by letting you validate JSON and JSONB data against JSON Schema specifications directly in the database. If you've used JSON Schema in application code before, it's the same idea — just running inside PostgreSQL.
Why validate in the database?
Application-level validation is fine until it isn't. You have multiple applications writing to the same table, or someone runs a manual INSERT, or a migration script skips the validation layer. The database is the one place everything has to pass through, so it's the natural place to enforce data shape.
With this extension, a CHECK constraint can enforce a full JSON Schema:
CREATE TABLE events (
id serial PRIMARY KEY,
data jsonb NOT NULL CHECK (
jsonschema_is_valid(data, '{
"type": "object",
"required": ["event_type", "timestamp"],
"properties": {
"event_type": {"type": "string", "enum": ["click", "view", "purchase"]},
"timestamp": {"type": "string", "format": "date-time"},
"user_id": {"type": "integer", "minimum": 1},
"metadata": {"type": "object"}
},
"additionalProperties": false
}'::jsonschema_compiled)
)
);That ::jsonschema_compiled cast is important — it tells PostgreSQL to parse the schema once and cache it, rather than re-parsing it on every row.
What it supports
The extension covers a large subset of JSON Schema Draft 7. The basics are all there — type validation, required properties, enum, const — but it goes well beyond that.
String validation includes minLength, maxLength, pattern (POSIX regular expressions), and format for common types like dates, email addresses, IP addresses, URIs, and UUIDs:
SELECT jsonschema_is_valid(
'"2024-01-15T09:30:00Z"',
'{"type": "string", "format": "date-time"}'
);
-- trueNumeric validation covers minimum, maximum, exclusiveMinimum, exclusiveMaximum, and multipleOf:
SELECT jsonschema_is_valid(
'{"price": 9.99}',
'{
"type": "object",
"properties": {
"price": {"type": "number", "exclusiveMinimum": 0, "multipleOf": 0.01}
}
}'
);
-- trueArray validation supports items, minItems, maxItems, uniqueItems, and the contains/minContains/maxContains family:
SELECT jsonschema_is_valid(
'["postgres", "mysql", "postgres"]',
'{"type": "array", "items": {"type": "string"}, "uniqueItems": true}'
);
-- false (duplicate "postgres")Object validation goes beyond just properties and required. You get additionalProperties, patternProperties for regex-matched property names, propertyNames to constrain the keys themselves, and minProperties/maxProperties:
SELECT jsonschema_is_valid(
'{"env_HOME": "/home/user", "env_PATH": "/usr/bin", "name": "config"}',
'{
"type": "object",
"patternProperties": {
"^env_": {"type": "string"}
}
}'
);
-- trueSchema composition with allOf, anyOf, oneOf, and not lets you combine schemas in the usual ways. And conditional schemas with if/then/else handle the cases where validation depends on the data:
SELECT jsonschema_is_valid(
'{"type": "business", "tax_id": "123-45-6789"}',
'{
"type": "object",
"if": {
"properties": {"type": {"const": "business"}},
"required": ["type"]
},
"then": {"required": ["tax_id"]},
"else": {"required": ["full_name"]}
}'
);
-- trueSchema reuse works with $ref and $defs (or the older definitions spelling), using JSON Pointer syntax for local references.
A note on JSON Schema versions
JSON Schema has gone through several drafts — Draft 7, 2019-09, and 2020-12 being the most widely used. The good news is that the core validation keywords (type, properties, required, string/number/array constraints, schema composition, if/then/else, $ref) are stable across all of them. The differences between drafts are mostly in advanced features like tuple validation (prefixItems replacing the array form of items), dynamic references ($dynamicRef), and evaluation tracking (unevaluatedProperties/unevaluatedItems) — none of which are implemented yet. So in practice, schemas written for any of these drafts will work with this extension, as long as they stick to the supported keywords.
One deliberate choice: the format keyword is treated as an assertion (invalid formats cause validation failure). In 2020-12, format is technically just an annotation by default, but assertion behavior is what most people actually want from a validation tool.
Two modes: boolean and errors
For CHECK constraints and simple validation, jsonschema_is_valid() returns a boolean. But when you need to know what failed, jsonschema_validate() returns a JSON array of error objects with the path and a description:
SELECT jsonschema_validate(
'{"name": 123, "tags": "not-an-array"}',
'{
"properties": {
"name": {"type": "string"},
"tags": {"type": "array", "items": {"type": "string"}}
}
}'
);
-- [{"path": "name", "message": "Expected type string but got number"},
-- {"path": "tags", "message": "Expected type array but got string"}]This is useful for building validation into application logic or trigger functions where you want to return meaningful error messages to the caller.
Performance
The functions are marked IMMUTABLE and PARALLEL SAFE, so PostgreSQL can optimize queries that use them. Regex patterns used in pattern and patternProperties keywords are compiled once and cached in a session-level hash table, so repeated validation against the same schema doesn't recompile patterns.
For CHECK constraints specifically, casting the schema literal to jsonschema_compiled avoids re-parsing the schema JSON on every row insertion.
Both json and jsonb input types are supported, though jsonb will be faster since it's already in a parsed binary format.
Benchmarks vs. pg_jsonschema
The other well-known option for JSON Schema validation in PostgreSQL is pg_jsonschema, a Rust-based extension from Supabase. I was curious how the two compare, so I ran some benchmarks on PostgreSQL 17.2.
The test setup: a table with 100,000 rows of varied data (name, age, email, score fields), stored as both json and jsonb columns. Each test validates every row against a schema and counts the matches. All tests were run three times; the numbers below are representative. Times shown are wall-clock from \timing in psql.
Both extensions support validating json and jsonb data, so I tested both. Each extension uses its own preferred schema form: ::jsonschema_compiled for ours (pre-parsed, regex cached), and ::json for pg_jsonschema (which is all it supports). Our extension supports compiled schemas with both json and jsonb data, so the schema-side advantage applies regardless of data type.
Test 1: Property and type validation (jsonb data)
Schema checks object type, requires name and email, validates types for four properties (string, integer with minimum, string, number):
json_schema_validate: ~59 ms
pg_jsonschema: ~348 ms5.9x faster.
Test 2: Property and type validation (json data)
Same schema, but validating json data instead of jsonb:
json_schema_validate: ~80 ms
pg_jsonschema: ~334 ms4.2x faster. Slower than the jsonb case since the json text has to be parsed on every call, but the compiled schema still avoids re-parsing the schema side.
Test 3: Regex pattern matching (jsonb data)
Schema validates two properties with regex patterns — name must match ^user_[0-9]+$ and email must match ^[^@]+@[^@]+\.[^@]+$:
json_schema_validate: ~48 ms
pg_jsonschema: ~3502 ms73x faster. This is where the regex cache pays off. Our extension compiles each regex pattern once per session and stores it in a hash table. pg_jsonschema appears to recompile the pattern on every call.
Test 4: Regex pattern matching (json data)
json_schema_validate: ~71 ms
pg_jsonschema: ~3518 ms50x faster.
Test 5: Validation failure (jsonb data)
Schema requires a field (missing_field) that no row has, so every row fails validation:
json_schema_validate: ~16 ms
pg_jsonschema: ~148 ms9x faster on the failure path.
Summary
| Test | json_schema_validate | pg_jsonschema | Speedup |
|---|---|---|---|
| Property + required (jsonb) | 59 ms | 348 ms | 5.9x |
| Property + required (json) | 80 ms | 334 ms | 4.2x |
| Regex patterns (jsonb) | 48 ms | 3,502 ms | 73x |
| Regex patterns (json) | 71 ms | 3,518 ms | 50x |
| Validation failure (jsonb) | 16 ms | 148 ms | 9x |
Caveats
A few things to keep in mind before reading too much into these numbers:
- This was tested on a single machine (aarch64 Linux) with PostgreSQL 17.2. Results will vary on different hardware and configurations.
- Each extension uses its own best schema form:
::jsonschema_compiledfor ours,::jsonfor pg_jsonschema (which is all it supports). Bothjsonandjsonbdata were tested, so neither extension is disadvantaged by data format. - pg_jsonschema uses the jsonschema Rust crate, which is a full-featured, spec-compliant implementation. Our extension implements a subset of the spec in C, which gives it less overhead but also fewer features.
- pg_jsonschema supports a broader set of JSON Schema features, including some that we list as unimplemented (like
$id, external references, and vocabulary support). - The pgrx build system reported "dev profile [unoptimized + debuginfo]" for the SQL generation step, though the shared library itself was compiled with release optimizations. I'm not 100% certain this didn't affect the final binary.
The performance difference in the non-regex tests likely comes from our extension operating directly on PostgreSQL's internal representations in C, avoiding the overhead of converting to Rust/serde structures. With json data (where both extensions must parse the text on each call), we're still 2–3x faster, suggesting the advantage isn't just about data format. For the regex tests, the dominant factor is caching compiled patterns — compiling a regex is far more expensive than matching against one.
What's not there (yet)
A few JSON Schema features aren't implemented: prefixItems for tuple validation, dependentRequired/dependentSchemas, unevaluatedProperties/unevaluatedItems, and external $ref references. Only local #/... references are supported. For most practical use cases, the supported keyword set is plenty.
Getting it
The extension requires PostgreSQL 14 or later and builds with PGXS:
make PG_CONFIG=/path/to/pg_config
make install PG_CONFIG=/path/to/pg_configThen in your database:
CREATE EXTENSION json_schema_validate;The source is available on GitHub under the PostgreSQL License.