Accelerating with Spark v1.6

By default, the Postgres Analytics Accelerator (PGAA) uses Seafowl, an embedded analytical engine, to accelerate queries. For large-scale data processing that exceeds the resources of a single Postgres instance, you can instead offload execution to a remote Apache Spark cluster through Spark Connect.

Spark Connect is a thin client-server protocol for Apache Spark that decouples the client application from the Spark driver. With it, Postgres sends query instructions to a remote, distributed Spark cluster, so you can use the compute capacity of an external cluster without running Spark on the same machine as your database.
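To make "a remote Spark Connect endpoint" more concrete, the following sketch parses a Spark Connect connection string of the form sc://host:port and checks whether that port accepts TCP connections. The function names and the example host are hypothetical and not part of PGAA; 15002 is the default Spark Connect port. A successful TCP connection only shows that the port is open, not that a healthy Spark Connect server is listening on it.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_SPARK_CONNECT_PORT = 15002  # Spark Connect's default port


def parse_spark_connect_url(url: str) -> tuple[str, int]:
    """Split an sc://host[:port][/;params] string into (host, port)."""
    parts = urlsplit(url)
    if parts.scheme != "sc" or not parts.hostname:
        raise ValueError(f"not a Spark Connect URL: {url!r}")
    # Fall back to the default port when the URL does not name one.
    return parts.hostname, parts.port or DEFAULT_SPARK_CONNECT_PORT


def endpoint_is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds."""
    host, port = parse_spark_connect_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, and similar failures.
        return False


if __name__ == "__main__":
    # Hypothetical endpoint; replace with your own cluster address.
    print(endpoint_is_reachable("sc://spark.example.internal:15002"))
```

A check like this can be useful before pointing the database at the cluster, since a closed port is one of the simpler failure modes to rule out.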

Choosing your executor engine

The pgaa.executor_engine configuration parameter determines which engine executes your analytical queries.
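As a minimal sketch of how the parameter might be set from application code, the helper below builds a standard Postgres SET statement for pgaa.executor_engine. The engine names in ASSUMED_ENGINES are assumptions made for illustration and are not confirmed by this page; check the PGAA reference for the values the parameter actually accepts. The helper name is also hypothetical.

```python
# Assumed engine names, for illustration only; the accepted values of
# pgaa.executor_engine are not stated on this page.
ASSUMED_ENGINES = ("seafowl", "spark_connect")


def executor_engine_statement(engine: str) -> str:
    """Build a session-level SET statement for pgaa.executor_engine."""
    if engine not in ASSUMED_ENGINES:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {ASSUMED_ENGINES}"
        )
    # The value comes from a fixed allow-list, so interpolating it is safe.
    return f"SET pgaa.executor_engine = '{engine}';"


if __name__ == "__main__":
    print(executor_engine_statement("spark_connect"))
```

The returned string can be passed to whichever Postgres client you already use. Validating against an allow-list keeps arbitrary input out of the generated SQL.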

Feature	Seafowl	Spark Connect
Architecture	Runs as a process alongside Postgres.	Connects to an external Spark cluster.
Best for	Small to medium datasets, low latency.	Petabyte-scale data, heavy ETL/Z-Ordering.
Scalability	Limited by the host machine's RAM/CPU.	Distributed across multiple worker nodes.
Complexity	Zero-config; starts automatically.	Requires a running Spark Connect endpoint.
Performance	Faster for single-node data skipping.	Faster for massive joins and aggregations.

When to switch to Spark?

Seafowl is optimized for single-node performance. Consider switching to Spark Connect if any of the following apply:

Memory constraints: Your aggregations or joins are hitting the pgaa.autostart_seafowl_max_memory_mb limit.
Heavy maintenance: You are performing resource-intensive operations such as Z-Ordering or large-scale compaction on Delta or Iceberg tables.
Centralized compute: You already have a managed Spark environment and want to leverage existing compute credits.
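The three criteria above can be sketched as a small decision helper. This is only an illustration of the reasoning, not part of PGAA: the Workload fields and the suggest_engine function are hypothetical, and the peak memory figure is something you would have to estimate yourself.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    peak_memory_mb: int              # estimated peak memory of the heaviest query
    seafowl_max_memory_mb: int       # value of pgaa.autostart_seafowl_max_memory_mb
    heavy_maintenance: bool = False  # Z-Ordering or large-scale compaction planned
    has_managed_spark: bool = False  # an existing Spark environment is available


def suggest_engine(w: Workload) -> tuple[str, list[str]]:
    """Return the suggested engine and the criteria that triggered it."""
    reasons = []
    if w.peak_memory_mb >= w.seafowl_max_memory_mb:
        reasons.append("memory constraints")
    if w.heavy_maintenance:
        reasons.append("heavy maintenance")
    if w.has_managed_spark:
        reasons.append("centralized compute")
    # Any single criterion is enough to consider Spark Connect.
    engine = "Spark Connect" if reasons else "Seafowl"
    return engine, reasons


if __name__ == "__main__":
    print(suggest_engine(Workload(peak_memory_mb=8192, seafowl_max_memory_mb=4096)))
```

Returning the matched criteria alongside the suggestion makes it easier to see why a workload was flagged, which matters because the criteria are independent of one another.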
