Getting started v1.7

Suggest edits

To implement tiered storage within a distributed environment, you must first establish a consistent link between your EDB Postgres Distributed (PGD) cluster and your object store.

The following sections guide you through the prerequisites and the configuration steps required to point your cluster to a storage provider. These steps define the external location where a version of your PGD data is persisted in a columnar format.

By configuring these settings at the node-group level, you ensure that every node in the cluster knows exactly where to write and read tiered data.

Prerequisites

Cluster: PGD version 6.1 or later with PGAA and PGFS extensions installed.
Storage locations: Local, S3, GCP, or Azure storage using Iceberg or Delta Lake formats.
Catalog: If using an external catalog service, only Iceberg REST catalogs are supported.
Permissions: The database user must have CREATE, ALTER, and EXECUTE privileges for the PGD and PGAA functions.

Note

The credentials associated with a storage location or a catalog service must have both read and write permissions for the destination bucket. You can verify permissions by running the following functions, which return NULL if successful.

For a storage location:

SELECT pgaa.test_storage_location ('my_storage_location', true);

For a catalog service:

SELECT pgaa.test_catalog('my_iceberg_catalog', test_writes := FALSE);

Pointing to object storage

Before creating a tiered table, you must define the destination for the cold data. Because you are working within a distributed PGD cluster, you must wrap these configuration calls in bdr.replicate_ddl_command to ensure the settings are propagated to all nodes in the group.

Tip

To view your node group, use SELECT node_group_id FROM bdr.node_group;.

Using a storage location
Using a catalog service

Create a storage location using the PGFS extension and set it as the analytics storage for your node group.

SELECT bdr.replicate_ddl_command($$
  SELECT pgfs.create_storage_location(
  'my-storage-location',
  'protocol://my-bucket-name/path/',
  '{"region": "region-name"}',
  '{"access_key_id": "...", "secret_access_key": "..."}'
  )
$$);

Clear analytics_write_catalog. This must be NULL before an analytics_storage_location can be assigned:
```
SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_write_catalog', NULL);
```

Set the analytics storage location for your node group:

SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_storage_location', 'my-storage-location');

Configure a connection to an external catalog:

SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.add_catalog(
    'my-catalog-name',
    'iceberg-rest',
    '{
      "url": "https://my-catalog-rest-endpoint.com",
      "token": "MY_AUTH_TOKEN",
     }'
  )
);

Set the catalog as the default analytics write location:

SELECT bdr.replicate_ddl_command($$  
  SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_write_catalog', 'my-catalog-name')  
);

Next steps

Now that you have defined the storage target for your PGD cluster, you can determine how your tables interact with that storage based on your requirements for performance and capacity:

Implement tiered tables: Establish a zero-touch data lifecycle by automatically transitioning partitions from local heap storage to your analytics storage targed based on an age threshold.
Replicate to analytics: Maintain a local transactional heap table while simultaneously synchronizing a columnar copy to your storage target for heavy analytical processing.
Offload to analytics: Reclaim heap disk space from existing tables which are being replicated to analytics.

← Prev

Replicating with PGD

↑ Up