Expanding WarehousePG Cluster on existing hosts using gpexpand
WarehousePG is our fork of the formerly open-source Greenplum Database. WarehousePG offers a Massively Parallel Processing (MPP) architecture for high-performance, petabyte-scale data processing.
What is gpexpand utility?
gpexpand utility helps in expanding an existing WarehousePG Database across new hosts or existing hosts in the system.
In this guide, I’ll walk through expanding a WarehousePG database cluster using the gpexpand utility by adding new segments on existing segment hosts.
When you run the gpexpand utility, it runs in interactive mode and this helps in creating the new expansion schema.
In this example, I am going to add 2 more segments to the existing hosts:
Step 1: Start the gpexpand Utility
[gpadmin@mdw logs]$ gpexpand
20250604:20:59:20:051060 gpexpand:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9'
20250604:20:59:21:051060 gpexpand:mdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-26), 64-bit compiled on May 8 2025 13:42:44 Bhuvnesh C. WarehousePG'
20250604:20:59:21:051060 gpexpand:mdw:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
20250604:20:59:21:051060 gpexpand:mdw:gpadmin-[WARNING]:-One or more segments are either down or not in preferred role.
...
...
Would you like to initiate a new System Expansion Yy|Nn (default=N):
> yStep 2: Choose Hosts and Mirroring Strategy
Since we're adding segments to existing hosts, we still input the hostnames (even though they're already part of the cluster).
I have two segment hosts : sdw1 and sdw2.
Enter a comma separated list of new hosts you want
to add to your array. Do not include interface hostnames.
**Enter a blank line to only add segments to existing hosts**[]:
> sdw1, sdw2In this example, I am going to use the existing hosts to expand my cluster.
For mirroring strategy, I chose spread mirroring:
What type of mirroring strategy would you like?
spread|grouped (default=grouped):
> spreadNext, specify how many new primary segments to add per host:
I want to add one primary segment per host. Here I am specifying the data directories for the new primary and mirror.
How many new primary segments per host do you want to add? (default=0):
> 1
Enter new primary data directory 1:
> /data1/primary
Enter new mirror data directory 1:
> /data1/mirrorAfter the above step, it generates an expansion file:
Generating configuration file...
..
20250604:21:00:29:051097 gpexpand:mdw:gpadmin-[INFO]:-Generating input file...
Input configuration file was written to 'gpexpand_inputfile_20250604_210029'.
Please review the file and make sure that it is correct then re-run
with:
gpexpand -i gpexpand_inputfile_20250604_210029Step 3: Review the Generated Expansion File
[gpadmin@mdw logs]$ cat gpexpand_inputfile_20250604_210029
sdw1|sdw1|6006|/data1/primary/gpseg12|26|12|p
sdw2|sdw2|7006|/data1/mirror/gpseg12|29|12|m
sdw2|sdw2|6006|/data1/primary/gpseg13|27|13|p
sdw1|sdw1|7006|/data1/mirror/gpseg13|28|13|mThis file defines:
Two new primary segments (
p) — one on each host.Two mirror segments (
m) — mirrored across the other host.
Step 4: Check Current Segment Distribution
Before proceeding, let’s verify the number of segments per host:
gpadmin=# select count(*) from gp_segment_configuration group by hostname;
count
-------
1
12
12
(3 rows)Every host has 12 segments each. Now after running the expansion, this number will be 14 ( Adding one primary and one mirror ) on each host.
Step 5: Run the Expansion
gpexpand -i gpexpand_inputfile_20250604_210029Output:
20250604:21:21:08:051698 gpexpand:mdw:gpadmin-[INFO]:-Heap checksum setting consistent across cluster
20250604:21:21:08:051698 gpexpand:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions
20250604:21:21:08:051698 gpexpand:mdw:gpadmin-[INFO]:-did not find gppkg binary. skip package sync
20250604:21:21:09:051698 gpexpand:mdw:gpadmin-[INFO]:-Locking catalog
20250604:21:21:09:051698 gpexpand:mdw:gpadmin-[INFO]:-Locked catalog
...
...
20250604:21:21:22:051698 gpexpand:mdw:gpadmin-[INFO]:-Creating expansion schema
20250604:21:21:23:051698 gpexpand:mdw:gpadmin-[INFO]:-Populating gpexpand.status_detail with data from database postgres
20250604:21:21:24:051698 gpexpand:mdw:gpadmin-[INFO]:-Populating gpexpand.status_detail with data from database template1
20250604:21:21:25:051698 gpexpand:mdw:gpadmin-[INFO]:-Populating gpexpand.status_detail with data from database gpadmin
20250604:21:21:25:051698 gpexpand:mdw:gpadmin-[INFO]:-Starting new mirror segment synchronization
20250604:21:21:48:051698 gpexpand:mdw:gpadmin-[INFO]:-************************************************
20250604:21:21:48:051698 gpexpand:mdw:gpadmin-[INFO]:-Initialization of the system expansion complete.
20250604:21:21:48:051698 gpexpand:mdw:gpadmin-[INFO]:-To begin table expansion onto the new segments
20250604:21:21:48:051698 gpexpand:mdw:gpadmin-[INFO]:-rerun gpexpand
20250604:21:21:48:051698 gpexpand:mdw:gpadmin-[INFO]:-************************************************
20250604:21:21:48:051698 gpexpand:mdw:gpadmin-[INFO]:-Exiting...Step 6: Run gpexpand Again for Table Expansion
As per the message above, we need to run it one more time for table expansion:
[gpadmin@mdw ~]$ gpexpand
20250604:21:22:09:051967 gpexpand:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9'
20250604:21:22:09:051967 gpexpand:mdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-26), 64-bit compiled on May 8 2025 13:42:44 Bhuvnesh C. WarehousePG'
20250604:21:22:09:051967 gpexpand:mdw:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
20250604:21:22:10:051967 gpexpand:mdw:gpadmin-[INFO]:-Expanding gpadmin.public.test
20250604:21:22:10:051967 gpexpand:mdw:gpadmin-[WARNING]:-Encountered unexpected issue when expanding table gpadmin.public.test, skipping
20250604:21:22:10:051967 gpexpand:mdw:gpadmin-[INFO]:-Resetting status_detail for gpadmin.public.test
20250604:21:22:11:051967 gpexpand:mdw:gpadmin-[INFO]:-Expanding gpadmin.public.test1
20250604:21:22:11:051967 gpexpand:mdw:gpadmin-[INFO]:-Finished expanding gpadmin.public.test1
20250604:21:22:15:051967 gpexpand:mdw:gpadmin-[INFO]:-EXPANSION COMPLETED SUCCESSFULLY
20250604:21:22:15:051967 gpexpand:mdw:gpadmin-[INFO]:-Exiting...Step 7: Verify Cluster Health with gpstate
Run `gpstate` to see everything is fine:
[gpadmin@mdw ~]$ gpstate -e
20250604:21:29:16:052215 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20250604:21:29:16:052215 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9'
20250604:21:29:16:052215 gpstate:mdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-26), 64-bit compiled on May 8 2025 13:42:44 Bhuvnesh C. WarehousePG'
20250604:21:29:16:052215 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
20250604:21:29:16:052215 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
..
20250604:21:29:20:052215 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20250604:21:29:20:052215 gpstate:mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20250604:21:29:21:052215 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20250604:21:29:21:052215 gpstate:mdw:gpadmin-[INFO]:-All segments are running normallyStep 8: Confirm Segment Count
gpadmin=# select count(*) from gp_segment_configuration group by hostname;
count
-------
1
14
14
(3 rows)Step 9: Verify Data Distribution
We have the new segments added to the existing cluster. I have only one table and data is redistributed to the new segments ( id 12 and 13).
gpadmin=# select count(*) , gp_segment_id from test1 group by gp_segment_id;
count | gp_segment_id
-------+---------------
709 | 5
724 | 13
727 | 1
726 | 6
711 | 2
670 | 8
687 | 0
771 | 3
701 | 12
683 | 9
724 | 10
718 | 11
743 | 7
706 | 4
(14 rows)In this example, we expanded the cluster by adding segments on existing hosts (sdw1, sdw2), confirmed their addition, and validated data redistribution.
Expanding WarehousePG on New hosts? Follow this article: WarehousePG cluster on New hosts
To learn more about WarehousePG, visit our WarehousePG docs page or explore the WarehousePG GitHub repository.