Expanding WarehousePG Cluster on New Host(s) using gpexpand
WarehousePG is our fork of the formerly open-source Greenplum Database. WarehousePG offers a Massively Parallel Processing (MPP) architecture for high-performance, petabyte-scale data processing.
gpexpand utility helps in expanding an existing WarehousePG Database across new hosts or existing hosts in the system.
In this guide, I’ll walk through expanding a WarehousePG database cluster using the gpexpand utility by adding new segments hosts.
1. Prerequisites and Planning
Important: When expanding to new hosts, the number of new hosts must be greater than the number of primary segments per existing host. Otherwise, you'll hit this error:
gpexpand failed: Not enough hosts for spread mirroring. You must have more hosts than primary segments per hostExample Setup:
There are 2 total primaries on two existing hosts (sdw1, sdw2), i.e., 3 per host.
gpadmin=# select count(content),hostname from gp_segment_configuration where role='p' group by hostname ;
count | hostname
-------+----------
1 | mdw
2 | sdw2
2 | sdw1
(3 rows)So, to expand this cluster, you need at least 3 new hosts.
2. Segment Count Strategy
Current primary instances per host = 2.
As per the requirement, number of hosts > number of primaries per host so at least 3 new hosts are needed to expand this cluster.
By default, new hosts are initialized with same number of segments as current hosts.
You can optionally override this during
gpexpand.
With 3 new hosts and 2 primaries each (default), you'll get: 2 primaries on each host × 3 hosts = 6 new primaries
and their corresponding 6 mirrors → 12 new segment instances
By default, new hosts are configured with the same number of primary
segments as existing hosts. Optionally, you can increase the number
of segments per host.
For example, if existing hosts have two primary segments, entering a value
of 2 will initialize two additional segments on existing hosts, and four
segments on new hosts. In addition, mirror segments will be added for
these new primary segments if mirroring is enabled.That will result in adding 2 primaries per new host.
3. Prepare the Host File
[gpadmin@mdw ~]$ cat new_hosts.txt
sdw3
sdw4
sdw54. Run Initial gpexpand to Generate Input File
gpexpand -f new_hosts.txtYou’ll see an interactive prompt asking for additional expansion details such as the mirroring strategy and the number of primary segments per host(s) etc.
What type of mirroring strategy would you like? [spread|grouped] (default=grouped):
> spread
How many new primary segments per host do you want to add? (default=0):
>5. Review the Generated Input File
cat gpexpand_inputfile_20250606_224828Sample gpexpand_input_* file:
sdw3|sdw3|6000|/data1/primary/gpseg4|10|4|p
sdw4|sdw4|7000|/data2/mirror/gpseg4|18|4|m
sdw3|sdw3|6001|/data1/primary/gpseg5|11|5|p
sdw5|sdw5|7000|/data2/mirror/gpseg5|20|5|m
sdw4|sdw4|6000|/data1/primary/gpseg6|12|6|p
sdw5|sdw5|7001|/data2/mirror/gpseg6|21|6|m
sdw4|sdw4|6001|/data1/primary/gpseg7|13|7|p
sdw3|sdw3|7000|/data2/mirror/gpseg7|16|7|m
sdw5|sdw5|6000|/data1/primary/gpseg8|14|8|p
sdw3|sdw3|7001|/data2/mirror/gpseg8|17|8|m
sdw5|sdw5|6001|/data1/primary/gpseg9|15|9|p
sdw4|sdw4|7001|/data2/mirror/gpseg9|19|9|mNew segment = 12 includes ( 2 primary + 2 mirror ) = 4 * 3 hosts = 12.
Make sure gpadmin user has write permissions to create these data directories.
6. Start the Expansion
Start the expansion using the input file:
gpexpand -i gpexpand_inputfile_20250605_233226You will see logs confirming segment startup, catalog unlock, mirror sync, etc.
gpexpand -i gpexpand_inputfile_20250606_224828
20250606:22:51:54:008622 gpexpand:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9'
20250606:22:51:54:008622 gpexpand:mdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-26), 64-bit compiled on May 8 2025 13:42:44 Bhuvnesh C. WarehousePG'
20250606:22:51:54:008622 gpexpand:mdw:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
....
....
....
20250606:22:52:22:008622 gpexpand:mdw:gpadmin-[INFO]:-SegmentStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data1/primary/gpseg4 -l /data1/primary/gpseg4/log/startup.log -w -t 600 -o " -c gp_role=utility " start
20250606:22:52:22:008622 gpexpand:mdw:gpadmin-[INFO]:-SegmentStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data1/primary/gpseg5 -l /data1/primary/gpseg5/log/startup.log -w -t 600 -o " -c gp_role=utility " start
20250606:22:52:22:008622 gpexpand:mdw:gpadmin-[INFO]:-SegmentStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data1/primary/gpseg6 -l /data1/primary/gpseg6/log/startup.log -w -t 600 -o " -c gp_role=utility " start
20250606:22:52:22:008622 gpexpand:mdw:gpadmin-[INFO]:-SegmentStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data1/primary/gpseg7 -l /data1/primary/gpseg7/log/startup.log -w -t 600 -o " -c gp_role=utility " start
20250606:22:52:22:008622 gpexpand:mdw:gpadmin-[INFO]:-SegmentStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data1/primary/gpseg8 -l /data1/primary/gpseg8/log/startup.log -w -t 600 -o " -c gp_role=utility " start
20250606:22:52:22:008622 gpexpand:mdw:gpadmin-[INFO]:-SegmentStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /data1/primary/gpseg9 -l /data1/primary/gpseg9/log/startup.log -w -t 600 -o " -c gp_role=utility " start
20250606:22:52:24:008622 gpexpand:mdw:gpadmin-[INFO]:-Unlocking catalog
20250606:22:52:24:008622 gpexpand:mdw:gpadmin-[INFO]:-Unlocked catalog
20250606:22:52:24:008622 gpexpand:mdw:gpadmin-[INFO]:-Creating expansion schema
20250606:22:52:25:008622 gpexpand:mdw:gpadmin-[INFO]:-Populating gpexpand.status_detail with data from database postgres
20250606:22:52:26:008622 gpexpand:mdw:gpadmin-[INFO]:-Populating gpexpand.status_detail with data from database template1
20250606:22:52:27:008622 gpexpand:mdw:gpadmin-[INFO]:-Populating gpexpand.status_detail with data from database gpadmin
20250606:22:52:28:008622 gpexpand:mdw:gpadmin-[INFO]:-Starting new mirror segment synchronization
20250606:22:53:12:008622 gpexpand:mdw:gpadmin-[INFO]:-************************************************
20250606:22:53:12:008622 gpexpand:mdw:gpadmin-[INFO]:-Initialization of the system expansion complete.
20250606:22:53:12:008622 gpexpand:mdw:gpadmin-[INFO]:-To begin table expansion onto the new segments
20250606:22:53:12:008622 gpexpand:mdw:gpadmin-[INFO]:-rerun gpexpand
20250606:22:53:12:008622 gpexpand:mdw:gpadmin-[INFO]:-************************************************7. Redistribute Data
After the system is initialized, run gpexpand again. This starts data redistribution across new segments.
[gpadmin@mdw ~]$ gpexpand
20250606:22:55:02:008940 gpexpand:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9'
20250606:22:55:02:008940 gpexpand:mdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-26), 64-bit compiled on May 8 2025 13:42:44 Bhuvnesh C. WarehousePG'
20250606:22:55:02:008940 gpexpand:mdw:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
20250606:22:55:03:008940 gpexpand:mdw:gpadmin-[INFO]:-EXPANSION COMPLETED SUCCESSFULLY
20250606:22:55:03:008940 gpexpand:mdw:gpadmin-[INFO]:-Exiting...8. Verify Segment Status
Run gpstate to make sure all the segments are up.
[gpadmin@mdw ~]$ gpstate -e
20250606:22:55:45:008977 gpstate:mdw:gpadmin-[INFO]:-Starting gpstate with args: -e
20250606:22:55:45:008977 gpstate:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9'
20250606:22:55:45:008977 gpstate:mdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.1-WHPG-rc.1 build commit:165c56297075f86a39ce8ccbf912a0689e8b8df9) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-26), 64-bit compiled on May 8 2025 13:42:44 Bhuvnesh C. WarehousePG'
20250606:22:55:45:008977 gpstate:mdw:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
20250606:22:55:45:008977 gpstate:mdw:gpadmin-[INFO]:-Gathering data from segments...
..
20250606:22:55:48:008977 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20250606:22:55:48:008977 gpstate:mdw:gpadmin-[INFO]:-Segment Mirroring Status Report
20250606:22:55:49:008977 gpstate:mdw:gpadmin-[INFO]:-----------------------------------------------------
20250606:22:55:49:008977 gpstate:mdw:gpadmin-[INFO]:-All segments are running normally9. Verify New Segment Distribution
gpadmin=# SELECT COUNT(content), hostname
FROM gp_segment_configuration
WHERE role='p'
GROUP BY hostname
ORDER BY hostname; gpadmin=# select count(content),hostname from gp_segment_configuration where role='p' group by hostname order by hostname;
count | hostname
-------+----------
1 | mdw
2 | sdw1
2 | sdw2
2 | sdw3
2 | sdw4
2 | sdw5
(6 rows)You should see the 3 new hosts (sdw3–sdw5) with 2 primaries each.
In this example, we expanded the cluster by adding new segment hosts (sdw3, sdw4 and sdw5), confirmed their addition, and validated data redistribution.
Refer to this article for expanding WarehousePG cluster on existing hosts
To learn more about WarehousePG, visit our WarehousePG docs page or explore the WarehousePG GitHub repository.