One of the main advantages of using Greenplum is that it gains power as you add nodes.
Horizontal scalability is a main feature of Greenplum.
Here is a compact handbook to install a multi-node Data Warehouse environment with Greenplum.
## Preparation steps
This little guide covers the installation of Greenplum 4.1.
It is not intended to replace the official Install Guide; it is just a little handbook to keep on your desk.
You have to tune your operating system a little before installing Greenplum.
That procedure is very well documented; I advise you to read it in the Install Guide, on page 18.
## Installing Greenplum
First of all, you have to run the Greenplum installer script on the master host, as root.
The installer script can be downloaded from the Greenplum community site: http://www.greenplum.com/community/downloads/database-ce
Make sure to download the correct version!
The installer script displays a few questions and the license; simply follow the on-screen instructions.
Now comes the important part: you have to run a special script that sets up Greenplum on a list of hosts for you. Awesome!
It simply copies the Greenplum installation from the current host to a list of specified hosts (it also takes care of exchanging ssh keys and creating the gpadmin user).
*Specified where?*
The important file here is hostfile_exkeys; it must contain the hostname of each host in your Greenplum system. For example:
master-hostname
master-segment-hostname
segment-hostname-1
segment-hostname-2
...
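If you prefer to create the file from the shell, here is a minimal sketch. The hostnames are just the placeholders from the example above, and the duplicate and blank-line check is my own habit rather than anything the Install Guide asks for. Note that it overwrites any existing hostfile_exkeys in the current directory.

```shell
# Placeholder hostnames from the example above; replace them with your own.
cat > hostfile_exkeys <<'EOF'
master-hostname
master-segment-hostname
segment-hostname-1
segment-hostname-2
EOF

# Assumption: each host should appear exactly once, with no blank lines.
dupes=$(sort hostfile_exkeys | uniq -d)
blanks=$(grep -c '^[[:space:]]*$' hostfile_exkeys)

if [ -n "$dupes" ] || [ "$blanks" -gt 0 ]; then
    echo "check hostfile_exkeys: duplicates or blank lines found"
else
    echo "hostfile_exkeys lists $(grep -c . hostfile_exkeys) hosts"
fi
```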
This is enough to run gpseginstall, like this:
# gpseginstall -f hostfile_exkeys -u gpadmin -p yourpassword
## Creating directories
It’s time to create the master directory on the master host.
Remember that the real data lives on the segments, so not much space is needed here.
For example:
# mkdir /data/master
# chown gpadmin /data/master
You have to create that directory on your master segment as well.
Greenplum provides a useful script for the job, called gpssh:
# gpssh -h master-segment-hostname -e 'mkdir /data/master'
# gpssh -h master-segment-hostname -e 'chown gpadmin /data/mast
Finally, you have to create the data directories on all segment hosts, and you can do that all at once thanks to gpssh.
Remember that the real data goes there, so a lot of space is needed.
Create a file called hostfile_gpssh_segonly and place *only* the segment hostnames in it. For example:
segment-hostname-1
segment-hostname-2
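Since hostfile_exkeys already lists every host, one way to build the segment-only file is to filter the master entries out of it. This is only a sketch: it assumes the masters are named exactly as in my example (master-hostname and master-segment-hostname), so adjust the patterns to your real hostnames. The first line creates a demo hostfile_exkeys only if you do not have one yet.

```shell
# Demo input, written only when hostfile_exkeys does not exist yet.
[ -f hostfile_exkeys ] || printf '%s\n' \
    master-hostname master-segment-hostname \
    segment-hostname-1 segment-hostname-2 > hostfile_exkeys

# Drop the lines that exactly match a master hostname (-x = whole line,
# -v = invert the match); what is left are the segment hosts.
grep -v -x -e 'master-hostname' -e 'master-segment-hostname' \
    hostfile_exkeys > hostfile_gpssh_segonly

cat hostfile_gpssh_segonly
```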
Now, run the commands on all segments at once, like this:
# gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/primary'
# gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/mirror'
# gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/primary'
# gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/mirror'
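If you end up with more than two data directories, the four commands above can be folded into a small loop. The sketch below is my own wrapper, not something shipped with Greenplum: the remote function and the DRY_RUN variable are hypothetical names, and by default it only prints the gpssh commands so you can review them before running anything as root.

```shell
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 runs them,
# which requires gpssh on the PATH and the same privileges as above.
DRY_RUN="${DRY_RUN:-1}"
HOSTFILE="hostfile_gpssh_segonly"

# remote CMD: run CMD on every host listed in $HOSTFILE through gpssh.
remote() {
    if [ "$DRY_RUN" = "1" ]; then
        printf "gpssh -f %s -e '%s'\n" "$HOSTFILE" "$1"
    else
        gpssh -f "$HOSTFILE" -e "$1"
    fi
}

# Same directories as in the commands above; add more here if needed.
for dir in /data/primary /data/mirror; do
    remote "mkdir $dir"
    remote "chown gpadmin $dir"
done
```

The loop creates and chowns each directory in turn instead of doing all the mkdir calls first, which should make no difference to the result.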
## Conclusions
Here’s a list of steps to keep on your desk; I hope you will find it useful:
* Configure your Operating System for Greenplum (as written in Install Guide)
* Install Greenplum on master host
* Run gpseginstall to install Greenplum on the other hosts
* Create the master directory on the master
* Create the same directory on the master segment (gpssh can help here)
* Create the data directories on the segments (gpssh can help here)
In the next article, we will see how to init and start the Greenplum Database we have just installed.
Stay tuned.
Cheers