Officially, Greenplum Database Single Node Edition (SNE) can only be installed on Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES), but while surfing the web I have seen many requests for instructions on installing it on Debian/Ubuntu. Here I’ll try to give you some advice.
Before installing Greenplum Database SNE, you need to adjust the following OS configuration parameters:
Set the following parameters in the `/etc/sysctl.conf` file:
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 64000 100 512
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_max_syn_backlog=4096
net.core.netdev_max_backlog=10000
vm.overcommit_memory=2
To activate these parameters, either run `sudo sysctl -p` or reboot the system.
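To confirm that the kernel actually picked up the values, something like the following sketch can help. It compares each `key = value` line of a sysctl-style file with the live value under `/proc/sys`. The function name `check_sysctl` and the `PROC_ROOT` override are my own inventions for illustration, not part of any Greenplum tooling.

```shell
#!/bin/sh
# Compare "key = value" settings from a sysctl-style file with the live
# values under /proc/sys. PROC_ROOT is a hypothetical override that lets
# you point the check at another directory tree.
check_sysctl() {
    conf=$1
    root=${PROC_ROOT:-/proc/sys}
    rc=0
    while IFS='=' read -r key value; do
        key=$(printf '%s' "$key" | tr -d ' \t')
        # skip blank lines and comments
        case $key in ''|'#'*) continue ;; esac
        want=$(echo $value)                      # collapse whitespace
        path="$root/$(echo "$key" | tr . /)"     # kernel.sem -> kernel/sem
        have=$(echo $(cat "$path" 2>/dev/null))  # tabs become single spaces
        if [ "$want" = "$have" ]; then
            echo "ok       $key = $have"
        else
            echo "MISMATCH $key: want '$want', have '$have'"
            rc=1
        fi
    done < "$conf"
    return $rc
}
```

Calling `check_sysctl /etc/sysctl.conf` after `sudo sysctl -p` prints one line per parameter and returns a non-zero status if any value differs.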
Set the following parameters in the `/etc/security/limits.conf` file:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
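The new limits only apply to sessions started after the change. As a rough check, the sketch below reads the soft and hard values of one limit from the `/proc/<pid>/limits` file of the current shell. The helper name `show_limit` and the `LIMITS_FILE` override are hypothetical, chosen just for this example.

```shell
#!/bin/sh
# Print the soft and hard value of one limit, e.g. "Max open files".
# LIMITS_FILE is a hypothetical override; by default the limits of the
# current shell process are read from /proc.
show_limit() {
    awk -v name="$1" '
        index($0, name) == 1 {
            # everything after the limit name: soft, hard, units
            rest = substr($0, length(name) + 1)
            split(rest, f, " ")
            print f[1], f[2]
        }' "${LIMITS_FILE:-/proc/$$/limits}"
}
```

From a fresh login session, `show_limit "Max open files"` and `show_limit "Max processes"` should print the values configured above if the limits were applied.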
In the file `/etc/hosts`, comment out the line beginning with `::1`, as it could confuse the database when it resolves the hostname for localhost. Also make sure that both localhost and your hostname resolve to a local address.
You have now finished preparing the environment for Greenplum Database SNE. The next step is to create the user account that will administer your installation; this user is usually called gpadmin.
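If you prefer not to edit the file by hand, the change to the `::1` line can be sketched with `sed`. The function name `comment_ipv6_loopback` is made up for this example, and it deliberately writes the result to standard output instead of touching the file, so you can review it first.

```shell
#!/bin/sh
# Print a copy of a hosts file with any active "::1" line commented out.
# Nothing is modified in place; review the output before installing it.
comment_ipv6_loopback() {
    sed 's/^[[:space:]]*::1/#&/' "$1"
}
```

For example, `comment_ipv6_loopback /etc/hosts > /tmp/hosts.new` produces a candidate file that you can compare with the original using `diff` before copying it over with `sudo`. Afterwards, `getent hosts localhost` and `getent hosts "$(hostname)"` show what each name resolves to.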
sudo adduser --gecos "Greenplum Administrator" gpadmin
At this point you have to download or copy the installer file to the system. Installer files are available at [http://www.greenplum.com/products/single-node](http://www.greenplum.com/products/single-node). Choose the RHEL installer for your architecture. I have an x86_64 machine, so from now on I will use it as the example.
To start the installation run the following commands (you need the unzip program installed):
unzip greenplum-db-3.3.6.1-build-1-SingleNodeEdition-RHEL5-x86_64.zip
sudo bash greenplum-db-3.3.6.1-build-1-SingleNodeEdition-RHEL5-x86_64.bin
Follow the on-screen instructions: accept the license and choose the installation path. The default one is fine. The installer will create a `greenplum-db` symbolic link one directory level above your chosen installation directory. The symbolic link is used to facilitate patch maintenance and upgrades between versions. From now on the install location will be referred to as `$GPHOME`.
Change the ownership of the installation so that it is owned by the gpadmin user and group.
sudo chown -R gpadmin:gpadmin $GPHOME
Now it is time to choose the data directory location. To explain how to choose it, nothing is better than quoting the official quick-start guide.
> Every Greenplum Database SNE instance has a designated storage area on disk that
> is called the data directory location. This is the file system location where the database
> data is stored. In the Greenplum Database SNE, you initialize a Greenplum Database
> SNE master instance and two or more segment instances on the same system, each
> requiring a data directory location. These directories should have sufficient disk space
> for your data and be owned by the gpadmin user.
>
> Remember that the data directories of the segment instances are where the user data
> resides, so they must have enough disk space to accommodate your planned data
> capacity. For the master instance, only the system catalog tables and system
> metadata are stored in the master data directory.
For this guide we will use the default layout, with the master (`/gpmaster`) and two segments (`/gpdata1` and `/gpdata2`).
sudo mkdir /gpmaster /gpdata1 /gpdata2
sudo chown gpadmin:gpadmin /gpmaster /gpdata1 /gpdata2
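Since the initialization step later on depends on these directories existing and belonging to gpadmin, a quick sanity check may save a failed run. The following is only a sketch; the function name `check_data_dirs` is hypothetical.

```shell
#!/bin/sh
# Report, for each directory, whether it exists and is owned by the user.
# Usage: check_data_dirs USER DIR...
check_data_dirs() {
    user=$1
    shift
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            rc=1
        elif [ "$(ls -ld "$dir" | awk '{print $3}')" != "$user" ]; then
            echo "wrong owner: $dir"
            rc=1
        else
            echo "ok: $dir"
        fi
    done
    return $rc
}
```

With the layout used in this guide you would call `check_data_dirs gpadmin /gpmaster /gpdata1 /gpdata2`. Running `df -h` on the same paths shows how much disk space is available to each of them.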
A `greenplum_path.sh` file is provided in your `$GPHOME` directory with environment variable settings for Greenplum Database SNE. You should source it from the gpadmin user’s shell startup file (such as `.bashrc`) by adding a line like the following:
source /usr/local/greenplum-db/greenplum_path.sh
Before continuing, we need a little magic to avoid failures when Ubuntu programs are run against the libraries shipped with Greenplum SNE.
#!/bin/sh
# abort if GPHOME is unset, otherwise we would end up working in /lib
cd "${GPHOME:?GPHOME is not set}/lib" || exit 1
# libraries shipped with Greenplum SNE
gplibs="$(find . -maxdepth 1 -type f | cut -f 2 -d /)"
# libraries with the same ABI installed via dpkg
deblibs="$(dpkg -S $gplibs 2> /dev/null | cut -f 2 -d ' ')"
# we remove the Greenplum ones to avoid "no version information available" errors
for lib in $deblibs; do
rm -f "$(basename "$lib")"
done
For your convenience, the script is attached to this guide as fixlibs.sh.
It’s now time to initialize the database system. All the following steps are to be executed as the gpadmin user.
su - gpadmin
cp $GPHOME/docs/cli_help/single_hostlist_example ./single_hostlist
cp $GPHOME/docs/cli_help/gp_init_singlenode_example ./gp_init_singlenode
If you do not want to use the default configuration, data directory locations, ports, or other configuration options, edit the `gp_init_singlenode` file and enter your configuration settings.
Run the gpssh-exkeys utility to exchange ssh keys for the local host:
gpssh-exkeys -h 127.0.0.1 -h localhost
Run the following command to initialize the database:
gpinitsystem -c gp_init_singlenode
The utility verifies your setup information and makes sure that the data directories specified in the `gp_init_singlenode` configuration file are accessible. If all of the verification checks are successful, the utility prompts you to confirm the configuration before creating the system.
At the end of a successful setup, the utility starts your system. You should see:
=> Greenplum Database instance successfully created.
The management utilities require that you set the `MASTER_DATA_DIRECTORY` environment variable. This should specify the directory created by the gpinitsystem utility in the master data directory location.
echo "export MASTER_DATA_DIRECTORY=/gpmaster/gpsne-1" >> ~/.bashrc
source ~/.bashrc
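One caveat with `echo ... >> ~/.bashrc` is that running it twice leaves two identical lines in the file. A small helper along these lines avoids that; the name `append_once` is my own and the function is only a sketch.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already there.
# Usage: append_once LINE FILE
append_once() {
    line=$1
    file=$2
    # -x: match whole lines only, -F: treat the line as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For instance, `append_once "export MASTER_DATA_DIRECTORY=/gpmaster/gpsne-1" ~/.bashrc` can be repeated safely, and the same helper works for the `source /usr/local/greenplum-db/greenplum_path.sh` line added earlier.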
Now you can connect to the master database using the psql client program:
psql postgres
Please note that a system installed following this guide should be considered an **evaluation platform only**; it is not meant for production installations of Greenplum Database.