gpmapreduce v6.27.4
Runs WarehousePG MapReduce jobs as defined in a YAML specification document.
Note: WarehousePG MapReduce is deprecated and will be removed in a future WarehousePG release.
Synopsis
gpmapreduce -f <config.yaml> [dbname [<username>]]
[-k <name=value> | --key <name=value>]
[-h <hostname> | --host <hostname>] [-p <port> | --port <port>]
[-U <username> | --username <username>] [-W] [-v]
gpmapreduce -x | --explain
gpmapreduce -X | --explain-analyze
gpmapreduce -V | --version
gpmapreduce -? | --help
Requirements
The following are required prior to running this program:
- You must have your MapReduce job defined in a YAML file. See gpmapreduce.yaml for more information about the format of, and keywords supported in, the WarehousePG MapReduce YAML configuration file.
- You must be a WarehousePG superuser to run MapReduce jobs written in untrusted Perl or Python.
- You must be a WarehousePG superuser to run MapReduce jobs with EXEC and FILE inputs.
- You must be a WarehousePG superuser to run MapReduce jobs with GPFDIST input unless the user has the appropriate rights granted.
Description
MapReduce is a programming model developed by Google for processing and generating large data sets on an array of commodity servers. WarehousePG MapReduce allows programmers who are familiar with the MapReduce paradigm to write map and reduce functions and submit them to the WarehousePG parallel engine for processing.
gpmapreduce is the WarehousePG MapReduce program. You configure a WarehousePG MapReduce job via a YAML-formatted configuration file that you pass to the program for execution by the WarehousePG parallel engine. The WarehousePG cluster distributes the input data, runs the program across a set of machines, handles machine failures, and manages the required inter-machine communication.
Options
-f config.yaml
Required. The YAML file that contains the WarehousePG MapReduce job definitions. Refer to gpmapreduce.yaml for the format and content of the parameters that you specify in this file.
-? | --help
Show help, then exit.
-V | --version
Show version information, then exit.
-v | --verbose
Show verbose output.
-x | --explain
Do not run MapReduce jobs, but produce explain plans.
-X | --explain-analyze
Run MapReduce jobs and produce explain-analyze plans.
-k name=value | --key name=value
Sets a YAML variable. A value is required. Defaults to "key" if no variable name is specified.
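The naming rule for -k can be illustrated with a short shell sketch. This is only an illustration of the rule as described above: parse_key_arg is a hypothetical helper, not gpmapreduce's own parser, and the variable name threshold is invented for the example.

```shell
#!/usr/bin/env bash
# Illustration only: parse_key_arg is a hypothetical helper that mimics the
# documented -k rule. It is not part of gpmapreduce.
parse_key_arg() {
  local arg="$1"
  case "$arg" in
    # "name=value": split at the first "=" so the value may itself contain "=".
    *=*) printf '%s %s\n' "${arg%%=*}" "${arg#*=}" ;;
    # Bare "value": the variable name falls back to "key".
    *)   printf 'key %s\n' "$arg" ;;
  esac
}

parse_key_arg "threshold=10"   # variable "threshold" with value "10"
parse_key_arg "10"             # variable "key" with value "10"
```

Under this reading, passing -k threshold=10 sets a YAML variable named threshold, while passing -k 10 sets the variable named key.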
Connection Options
-h host | --host host
Specifies the host name of the machine on which the WarehousePG coordinator database server is running. If not specified, reads from the environment variable PGHOST or defaults to localhost.
-p port | --port port
Specifies the TCP port on which the WarehousePG coordinator database server is listening for connections. If not specified, reads from the environment variable PGPORT or defaults to 5432.
-U username | --username username
The database role name to connect as. If not specified, reads from the environment variable PGUSER or defaults to the current system user name.
-W | --password
Force a password prompt.
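The fallback order described for the connection options (explicit option, then environment variable, then built-in default) can be sketched in shell as follows. The helper resolve_conn is hypothetical and exists only to show the order of precedence. It does not call gpmapreduce, and using id -un for the current system user name is an assumption.

```shell
#!/usr/bin/env bash
# Illustration only: resolve_conn is a hypothetical helper showing the
# documented precedence. Arguments stand in for -h, -p, and -U values.
resolve_conn() {
  # Explicit argument first, then the environment variable, then the default.
  local host="${1:-${PGHOST:-localhost}}"
  local port="${2:-${PGPORT:-5432}}"
  # Assumption: "current system user name" is taken from id -un.
  local user="${3:-${PGUSER:-$(id -un)}}"
  printf '%s %s %s\n' "$host" "$port" "$user"
}

# With no arguments, print whatever the current environment resolves to.
resolve_conn
```

Calling resolve_conn with no arguments shows what a bare invocation would connect to in the current environment. Passing values simulates supplying -h, -p, and -U on the command line.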
Examples
Run a MapReduce job as defined in my_mrjob.yaml and connect to the database mydatabase:
gpmapreduce -f my_mrjob.yaml mydatabase
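The example above can be wrapped in a small pre-flight check, since the program requires a readable YAML job file. The following is a minimal sketch under stated assumptions: run_mrjob is a hypothetical wrapper name, and the return codes 1 and 127 are choices made for this sketch rather than values defined by gpmapreduce.

```shell
#!/usr/bin/env bash
# Illustration only: run_mrjob is a hypothetical wrapper, not part of
# WarehousePG. It checks the prerequisites before running the job.
run_mrjob() {
  local config="$1" dbname="$2"

  # The job definition must exist and be readable.
  if [ ! -r "$config" ]; then
    echo "cannot read job file: $config" >&2
    return 1
  fi

  # The gpmapreduce program must be on the PATH.
  if ! command -v gpmapreduce >/dev/null 2>&1; then
    echo "gpmapreduce not found in PATH" >&2
    return 127
  fi

  # Same invocation as the example: job file first, then the database name.
  gpmapreduce -f "$config" "$dbname"
}
```

With the function defined, run_mrjob my_mrjob.yaml mydatabase behaves like the example command but fails early, with a message on standard error, if the job file is missing.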