Installing FlowServer for WarehousePG

You can run the FlowServer service and the FlowCLI utility on any host that can reach your WarehousePG (WHPG) cluster. Regardless of where they run, you must also install the package on every host in your WHPG cluster.

Prerequisites

  • WarehousePG (WHPG) version 6.x running on RH7 or RH8.
  • WarehousePG version 7.x running on RH8 or RH9.

Network requirements

The following table lists the connection requirements among the different components:

Source       Destination                          Protocol
FlowServer   WarehousePG coordinator              libpq
FlowServer   WarehousePG segments                 HTTP
FlowServer   Kafka broker hosts / RabbitMQ hosts  TCP
FlowCLI      FlowServer                           gRPC
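
Before installing, it can help to confirm that the connections listed above are open. The following is a minimal sketch, not part of the product: it assumes bash (for its built-in /dev/tcp redirection) and the coreutils timeout command. The can_reach helper and the host names and ports in the commented examples are hypothetical and must be replaced with your own values.

```shell
#!/usr/bin/env bash
# Sketch only: test whether a TCP port is reachable from this host.
# Uses bash's /dev/tcp pseudo-device; gives up after 3 seconds.
can_reach() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Hypothetical targets (assumptions, adjust to your environment):
# can_reach cdw 5432      && echo "coordinator reachable"    # libpq
# can_reach kafka1 9092   && echo "Kafka broker reachable"   # TCP
# can_reach flowhost 6060 && echo "FlowServer reachable"     # gRPC, run from the FlowCLI host
```

A successful check only shows that the port accepts TCP connections. It does not verify credentials or the protocol spoken on that port.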

Download and install the package on your WarehousePG cluster

  1. Download the package from the EDB repository:

    export EDB_SUBSCRIPTION_TOKEN=<your-token>
    export EDB_REPO=gpsupp
    curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_REPO/setup.rpm.sh" | sudo -E bash
    sudo dnf download whpg<whpg_major_version>-flow-server

    Where <whpg_major_version> is your WHPG version (6 or 7).

  2. Create a file named all_hosts on your WHPG coordinator that lists all hosts in the WHPG cluster. For example:

    cdw
    scdw
    sdw1
    sdw2
    sdw3

  3. From the coordinator, use the gpssh utility with the all_hosts file to install the package on every host in the cluster.

  4. (Optional) Create the FlowServer extension by connecting to a database on your WHPG cluster and running:

CREATE EXTENSION fs_formatter;

If you don't create the extension manually, it will be automatically created when a job starts.
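
Step 3 above could be carried out along the following lines. This is a sketch under stated assumptions, not the documented procedure: it assumes passwordless SSH and sudo on every host, stages the RPM in /tmp with scp, and then runs the install through gpssh with its -f (host file) and -e (command) options. The install_on_all_hosts and run helpers are hypothetical names, and the RPM path is whatever file dnf download saved on the coordinator. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: copy the downloaded RPM to each host, then install it with gpssh.

# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: install_on_all_hosts <host_file> <path_to_rpm>
install_on_all_hosts() {
    local hostfile="$1" rpm="$2" host
    local -a hosts
    mapfile -t hosts < "$hostfile"
    # Stage the package in /tmp on every host listed in the host file.
    for host in "${hosts[@]}"; do
        [ -z "$host" ] && continue
        run scp "$rpm" "$host:/tmp/"
    done
    # gpssh runs the same command on every host in the file.
    run gpssh -f "$hostfile" -e "sudo dnf install -y /tmp/$(basename "$rpm")"
}
```

For example, `install_on_all_hosts all_hosts <path-to-downloaded-rpm>` prints the planned commands, and `DRY_RUN=0 install_on_all_hosts all_hosts <path-to-downloaded-rpm>` runs them.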

Download and install the package on your dedicated FlowServer host / FlowCLI host (optional)

If you are running FlowServer on a host outside your WHPG cluster, or if you plan to run FlowCLI commands from such a host, you must also download and install the package on each of those hosts.

  1. Download the package from the EDB repository:

    export EDB_SUBSCRIPTION_TOKEN=<your-token>
    export EDB_REPO=gpsupp
    curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_REPO/setup.rpm.sh" | sudo -E bash
    sudo dnf download whpg<whpg_major_version>-flow-server

    Where <whpg_major_version> is your WHPG version (6 or 7).

  2. Install the downloaded package on the dedicated FlowServer or FlowCLI host.
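
One possible way to install the downloaded file on the dedicated host is sketched below. It assumes that dnf download saved the RPM in the current directory under a name beginning with the package name, which is the usual RPM naming convention. The find_rpm helper is a hypothetical convenience, not part of FlowServer.

```shell
#!/usr/bin/env bash
# Sketch only: locate the RPM saved by "dnf download" so it can be installed.

# Usage: find_rpm <whpg_major_version> [directory]
# Prints the first matching package file, or returns non-zero if none exists.
find_rpm() {
    local major="$1" dir="${2:-.}" f
    for f in "$dir"/whpg"$major"-flow-server*.rpm; do
        if [ -e "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Example (assumes WHPG 7 and sudo rights on this host):
# sudo dnf install -y "$(find_rpm 7)"
```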

Configure FlowServer

Create a configuration file named flow_server.json on the host that will run the FlowServer service, and include the following content. This can be a host within your WHPG cluster or a dedicated server.

{
    "Host": "",
    "Port": 6060,
    "Gpfdist": {
        "Host": "",
        "Port": 6070,
        "ReuseTables": true
    },
    "Prometheus": {
        "Host": "",
        "Port": 9080,
        "MetricsPath": "/flow_metrics"
    },
    "DebugPort": 6080,
    "Logging": {
        "SplitLogByJob": false,
        "FrontendLevel": "debug",
        "BackendLevel": "info"
    }
}

Where:

  • Host: The hostname or IP address of the server. The default is an empty string, which means it will listen on all interfaces.
  • Port: The port number on which the server listens for incoming connections. The default is 6060.
  • Gpfdist: Configuration options for the gpfdist service.
    • Host: The hostname or IP address of the gpfdist service. The default is an empty string, which means it will listen on all interfaces.
    • Port: The port number on which the gpfdist service listens. The default is 6070.
    • ReuseTables: Whether to reuse existing external tables. The default is false: FlowServer drops the external table associated with a load operation (if one exists) and creates a new external table when you start or restart the job, and the external table name is based on the job name. When you reuse external tables, FlowServer generates the external table name using a hash of various load configuration property values.
  • Prometheus: Configuration options for the Prometheus metrics endpoint.
    • Host: The hostname or IP address of the Prometheus service. The default is an empty string, which means it will listen on all interfaces.
    • Port: The port number on which the Prometheus service listens.
    • MetricsPath: The path to the metrics endpoint.
  • DebugPort: The port number for the debug server.
  • Logging: Configuration options for logging. The supported log levels are debug, info, warn, error, and fatal.
    • SplitLogByJob: Whether to split logs by job. The default is true, meaning logs will be separated by job.
    • FrontendLevel: The logging level for the frontend/stdout. The default is info.
    • BackendLevel: The logging level for the backend/log file. The default is debug.
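
JSON does not allow comments or trailing commas, so a quick syntax check of the file can catch editing mistakes before you start the service. The sketch below assumes python3 is available on the host and uses its standard json.tool module. The validate_config helper is a hypothetical name, and the check covers syntax only, not whether FlowServer accepts the option values.

```shell
#!/usr/bin/env bash
# Sketch only: succeed if the given file is well-formed JSON, fail otherwise.
validate_config() {
    python3 -m json.tool "$1" > /dev/null
}

# Example:
# validate_config /path/flow_server.json && echo "syntax OK"
```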

Start the FlowServer service

Once you have configured the settings, start the FlowServer service on your preferred host, pointing to the configuration file flow_server.json you just created:

./flowserver -c /path/flow_server.json
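
To confirm that the service came up, one option is to query the Prometheus metrics endpoint defined in the configuration. This is a sketch that assumes the endpoint is served over plain HTTP and that the sample values shown earlier (port 9080, path /flow_metrics) are in use. The metrics_url helper is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: build the metrics URL from the Prometheus settings in flow_server.json.
# Defaults mirror the sample configuration; plain HTTP is an assumption.
metrics_url() {
    local host="${1:-localhost}" port="${2:-9080}" path="${3:-/flow_metrics}"
    echo "http://$host:$port$path"
}

# Example: fetch the first few metric lines from the local FlowServer.
# curl -sf "$(metrics_url)" | head
```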
