Troubleshooting common issues

Resolve errors you encounter in the portal so that you keep access to the management suite. If a fix requires infrastructure-level changes, contact your system administrator.

Performing system diagnostics

Use the built-in command-line tools on the WarehousePG Enterprise Manager (WEM) host to identify configuration errors or connectivity gaps.

Validate active configuration settings

Run the setup verification tool to confirm that your current environment variables and database connection strings work:

wem setup --verify

See the wem setup command reference for details.

Perform a comprehensive health audit

Check for missing dependencies, incorrect file permissions, or service-level connectivity issues:

wem doctor

See the wem doctor command reference for details.
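
To run both diagnostics in one pass and get a one-line summary for each, a small wrapper can help. The following is a minimal sketch, not part of WEM: run_check is a hypothetical helper name, and the example assumes that wem setup --verify and wem doctor exit with a non-zero status when they find a problem, which this page does not confirm.

```shell
#!/bin/sh
# run_check LABEL COMMAND [ARGS...]
# Hypothetical helper: runs the command, hides its output, and prints a
# one-line PASS/FAIL summary. Returns 1 if the command failed.
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
        return 1
    fi
}

# Example (assumes both wem commands exit non-zero on failure):
#   run_check "configuration" wem setup --verify
#   run_check "health audit"  wem doctor
```

When a check reports FAIL, re-run the underlying command directly to see its full output.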

Check WEM logs

For service-level events (startup failures, restarts):

sudo journalctl -u wem -n 50 --no-pager

For application-level logs (alert evaluation, canary checks, query activity):

sudo tail -f /var/log/wem/wem.log
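
Both log sources can be noisy. As a rough sketch, the filter below keeps only lines that mention an error, failure, or fatal condition. The function name filter_problems and the keyword list are assumptions for illustration; the exact wording of WEM log messages is not documented on this page, so adjust the pattern to match what you see.

```shell
#!/bin/sh
# filter_problems: reads log text on stdin and prints only the lines that
# contain "error", "fail", or "fatal" (case-insensitive).
# The keyword list is an assumption about how problems are worded.
filter_problems() {
    # grep exits 1 when nothing matches; treat "no problems found" as success.
    grep -iE 'error|fail|fatal' || true
}

# Example:
#   sudo journalctl -u wem -n 200 --no-pager | filter_problems
#   sudo tail -n 200 /var/log/wem/wem.log | filter_problems
```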

Connectivity issues

Error: "Failed to connect to WEM database" during setup

When running wem setup or wem setup --verify, the command fails with an error similar to:

Setup failed: failed to connect to WEM database: failed to ping database: failed to connect to `user=gpadmin database=wem`

Cause: The WHPG_HOST or WEM_HOST environment variables aren't set in the current session, or the WarehousePG database isn't running.

Solution:

  1. Export the required host variables before running the command:

    export WHPG_HOST=<coordinator-hostname>
    export WEM_HOST=<coordinator-hostname>
    wem setup --verify

  2. If the database itself is not running, start it first:

    gpstart -a

    Then re-run wem setup --verify.
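
Before re-running the command, it can be useful to confirm which of the host variables are actually missing from the session. The sketch below is one way to do that; missing_vars is a hypothetical helper, and it relies on printenv, which is available on most Linux hosts but is not part of WEM.

```shell
#!/bin/sh
# missing_vars NAME...
# Prints the name of each variable that is unset or empty in the exported
# environment. Returns 1 if any are missing, 0 otherwise.
missing_vars() {
    status=0
    for name in "$@"; do
        # printenv only sees exported variables, which is what wem inherits.
        if [ -z "$(printenv "$name")" ]; then
            printf '%s\n' "$name"
            status=1
        fi
    done
    return "$status"
}

# Example:
#   missing_vars WHPG_HOST WEM_HOST || echo "export the variables listed above"
```

A variable that is assigned but not exported is reported as missing, because child processes such as wem would not see it.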

Issue: WEM service failed to start

Running sudo systemctl start wem returns an error:

Job for wem.service failed because the control process exited with error code.
See "systemctl status wem.service" and "journalctl -xe" for details.

Cause: WEM can't reach the WarehousePG database at startup.

Solution:

  1. Check whether WarehousePG is running:

    gpstate

  2. If the database is down, start it:

    gpstart -a

  3. Start WEM again and verify the status:

    sudo systemctl start wem
    sudo systemctl status wem
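
Because WEM needs the database at startup, starting the service immediately after gpstart -a can fail if the cluster is not yet accepting connections. The following sketch waits for the database before starting the service. The retry helper is hypothetical, and the attempt count and delay in the example are arbitrary values chosen for illustration.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs the command until it succeeds, sleeping DELAY seconds between tries.
# Returns 0 on the first success, or 1 after ATTEMPTS failed tries.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Example: wait up to about a minute for the database, then start WEM.
#   retry 6 10 psql -d postgres -c "SELECT version();" >/dev/null &&
#       sudo systemctl start wem
```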

Issue: Can't connect to the database

If WEM is unable to reach the WarehousePG cluster from the portal:

  1. Ensure the database is active and accepting local connections:

    psql -d postgres -c "SELECT version();"

  2. Verify that the WEM connection strings are correctly set in the environment:

    env | grep -E 'WHPG|WEM'

  3. Use the built-in WEM tool from the WEM host to validate the current configuration:

    wem setup --verify

  4. Test the credentials directly via the CLI using the same parameters defined in the WEM Settings tab within the Management panel:

    PGHOST=localhost PGUSER=gpadmin psql -d postgres -c "SELECT current_database();"
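
When one of the psql checks above fails, the error text usually points to the layer at fault. The sketch below maps a few common PostgreSQL client error messages to a likely cause. explain_psql_error is a hypothetical helper, the list is not exhaustive, and the suggested causes are general guidance rather than WEM-specific diagnostics.

```shell
#!/bin/sh
# explain_psql_error: reads psql error output on stdin and prints a likely
# cause based on well-known PostgreSQL client error messages.
explain_psql_error() {
    msg=$(cat)
    case $msg in
        "")                                 echo "no error output" ;;
        *"could not translate host name"*)  echo "hostname does not resolve" ;;
        *"Connection refused"*)             echo "database down or wrong port" ;;
        *"password authentication failed"*) echo "wrong user or password" ;;
        *"no pg_hba.conf entry"*)           echo "client not allowed in pg_hba.conf" ;;
        *)                                  echo "unrecognized; read the full message" ;;
    esac
}

# Example (sends only the error stream to the helper):
#   PGHOST=localhost PGUSER=gpadmin psql -d postgres -c "SELECT 1;" 2>&1 >/dev/null | explain_psql_error
```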

Authentication and access issues

Message: "Session expired"

Cause: Your security token has timed out due to a period of inactivity.

Solution: Select Log In to return to the authentication screen and re-enter your credentials.

Note

Any unsaved changes in forms or the query editor will be lost upon session expiration. Regularly save your configuration changes and avoid long periods of idle time with the browser tab open.

Error: "Permission denied"

Cause: Your assigned role doesn't have the authorization required to perform the requested action.

Solution:

  1. Verify your current role in the top right bar.
  2. Review the Role permissions matrix to confirm whether the action is permitted for your role.
  3. If you require elevated access, contact your administrator to request a role change.

Query editor restrictions

Issue: Query is blocked

Symptoms:

  • "Query blocked" error messages.
  • Inability to execute INSERT, UPDATE, or DELETE statements.
  • DDL commands (CREATE, DROP) are rejected.

Cause: WEM enforces role-based SQL restrictions to prevent accidental data loss or unauthorized schema changes.

Solution: Review the Role permissions matrix to confirm whether the action is permitted for your role.
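
To anticipate which statements fall into the restricted groups listed under Symptoms, you can classify a statement by its first keyword. The sketch below is only an illustration of that grouping. sql_kind is a hypothetical helper, it does not reproduce WEM's actual enforcement logic, and the Role permissions matrix remains the authority on what each role may run.

```shell
#!/bin/sh
# sql_kind STATEMENT
# Prints "read", "dml", "ddl", or "other" based on the first keyword.
# The keyword groups mirror the statements named in the symptoms above;
# this is not WEM's own rule set.
sql_kind() {
    # Take the first word of the first non-empty line and upper-case it.
    first=$(printf '%s\n' "$1" | awk 'NF { print toupper($1); exit }')
    case $first in
        SELECT)               echo "read" ;;
        INSERT|UPDATE|DELETE) echo "dml" ;;
        CREATE|DROP)          echo "ddl" ;;
        *)                    echo "other" ;;
    esac
}

# Example:
#   sql_kind "DELETE FROM orders WHERE id = 1;"
```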

Observability and metrics

Issue: Charts aren't displaying

Some tabs display the error: "Prometheus not configured. Set PROMETHEUS_URL to enable metrics charts."

Cause: The connection to the Prometheus metrics server is misconfigured, or the server is down.

Solution:

  1. Verify that Prometheus is running and reachable via the URL defined in the Settings tab within the Management panel.
  2. Check network connectivity and firewall rules between the WEM server and the Prometheus endpoint.

Issue: Charts are empty despite Prometheus being configured

Prometheus is reachable but WEM charts show no data.

Cause: Metrics are not flowing from the Collector to Prometheus. The Collector may not be running, may not be configured correctly, or the ./deploy-observability script may not have been run after configuration changes.

Solution:

  1. Check whether the metric exists in Prometheus by running the following command on the coordinator host. Replace <prometheus-host> with the hostname from your PROMETHEUS_ENDPOINT configuration:

    curl 'http://<prometheus-host>:9090/api/v1/query?query=warehousepg_observability_node_cpu_seconds_total'

    If the response contains "result":[], metrics are not reaching Prometheus.

  2. Inspect the Collector logs for errors:

    sudo cat /var/log/whpg-observability-collector/whpg-observability-collector.log

  3. Verify that PROMETHEUS_ENDPOINT in /var/lib/whpg-observability-collector/collector.conf is correct and that Prometheus is reachable from the coordinator.

  4. If you recently edited collector.conf, re-run the deployment script to apply the changes:

    cd /var/lib/whpg-observability-collector
    ./deploy-observability

  5. If Prometheus is running in Docker, confirm that the --web.enable-remote-write-receiver flag is set, or that the prometheus.yml includes remote_write receiver configuration. Without this flag, Prometheus silently ignores incoming metrics from the Collector.
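
The empty-result check from step 1 can be scripted so that it is easy to repeat after each change. The following sketch reads the JSON returned by the Prometheus query API and reports whether the result array contains data. prom_has_data is a hypothetical helper, and it uses plain pattern matching on compact JSON rather than a JSON parser, so treat it as a quick check only.

```shell
#!/bin/sh
# prom_has_data: reads a Prometheus /api/v1/query JSON response on stdin.
# Returns 0 if the "result" array is non-empty, 1 if it is empty or absent.
# Assumes the compact JSON that Prometheus emits (no spaces around ':').
prom_has_data() {
    body=$(cat)
    case $body in
        *'"result":[]'*) return 1 ;;  # metric unknown to Prometheus
        *'"result":['*)  return 0 ;;  # at least one series returned
        *)               return 1 ;;  # error response or unexpected body
    esac
}

# Example (replace <prometheus-host> as described in step 1):
#   curl -s 'http://<prometheus-host>:9090/api/v1/query?query=warehousepg_observability_node_cpu_seconds_total' | prom_has_data \
#       && echo "metrics present" || echo "no metrics"
```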

Issue: Logs aren't loading

The Historical Logs (Coordinator & Segments) tab within the Logs panel reports the error: "Loki integration is not configured."

Cause: The Loki log aggregation service is unavailable or the URL is incorrect.

Solution:

  1. Ensure the Loki service is active.
  2. Verify the Loki URL in the Settings tab within the Management panel.
  3. Check the server-side logs for "connection refused" errors.
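
To confirm from the WEM host that Loki is active, you can query Loki's /ready endpoint, which answers "ready" when the service can accept traffic. The sketch below wraps that call. loki_ready is a hypothetical helper, the base URL is a placeholder for the value configured in the Settings tab, and port 3100 in the example is Loki's usual default rather than something this page specifies.

```shell
#!/bin/sh
# loki_ready BASE_URL
# Calls BASE_URL/ready with curl. Prints "ready" and returns 0 when Loki
# reports ready; otherwise prints the reason and returns 1.
loki_ready() {
    # -f makes curl fail on HTTP errors such as 503 (not ready yet).
    reply=$(curl -fsS --max-time 5 "$1/ready" 2>&1) || {
        printf 'not reachable: %s\n' "$reply"
        return 1
    }
    case $reply in
        ready*) printf 'ready\n' ;;
        *)      printf 'unexpected reply: %s\n' "$reply"; return 1 ;;
    esac
}

# Example (placeholder host; 3100 is an assumed default port):
#   loki_ready "http://<loki-host>:3100"
```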
