Apache Airflow app (Innovation Release)

Important

Regular upstream releases deliver security updates. Keep your Airflow instance up to date to receive the latest patches.

Apache Airflow® is an open-source workflow orchestration platform deployed and managed within Hybrid Manager (HM). Its extensible Python framework supports integration with external technologies, and a web-based UI accessible from the HM console provides visibility into workflow execution and debugging. Airflow supports both single-node and distributed deployments.

Key features

  • Web-based UI for monitoring, managing, and debugging workflow execution.
  • Python-based workflow definitions using Directed Acyclic Graphs (DAGs).
  • Built-in scheduler supporting cron-based and event-driven triggers.
  • Extensible operator library for integrating with external systems and services.
  • Git synchronization for deploying DAG files from a repository.
  • Distributed task execution across multiple workers.

Workflows as code in Apache Airflow

Airflow workflows are defined entirely in Python:

  • Pipelines are defined in code, supporting dynamic DAG generation and parameterization.
  • Built-in operators can be extended for custom integrations.
  • Jinja templating is supported for runtime variable substitution.

Supported data sources

Airflow connects to nearly any data source via Connections, including PostgreSQL, MySQL, AWS, and Snowflake. A Connection stores the credentials and other information needed to reach an external service.
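Besides the Admin UI, Airflow can read Connections from environment variables named `AIRFLOW_CONN_<CONN_ID>` that hold a connection URI. The sketch below builds such a URI for a hypothetical PostgreSQL database; every value is a placeholder.

```python
# Sketch: define an Airflow Connection as an environment variable.
# All host, user, and password values here are hypothetical.
import os
from urllib.parse import quote

def pg_conn_uri(user: str, password: str, host: str, port: int, db: str) -> str:
    # Percent-encode credentials so special characters survive in the URI.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )

uri = pg_conn_uri("reporting", "s3cr3t/pw", "db.example.com", 5432, "analytics")
# Exported before Airflow starts, this becomes visible to DAGs
# under the connection id "reporting_db".
os.environ["AIRFLOW_CONN_REPORTING_DB"] = uri
```

Environment-variable connections are convenient for deployment automation because they never touch the metadata database.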

Requirements

Technical requirements

  • Python: v3.10+.
  • (Required) Database: Airflow requires a database to store workflow state, task instances, DAG definitions, and run history. You must create this database before deploying Airflow.
    • PostgreSQL v13+ is recommended for production. Have the host, port, database name, username, and password ready before installation. You can obtain these from the Connect tab of your cluster in the HM console.
    • Airflow also supports MySQL 8.0.17+ and MariaDB 10.2.2+.
  • (Required) Security keys: Airflow uses a Fernet key to encrypt passwords stored in Connections and Variables. Each key must be at least 32 characters long.
    • Generate the Fernet key by running: openssl rand -base64 32.
    • Generate the apiSecretKey by running: openssl rand -base64 32.
  • (Optional) Storage: Git Sync configuration for syncing DAGs from public or private Git repositories using SSH keys. If using Git sync, prepare the repository URL, branch name, and SSH private key.
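If `openssl` is not available, the same kind of key can be produced from the Python standard library. A Fernet key is simply 32 random bytes, URL-safe base64-encoded (this mirrors `Fernet.generate_key()` from the `cryptography` package):

```python
# Generate a Fernet-compatible key without external tools:
# 32 random bytes, URL-safe base64 encoded.
import base64
import secrets

def generate_fernet_key() -> str:
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).decode("ascii")

key = generate_fernet_key()
print(key)  # 44-character string
```

Run the generator twice: once for the Fernet key and once for the apiSecretKey, so the two values are independent.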

Deploying Airflow

  1. In the HM console, navigate to Asset Library, then select Apps.
  2. Select Apache Airflow and select Deploy.
  3. Under Identity, enter a name for your deployment and select the target project.
  4. Under Parameters, fill in the required fields. See Parameters for field descriptions.
  5. Select Deploy to install Airflow.

Parameters

| Parameter | Description | Required |
|---|---|---|
| apiSecretKey | Secret key for API authentication. Generate with `openssl rand -base64 32`. | Yes |
| fernetKey | Fernet key for encrypting passwords stored in Connections and Variables. Generate with `openssl rand -base64 32`. | Yes |
| data.metadataConnection | External PostgreSQL connection details: database name, host, port, username, password, SSL mode (`require` recommended), and protocol (`postgresql`). | Yes |
| web.defaultUser | Default admin username for the Airflow UI. | Yes |
| web.password | Default admin user password for the Airflow UI. | Yes |
| dags.gitSync.enabled | Toggle to enable syncing DAGs from a Git repository. | No |
| dags.gitSync.repo | Git repository URL containing your DAGs. | No |
| dags.gitSync.branch | Branch to sync from. Default: `main`. | No |
| dags.gitSync.subPath | Subfolder in the repository containing DAG files. Default: `dags`. | No |
| dags.gitSync.sshKey | SSH private key for authenticating to a private Git repository. | No |
| dags.gitSync.wait | Seconds between Git sync intervals. Default: 60. | No |
| dagProcessor.replicas | Number of DAG processor replicas (Airflow 3.0+ only). Min: 1, max: 3. | No |
| scheduler.replicas | Number of scheduler replicas. Min: 1, max: 3. | No |
Note

Once you have deployed Apache Airflow in HM, you can access it in the HM console under Apps or Asset Library.

Post-deployment configuration

Once Airflow is deployed, complete these steps to finish your setup:

  1. Navigate to your Project and select the Apps tab, or navigate to Estate > Apps to check the deployment status. When the status shows Ready, launch the app to continue.
  2. Open Airflow from the HM console.
  3. Log in with the admin username you set in web.defaultUser during deployment.
  4. Navigate to Admin > Connections to add credentials for any databases or services your DAGs will connect to.
  5. If you enabled Git sync, verify that your DAGs appear in the Airflow UI.
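One way to script the Git-sync check in step 5 is to list DAGs through Airflow's stable REST API (`GET /api/v1/dags` with basic auth in Airflow 2.x; newer versions may expose a different API path). The base URL and credentials below are placeholders for your deployment's values.

```python
# Sketch: list registered DAGs via the Airflow 2.x stable REST API.
# The URL and credentials are hypothetical placeholders.
import base64
import json
import urllib.request

def list_dags_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    # Basic-auth header built from the web.defaultUser credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/api/v1/dags",
        headers={"Authorization": f"Basic {token}"},
    )

req = list_dags_request("https://airflow.example.com", "admin", "admin-password")
# Uncomment against a live deployment:
# with urllib.request.urlopen(req) as resp:
#     dags = json.load(resp)["dags"]
#     print([d["dag_id"] for d in dags])
```

If your Git-synced DAG ids appear in the response, the sync and the DAG processor are both working.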

Support resources

As Airflow OSS is community-supported, use these resources for technical guidance and updates:

  • Airflow community — peer-to-peer support from the community.
  • Airflow documentation — self-service deployment and feature guides, as well as additional information on Airflow installation and configuration.
  • Airflow GitHub repository — source code, issue tracker, and release notes.
  • Best Practices — guidance for developing and maintaining efficient DAGs.
  • Scheduler — guidance for fine-tuning scheduler performance.
  • Tutorials — step-by-step tutorials for getting started with Airflow.
  • How-to Guides — practical guides for common Airflow tasks.