Apache Airflow app (Innovation Release)
- Hybrid Manager dual release strategy
- Documentation for the current Long-term support release
Important
Regular upstream releases deliver security updates. Keep your Airflow instance up to date to receive the latest patches.
Apache Airflow® is an open-source workflow orchestration platform deployed and managed within Hybrid Manager (HM). Its extensible Python framework supports integration with external technologies, and a web-based UI accessible from the HM console provides visibility into workflow execution and debugging. Airflow supports both single-node and distributed deployments.
Key features
- Web-based UI for monitoring, managing, and debugging workflow execution.
- Python-based workflow definitions using Directed Acyclic Graphs (DAGs).
- Built-in scheduler supporting cron-based and event-driven triggers.
- Extensible operator library for integrating with external systems and services.
- Git synchronization for deploying DAG files from a repository.
- Distributed task execution across multiple workers.
Workflows as code in Apache Airflow
Airflow workflows are defined entirely in Python:
- Pipelines are defined in code, supporting dynamic DAG generation and parameterization.
- Built-in operators can be extended for custom integrations.
- Jinja templating is supported for runtime variable substitution.
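For illustration, the sketch below shows a small parameterized DAG using the TaskFlow API. The DAG ID, schedule, and table names are illustrative only, and the import paths assume a recent Airflow release (2.4 or later), so adapt them to the version you deploy.

```python
from datetime import datetime

from airflow.decorators import dag, task

# Illustrative list: one export task is generated per table name.
TABLES = ["orders", "customers", "invoices"]

@dag(
    dag_id="example_export_tables",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
)
def export_tables():
    @task
    def export(table: str, ds: str | None = None):
        # Airflow injects `ds` (the logical date as YYYY-MM-DD) at runtime --
        # the same value that Jinja templates reference as {{ ds }}.
        print(f"Exporting {table} for {ds}")

    # Dynamic DAG generation: a plain Python loop creates one task per table.
    for table in TABLES:
        export.override(task_id=f"export_{table}")(table)

export_tables()
```

Because the tasks are generated in a Python loop, adding a table to the list adds a task to the DAG without any other changes.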
Supported data sources
Airflow connects to nearly any data source through Connections, including PostgreSQL, MySQL, AWS, and Snowflake. An Airflow Connection stores the credentials and other information needed to connect to an external service.
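As a sketch of how a DAG uses a Connection, the example below reads from a PostgreSQL database through a hypothetical connection ID, reporting_db, created under Admin > Connections. It assumes the apache-airflow-providers-postgres package is installed, which is commonly included in Airflow distributions.

```python
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook

@dag(
    dag_id="example_connection_usage",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
)
def connection_usage():
    @task
    def count_tables():
        # "reporting_db" is a hypothetical connection ID; the hook pulls the host,
        # port, credentials, and SSL options from the stored Connection.
        hook = PostgresHook(postgres_conn_id="reporting_db")
        row = hook.get_first("SELECT count(*) FROM information_schema.tables")
        print(f"Table count: {row[0]}")

    count_tables()

connection_usage()
```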
Requirements
Technical requirements
- Python: v3.10+.
- (Required) Database: Airflow requires a database to store workflow state, task instances, DAG definitions, and run history. You must create this database before deploying Airflow.
- PostgreSQL v13+ is recommended for production. Have the host, port, database name, username, and password ready before installation. You can obtain these from the Connect tab of your cluster in the HM console.
- Airflow also supports MySQL 8.0.17+ and MariaDB 10.2.2+.
- (Required) Security keys: Airflow uses Fernet to encrypt passwords in the connection configuration and the variable configuration. Keys must be at least 32 characters long.
- Generate the Fernet key by running: `openssl rand -base64 32`.
- Generate the `apiSecretKey` by running: `openssl rand -base64 32`. A Python alternative for generating both keys is sketched after this list.
- (Optional) Storage: Git Sync configuration for syncing DAGs from public or private Git repositories using SSH keys. If using Git sync, prepare the repository URL, branch name, and SSH private key.
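If openssl is not available on your workstation, both keys can also be generated in Python. This is a sketch only: it uses the cryptography package (the library Airflow itself uses for Fernet) and the standard-library secrets module; the openssl commands above are equally valid.

```python
import secrets

from cryptography.fernet import Fernet  # pip install cryptography

# Fernet key: 32 random bytes, URL-safe base64-encoded, in the format Airflow expects.
print("fernetKey:   ", Fernet.generate_key().decode())

# API secret key: any sufficiently long random string (at least 32 characters).
print("apiSecretKey:", secrets.token_urlsafe(32))
```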
Deploying Airflow
- In the HM console, navigate to Asset Library, then select Apps.
- Select Apache Airflow and select Deploy.
- Under Identity, enter a name for your deployment and select the target project.
- Under Parameters, fill in the required fields. See Parameters for field descriptions.
- Select Deploy to install Airflow.
Parameters
| Parameter | Description | Required |
|---|---|---|
| `apiSecretKey` | Secret key for API authentication. Generate with `openssl rand -base64 32`. | Yes |
| `fernetKey` | Fernet key used to encrypt connection passwords and variables stored in the metadata database. Generate with `openssl rand -base64 32`. | Yes |
| `data.metadataConnection` | External PostgreSQL connection details: database name, host, port, password, username, SSL mode (require recommended), and protocol (postgresql). | Yes |
| `web.defaultUser` | Default admin username for the Airflow UI. | Yes |
| `web.password` | Default admin user password for the Airflow UI. | Yes |
| `dags.gitSync.enabled` | Toggle to enable syncing DAGs from a Git repository. | No |
| `dags.gitSync.repo` | Git repository URL containing your DAGs. | No |
| `dags.gitSync.branch` | Branch to sync from. Default: main. | No |
| `dags.gitSync.subPath` | Subfolder in the repository containing DAG files. Default: dags. | No |
| `dags.gitSync.sshKey` | SSH private key for authenticating to a private Git repository. | No |
| `dags.gitSync.wait` | Number of seconds between Git sync operations. Default: 60. | No |
| `dagProcessor.replicas` | Number of DAG processor replicas. Airflow 3.0+ only. Min: 1, max: 3. | No |
| `scheduler.replicas` | Number of scheduler replicas. Min: 1, max: 3. | No |
Note
Once you have deployed Apache Airflow in HM, you can access it in the HM console under Apps or Asset Library.
Post-deployment configuration
Once Airflow is deployed, complete these steps to finish your setup:
- Navigate to your Project and select the Apps tab, or navigate to Estate > Apps to check the deployment status. When the status shows Ready, launch the app to continue.
- Open Airflow from the HM console.
- Log in with the admin username you set in `web.defaultUser` during deployment.
- Navigate to Admin > Connections to add credentials for any databases or services your DAGs will connect to.
- If you enabled Git sync, verify that your DAGs are appearing in the Airflow UI.
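If you want a quick way to confirm that Git sync is working, commit a trivial DAG such as the hypothetical file below to the configured subPath of your repository (dags by default). It should appear in the Airflow UI within roughly one `dags.gitSync.wait` interval.

```python
# dags/git_sync_smoke_test.py -- hypothetical file committed to the synced repository
from datetime import datetime

from airflow.decorators import dag, task

@dag(
    dag_id="git_sync_smoke_test",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
)
def git_sync_smoke_test():
    @task
    def say_hello():
        print("Git sync is delivering DAG files to Airflow.")

    say_hello()

git_sync_smoke_test()
```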
Support resources
As Airflow OSS is community-supported, use these resources for technical guidance and updates:
- Airflow community — peer-to-peer support from the community.
- Airflow documentation — self-service deployment and feature guides, as well as additional information on Airflow installation and configuration.
- Airflow GitHub repository — source code, issue tracker, and release notes.
- Best Practices — guidance for developing and maintaining efficient DAGs.
- Scheduler — guidance for fine-tuning scheduler performance.
- Tutorials — step-by-step tutorials for getting started with Airflow.
- How-to Guides — practical guides for common Airflow tasks.