Airflow REST API Connection

The REST API connection communicates with the Airflow web server over HTTP/HTTPS. It does not require direct access to Airflow’s underlying metadata database, making it the right choice for managed Airflow deployments (Astronomer, Cloud Composer, MWAA) or any setup where direct database access is unavailable or undesirable.
What the REST API connection captures
  • DAG topology and task structure
  • Pipeline schedules and run statuses
  • DAG owners and tags
  • Pipeline status history (configurable look-back window)
Lineage is not captured through the REST API alone. Table-level lineage (table → DAG → table edges) requires the Apache Airflow OpenLineage provider (apache-airflow-providers-openlineage) to push OpenLineage events to Collate’s endpoint:
POST /api/v1/openlineage/lineage
Two configuration values are required:
  • Namespace (AIRFLOW__OPENLINEAGE__NAMESPACE) — identifies this Airflow instance in Collate. Should match the pipeline service name.
  • Transport (AIRFLOW__OPENLINEAGE__TRANSPORT) — JSON pointing the provider at your Collate server using the HTTP transport type, along with a bot JWT for authentication.
Deployment-specific configuration is shown in each auth section below and summarized in the OpenLineage Setup Summary.

Supported Deployments

| Deployment | Auth Method |
|---|---|
| Self-hosted Airflow | Basic Auth |
| Astronomer | Access Token |
| Google Cloud Composer | GCP Service Account |
| Amazon MWAA | MWAA Configuration |

Common Parameters

These parameters apply regardless of which authentication method you select.
| Parameter | Required | Default | Description |
|---|---|---|---|
| hostPort | ✅ Yes | — | Base URL of the Airflow web UI. Format: scheme://hostname:port. Do not include a trailing slash. |
| connection.type | ✅ Yes | RestAPI | Fixed value — auto-set when you select the REST API option in the UI. |
| connection.authConfig | ✅ Yes | — | Authentication method. See sections below. |
| connection.apiVersion | No | auto | API version. Leave as auto to detect at runtime, or set explicitly to v2. |
| connection.verifySSL | No | true | Verify the Airflow server’s SSL certificate. Set to false only in dev environments with self-signed certs. |
| numberOfStatus | No | 10 | Number of past pipeline run statuses to read per ingestion run. |
| pipelineFilterPattern | No | — | Include / exclude DAGs by name using regular expressions. |
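The hostPort format rules (scheme, hostname, no trailing slash) can be checked before saving the connection. A minimal sketch using only the standard library — `normalize_host_port` is a hypothetical helper, not part of Collate:

```python
from urllib.parse import urlparse

def normalize_host_port(value: str) -> str:
    """Strip a trailing slash and verify the URL has an http(s) scheme and hostname."""
    value = value.rstrip("/")
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"hostPort must look like scheme://hostname:port, got {value!r}")
    return value

print(normalize_host_port("http://localhost:8080/"))  # http://localhost:8080
```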

Host and Port — Format by Deployment

| Deployment | Example hostPort |
|---|---|
| Self-hosted / Docker (ingestion on host) | http://localhost:8080 |
| Self-hosted / Docker (ingestion inside Docker) | http://host.docker.internal:8080 |
| Google Cloud Composer | https://<hash>-dot-<region>.composer.googleusercontent.com |
| Astronomer | https://<deployment-id>.ay.astronomer.run/<workspace>/ |
| Amazon MWAA | https://<id>.c2.airflow.<region>.on.aws |

Authentication Methods

1. Basic Auth

Best for: Self-hosted Airflow. Basic Auth uses a username and password to authenticate against the Airflow web server. Collate automatically exchanges the credentials for a short-lived JWT via POST /auth/token, which is then sent as Authorization: Bearer <token> on all subsequent requests.

Required Parameters

| Parameter | Required | Description |
|---|---|---|
| username | ✅ Yes | Username for an Airflow user with REST API access. |
| password | ✅ Yes | Password for the above user. Stored encrypted. |

Required Airflow Permissions

Create a dedicated Airflow user with the Viewer role. The user needs read access to DAGs, DAG runs, task instances, task logs, event logs, and configuration. No write permissions are required.

UI Setup

Collate UI — Airflow REST API connection with Basic Auth selected, showing Username, Password, and API Version fields

OpenLineage Setup — Basic Auth

Configure the namespace and HTTP transport in airflow.cfg:
[openlineage]
namespace = my-airflow-instance
transport = {"type": "http", "url": "https://your-collate-host/api/v1/openlineage/", "endpoint": "lineage", "auth": {"type": "api_key", "api_key": "<your-Collate-bot-JWT>"}}
Or via environment variables:
AIRFLOW__OPENLINEAGE__NAMESPACE=my-airflow-instance
AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "https://your-collate-host/api/v1/openlineage/", "endpoint": "lineage", "auth": {"type": "api_key", "api_key": "<your-Collate-bot-JWT>"}}'
Restart the Airflow web server and scheduler after making changes. OpenLineage events are emitted automatically on task completion for SQL-native operators (PostgreSQL, MySQL, Snowflake, BigQuery, etc.). For Python operators, emit events explicitly using OpenLineageClient in the task body.
Set namespace to the Airflow pipeline service name registered in Collate. Collate uses this value to associate lineage events with the correct pipeline service.
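Because the transport value must be a single-line JSON string, building it programmatically avoids quoting mistakes in airflow.cfg or shell exports. A hedged sketch — the host and JWT are placeholders you must substitute:

```python
import json

# Placeholders — substitute your Collate host and bot JWT.
transport = json.dumps({
    "type": "http",
    "url": "https://your-collate-host/api/v1/openlineage/",
    "endpoint": "lineage",
    "auth": {"type": "api_key", "api_key": "<your-Collate-bot-JWT>"},
})

# Emit a shell-ready export line for the environment-variable form:
print(f"AIRFLOW__OPENLINEAGE__TRANSPORT='{transport}'")
```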

2. Access Token

Best for: Astronomer, or any Airflow deployment with a pre-generated bearer token. Access Token auth sends a static bearer token on every request as Authorization: Bearer <token>. Use this when you have generated a long-lived deployment API token in Astronomer, or when your Airflow instance exposes token-based authentication.

Required Parameters

| Parameter | Required | Description |
|---|---|---|
| token | ✅ Yes | The bearer token value. Stored encrypted. Sent as Authorization: Bearer <token> on every API call. |

UI Setup

Collate UI — Airflow REST API connection with Access Token selected, showing Token, API Version, and Verify SSL fields

Required Permissions

Astronomer: The Deployment API token must have at least Viewer access to the deployment. In the Astronomer UI, when creating a token, assign it the Viewer deployment role. This grants read access to DAGs, runs, and task instances via the Airflow REST API.

Generating an Astronomer Deployment API Token

1. Navigate to Deployments: open the Astronomer UI and go to Deployments.
2. Select your deployment: choose the target deployment.
3. Open API Keys / Tokens: the label varies by Astronomer version.
4. Generate Token: click Add API Key / Generate Token, give it a name such as collate-ingestion, assign it the Viewer role, and copy the value.
5. Paste into Collate: paste the copied value into the Token field in Collate.
Astronomer deployment API tokens are scoped to a single deployment. If you ingest from multiple Astronomer deployments, create one token per deployment and one Collate Airflow service per deployment.
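Before configuring Collate, you can verify a token has API access by listing DAGs directly. A minimal stdlib sketch — the URL and token are placeholders, and `dags_url` is a hypothetical helper (it assumes the v2 API path described later in this page):

```python
import json
import urllib.request

def dags_url(host_port: str, limit: int = 5) -> str:
    """Build the DAG-listing endpoint from the deployment's base URL."""
    return f"{host_port.rstrip('/')}/api/v2/dags?limit={limit}"

def list_dags(host_port: str, token: str) -> list:
    """Call the Airflow REST API with the bearer token and return DAG IDs."""
    req = urllib.request.Request(
        dags_url(host_port),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return [d["dag_id"] for d in json.load(resp)["dags"]]

# Example (placeholders, do not run as-is):
# list_dags("https://<deployment-id>.ay.astronomer.run/<workspace>/", "<token>")
```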

Generating a Token for Self-Hosted Airflow

Exchange credentials for a JWT via the Airflow REST API:
curl -X POST http://localhost:8080/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "collate", "password": "<password>"}'
# Returns: {"access_token": "<JWT>"}
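The same exchange can be done in Python with only the standard library; this mirrors the curl call above (host and credentials are placeholders):

```python
import json
import urllib.request

def fetch_airflow_token(host_port: str, username: str, password: str) -> str:
    """POST credentials to /auth/token and return the short-lived JWT."""
    req = urllib.request.Request(
        f"{host_port.rstrip('/')}/auth/token",
        data=json.dumps({"username": username, "password": password}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]

def bearer_header(token: str) -> dict:
    """Header sent on all subsequent REST API calls."""
    return {"Authorization": f"Bearer {token}"}
```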

OpenLineage Setup — Astronomer (Access Token)

apache-airflow-providers-openlineage is included in Astronomer’s base Airflow image — no extra package installation is needed. Set the namespace and transport via Astronomer environment variables (Deployments → Environment Variables). Mark the transport variable as Secret since it contains your Collate JWT:
AIRFLOW__OPENLINEAGE__NAMESPACE = my-astronomer-deployment
AIRFLOW__OPENLINEAGE__TRANSPORT = {"type": "http", "url": "https://your-collate-host/api/v1/openlineage/", "endpoint": "lineage", "auth": {"type": "api_key", "api_key": "<your-Collate-bot-JWT>"}}
The api_key here is a Collate server bot token — not the same Astronomer Deployment API token used for metadata extraction. The ingestion token controls what Collate reads from Airflow; the lineage JWT controls what Airflow pushes to Collate.
Astronomer environment variables UI showing OpenLineage namespace and transport configuration

3. GCP Service Account (Google Cloud Composer)

Best for: Google Cloud Composer environments. This method uses a GCP service account to obtain short-lived OAuth2 tokens for authenticating with the Cloud Composer Airflow web server. Tokens are automatically refreshed at runtime via google-auth, so ingestion runs are never interrupted by token expiry.

Required Parameters

| Parameter | Required | Description |
|---|---|---|
| credentials | ✅ Yes | GCP credentials object. Choose one of four sub-types below. |

Credential Sub-Types

| Type | When to Use |
|---|---|
| GCP Credentials Values | Ingestion runs outside GCP (on-prem, local machine). Paste service account JSON fields directly. |
| GCP Credentials Path | Ingestion host already has the service account JSON key file at a known local path. |
| GCP ADC (Application Default Credentials) | Ingestion runs on a GCE VM or GKE pod with an attached service account, or gcloud auth application-default login has been run. |
| GCP External Account (Workload Identity) | Ingestion runs on GKE with Workload Identity, or on a non-GCP system using federated identity. |

UI Setup

The screenshot below shows GCP Credentials Values selected. The same form is used for all four credential sub-types — switching the GCP Credentials Configuration dropdown reveals the relevant fields for each type.
Collate UI — Airflow REST API connection with GCP Service Account selected, showing GCP Credentials Configuration dropdown set to GCP Credentials Values

Finding Your Cloud Composer Airflow URL

In GCP Console: Composer → Environments → select your environment → click Open Airflow UI. Copy the base URL:
https://ko82752sdo9f7zjf811c682mw1e5uuc9-dot-us-east1.composer.googleusercontent.com
Do not include any trailing path segment.

Required Permissions

The service account must have read access to DAGs, DAG runs, task instances, and task logs in the Composer environment.

OpenLineage Setup — GCP / Cloud Composer

apache-airflow-providers-openlineage ships with Cloud Composer — no additional PyPI packages are needed. In GCP Console, go to Composer → Environments → Edit → Airflow configuration overrides and add the following entries:
| Section | Key | Value |
|---|---|---|
| openlineage | transport | {"type": "http", "url": "https://your-collate-host/api/v1/openlineage/", "endpoint": "lineage", "auth": {"type": "api_key", "api_key": "<your-Collate-bot-JWT>"}} |
| openlineage | namespace | <your-pipeline-service-name> |
| lineage | backend | dataplex |
The lineage.backend = dataplex entry routes Airflow’s native lineage through GCP Data Lineage (Dataplex), while the openlineage.* entries send OpenLineage events to Collate. Both can be active simultaneously.
Cloud Composer environment updates take several minutes to complete. Once applied, DAG task completions automatically push OpenLineage events to Collate.
GCP Console — Composer Airflow configuration overrides panel showing openlineage namespace and transport settings

4. MWAA Configuration (Amazon Managed Workflows for Apache Airflow)

Best for: Amazon MWAA environments. MWAA does not expose the Airflow web server with simple username/password authentication. Instead, AWS generates a short-lived web login token via the MWAA control plane API. Collate uses your AWS credentials to call mwaa:CreateWebLoginToken, then uses that token to call the Airflow REST API.

Required Parameters

ParameterRequiredDescription
mwaaConfig.mwaaEnvironmentName✅ YesThe exact name of your MWAA environment as shown in the AWS Console.
mwaaConfig.awsConfig.awsRegion✅ YesAWS region where the MWAA environment is deployed (e.g., us-east-1).
mwaaConfig.awsConfig.awsAccessKeyIdConditionalAWS Access Key ID. Not required when using IAM roles or instance profiles.
mwaaConfig.awsConfig.awsSecretAccessKeyConditionalAWS Secret Access Key. Not required when using IAM roles or instance profiles.
mwaaConfig.awsConfig.awsSessionTokenNoRequired when using temporary (STS) credentials.
mwaaConfig.awsConfig.assumeRoleArnNoARN of an IAM role to assume before calling the MWAA API. Useful for cross-account access.
mwaaConfig.awsConfig.assumeRoleSessionNameNoSession name for the assumed role. Defaults to CollateSession.
mwaaConfig.awsConfig.endPointURLNoCustom endpoint URL for AWS-compatible services (PrivateLink, LocalStack).
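The conditional parameters above map naturally onto keyword arguments for an AWS SDK client. A hypothetical sketch of that mapping (pure Python, no AWS calls; `aws_client_kwargs` is not a Collate function):

```python
from typing import Optional

def aws_client_kwargs(
    region: str,
    access_key_id: Optional[str] = None,
    secret_access_key: Optional[str] = None,
    session_token: Optional[str] = None,
    endpoint_url: Optional[str] = None,
) -> dict:
    """Pass explicit credentials only when set; otherwise the SDK falls back
    to IAM roles / instance profiles discovered from the environment."""
    kwargs = {"region_name": region}
    if access_key_id and secret_access_key:
        kwargs["aws_access_key_id"] = access_key_id
        kwargs["aws_secret_access_key"] = secret_access_key
        if session_token:  # only meaningful with temporary (STS) credentials
            kwargs["aws_session_token"] = session_token
    if endpoint_url:  # PrivateLink / LocalStack
        kwargs["endpoint_url"] = endpoint_url
    return kwargs
```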

UI Setup

Collate UI — Airflow REST API connection with MWAA Authentication selected, showing MWAA Environment Name and AWS Configuration fields

Finding Your MWAA Airflow URL

In AWS Console: Amazon MWAA → Environments → select your environment → copy the Airflow UI URL shown in the environment details panel. Use only the base URL — do not include any trailing path.

Required Permissions

The IAM user or role must have access to DAGs, DAG runs, task instances, and task logs in the MWAA environment.

API Version

The apiVersion field controls which Airflow REST API version Collate targets:
| Value | Behaviour |
|---|---|
| auto (default) | Collate auto-detects the API version at runtime. Recommended for all new connections. |
| v2 | Always use the /api/v2/... API path. |
Use auto for new connections. Pin to v2 only if the auto-detection probe causes issues in your environment (e.g., strict WAF rules that reject the probe request).

SSL Verification

The verifySSL flag (default true) controls whether Collate validates the Airflow server’s TLS certificate chain.
  • Set to true for all production environments.
  • Set to false only in local development when using self-signed certificates. Never disable SSL verification in production.

Pipeline Filter Pattern

Patterns are evaluated as Python regular expressions against DAG IDs. If a DAG ID matches both an include and an exclude pattern, the DAG is included.
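Those semantics can be sketched in a few lines. This is an illustration of the rule stated above, not Collate's actual implementation — anchoring details (match vs fullmatch) may differ:

```python
import re

def dag_allowed(dag_id, includes=(), excludes=()):
    """Includes win on conflict, per the rule above."""
    if any(re.fullmatch(p, dag_id) for p in includes):
        return True
    if any(re.fullmatch(p, dag_id) for p in excludes):
        return False
    return not includes  # no include list: allowed unless excluded

print(dag_allowed("etl_daily", includes=["etl_.*"], excludes=[".*_daily"]))  # True
```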

OpenLineage Setup Summary

Lineage configuration is independent of the REST API auth method. The OpenLineage provider sends events to Collate using a separate HTTP transport — the configuration is the same regardless of how Collate authenticates to Airflow for metadata extraction. Collate OpenLineage endpoint: POST /api/v1/openlineage/lineage
# airflow.cfg
[openlineage]
namespace = <your-airflow-instance-name>
transport = {"type": "http", "url": "https://your-collate-host/api/v1/openlineage/", "endpoint": "lineage", "auth": {"type": "api_key", "api_key": "<your-Collate-bot-JWT>"}}
Or as environment variables:
AIRFLOW__OPENLINEAGE__NAMESPACE=<your-airflow-instance-name>
AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "http", "url": "https://your-collate-host/api/v1/openlineage/", "endpoint": "lineage", "auth": {"type": "api_key", "api_key": "<your-Collate-bot-JWT>"}}'
| Step | Action |
|---|---|
| 1. Confirm provider | apache-airflow-providers-openlineage ships with Cloud Composer and Astronomer. For self-hosted Airflow, install it via pip install apache-airflow-providers-openlineage. |
| 2. Set namespace | AIRFLOW__OPENLINEAGE__NAMESPACE identifies this Airflow instance in Collate. It should match the pipeline service name. |
| 3. Set transport | AIRFLOW__OPENLINEAGE__TRANSPORT is JSON with HTTP type, the Collate base URL including /api/v1/openlineage/, endpoint lineage, and the bot JWT as api_key. |
| 4. Restart Airflow | The web server and scheduler must restart to pick up the new configuration. |
Once configured, every DAG task completion automatically emits an OpenLineage event to Collate, populating lineage edges between pipeline tasks and the data assets they read from and write to.
OpenLineage auto-instruments SQL-native operators (PostgreSQL, MySQL, Snowflake, BigQuery, etc.). For Python @task operators, emit events explicitly using OpenLineageClient.from_environment() in the task body with the input and output datasets.
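Before restarting Airflow, the transport value can be parsed and checked for the required keys listed in the summary above. A hypothetical sanity-check helper (not part of Collate or the OpenLineage provider):

```python
import json

def check_transport(raw: str) -> list:
    """Return a list of problems found in an AIRFLOW__OPENLINEAGE__TRANSPORT value."""
    try:
        cfg = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    problems = []
    if cfg.get("type") != "http":
        problems.append("type must be 'http'")
    if not str(cfg.get("url", "")).endswith("/api/v1/openlineage/"):
        problems.append("url should end with /api/v1/openlineage/")
    if cfg.get("endpoint") != "lineage":
        problems.append("endpoint must be 'lineage'")
    auth = cfg.get("auth", {})
    if auth.get("type") != "api_key" or not auth.get("api_key"):
        problems.append("auth must be api_key with a bot JWT")
    return problems
```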