dbt
In this section, we provide guides and references to run the dbt workflow externally.

How to Run the Connector Externally

To run the ingestion via the UI, you’ll need the OpenMetadata Ingestion Container, which ships with custom Airflow plugins to handle workflow deployment. If you’d rather manage your workflows externally on your preferred orchestrator, the following docs show how to run the Ingestion Framework anywhere.

Requirements

You must have access to dbt artifacts. At minimum, the manifest.json file is required. The catalog.json and run_results.json files are optional but recommended for richer metadata. For dbt Cloud, create a service token with the Account Viewer permission and collect the account, project, and job IDs if you want to target a specific run.

Python Requirements

We support Python versions 3.9 to 3.11.
To run the dbt ingestion, install:
pip3 install "openmetadata-ingestion[dbt]"

dbt Ingestion

All connectors are defined as JSON Schemas. You can find the structure for dbt workflows in the OpenMetadata spec repository.

1. Define the YAML Config

Choose one of the following dbt artifact sources:

1. dbt Core: AWS S3 Buckets

In this configuration we fetch dbt artifacts from an S3 bucket.
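
Below is a minimal sketch of the source section for this option. Field names follow the dbt workflow JSON Schema mentioned above, but they can shift between releases, so verify them against the spec repository; the service name, credentials, bucket, and prefix are placeholders:

source:
  type: dbt
  serviceName: my_database_service        # name of an already-ingested database service
  sourceConfig:
    config:
      type: DBT
      dbtConfigSource:
        dbtConfigType: s3
        dbtSecurityConfig:                # AWS credentials with read access to the bucket
          awsAccessKeyId: <access-key-id>
          awsSecretAccessKey: <secret-access-key>
          awsRegion: us-east-1
        dbtPrefixConfig:
          dbtBucketName: my-dbt-bucket    # bucket holding manifest.json and friends
          dbtObjectPrefix: dbt/artifacts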

2. dbt Core: Google Cloud Storage Buckets

In this configuration we fetch dbt artifacts from a GCS bucket.
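
Only the dbtConfigSource block changes relative to the S3 sketch above. A hedged example, assuming the service account key is provided inline (a key file path is typically also accepted; check the spec):

      dbtConfigSource:
        dbtConfigType: gcs
        dbtSecurityConfig:
          gcpConfig:                      # GCP service account credentials
            type: service_account
            projectId: <project-id>
            privateKeyId: <private-key-id>
            privateKey: <private-key>
            clientEmail: <client-email>
            clientId: <client-id>
        dbtPrefixConfig:
          dbtBucketName: my-dbt-bucket
          dbtObjectPrefix: dbt/artifacts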

3. dbt Core: Azure Storage Buckets

In this configuration we fetch dbt artifacts from Azure Storage.
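
Again, only the dbtConfigSource block changes; a sketch assuming service-principal credentials, with placeholders throughout:

      dbtConfigSource:
        dbtConfigType: azure
        dbtSecurityConfig:
          clientId: <client-id>
          clientSecret: <client-secret>
          tenantId: <tenant-id>
          accountName: <storage-account-name>
        dbtPrefixConfig:
          dbtBucketName: my-container     # container holding the artifacts
          dbtObjectPrefix: dbt/artifacts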

4. dbt Core: Local Storage

In this configuration we fetch dbt artifacts from the machine running the ingestion.
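
Here the dbtConfigSource block points at files on the local filesystem; per the requirements above, only the manifest is mandatory:

      dbtConfigSource:
        dbtConfigType: local
        dbtManifestFilePath: /path/to/manifest.json        # required
        dbtCatalogFilePath: /path/to/catalog.json          # optional
        dbtRunResultsFilePath: /path/to/run_results.json   # optional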

5. dbt Core: File Server

In this configuration we fetch dbt artifacts from an HTTP or file server.
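
The dbtConfigSource block takes one URL per artifact. The URLs are fetched directly, so they generally need to be reachable from the machine running the ingestion; hostnames below are placeholders:

      dbtConfigSource:
        dbtConfigType: http
        dbtManifestHttpPath: https://<host>/dbt/manifest.json         # required
        dbtCatalogHttpPath: https://<host>/dbt/catalog.json           # optional
        dbtRunResultsHttpPath: https://<host>/dbt/run_results.json    # optional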

6. dbt Cloud: API-Based Ingestion

In this configuration we fetch dbt artifacts from dbt Cloud APIs.
The dbt Cloud workflow uses the dbt Cloud v2 APIs to retrieve the latest successful run and download artifacts.
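
A sketch of the dbtConfigSource block, using the service token and IDs collected in the requirements above; the project and job IDs are optional narrowing filters, and the URL only changes for single-tenant instances:

      dbtConfigSource:
        dbtConfigType: cloud
        dbtCloudAuthToken: <service-token>      # Account Viewer service token
        dbtCloudAccountId: "<account-id>"
        dbtCloudProjectId: "<project-id>"       # optional: restrict to one project
        dbtCloudJobId: "<job-id>"               # optional: restrict to one job
        dbtCloudUrl: https://cloud.getdbt.com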

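Whichever source you pick, complete the file with the same sink and workflowConfig sections. A minimal sketch, assuming a local OpenMetadata server and a bot JWT token; adjust hostPort and securityConfig to your deployment:

sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
    securityConfig:
      jwtToken: <bot-jwt-token>
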
2. Run with the CLI

First, save the YAML file. Then, with all requirements installed, run:
metadata ingest -c <path-to-yaml>
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, you will be able to extract metadata from different sources.
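For example, if the S3 configuration above were saved as dbt_s3.yaml (a filename chosen here purely for illustration):
metadata ingest -c dbt_s3.yaml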