dbt
In this section, we provide guides and references to run the dbt workflow externally.

How to Run the Connector Externally

To run the ingestion via the UI, you’ll need the OpenMetadata Ingestion Container, which ships with custom Airflow plugins to handle workflow deployment. If you’d rather manage your workflows externally on your preferred orchestrator, the following docs show how to run the Ingestion Framework anywhere.

Requirements

You must have access to dbt artifacts. At minimum, the manifest.json file is required. The catalog.json and run_results.json files are optional but recommended for richer metadata. For dbt Cloud, create a service token with the Account Viewer permission and collect the account, project, and job IDs if you want to target a specific run.

Python Requirements

We support Python versions 3.9 to 3.11.
To run the dbt ingestion, install:
pip3 install "openmetadata-ingestion[dbt]"

dbt Ingestion

All connectors are defined as JSON Schemas. You can find the structure for dbt workflows in the OpenMetadata spec repository.

1. Define the YAML Config

Choose one of the following dbt artifact sources:

1. dbt Core: AWS S3 Buckets

In this configuration we fetch dbt artifacts from an S3 bucket.
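
Below is a minimal sketch of the source section for this option. Field names follow the dbt workflow JSON Schema mentioned above, but they can shift between releases, so verify them against the spec repository; the service name, credentials, bucket, and prefix are placeholders:

source:
  type: dbt
  serviceName: my_database_service        # name of an already-ingested database service
  sourceConfig:
    config:
      type: DBT
      dbtConfigSource:
        dbtConfigType: s3
        dbtSecurityConfig:                # AWS credentials with read access to the bucket
          awsAccessKeyId: <access-key-id>
          awsSecretAccessKey: <secret-access-key>
          awsRegion: us-east-1
        dbtPrefixConfig:
          dbtBucketName: my-dbt-bucket    # bucket holding manifest.json and friends
          dbtObjectPrefix: dbt/artifacts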

2. dbt Core: Google Cloud Storage Buckets

In this configuration we fetch dbt artifacts from a GCS bucket.
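
Only the dbtConfigSource block changes relative to the S3 sketch above. A hedged example, assuming the service account key is provided inline (a key file path is typically also accepted; check the spec):

      dbtConfigSource:
        dbtConfigType: gcs
        dbtSecurityConfig:
          gcpConfig:                      # GCP service account credentials
            type: service_account
            projectId: <project-id>
            privateKeyId: <private-key-id>
            privateKey: <private-key>
            clientEmail: <client-email>
            clientId: <client-id>
        dbtPrefixConfig:
          dbtBucketName: my-dbt-bucket
          dbtObjectPrefix: dbt/artifacts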

3. dbt Core: Azure Storage Buckets

In this configuration we fetch dbt artifacts from Azure Storage.
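
Again, only the dbtConfigSource block changes; a sketch assuming service-principal credentials, with placeholders throughout:

      dbtConfigSource:
        dbtConfigType: azure
        dbtSecurityConfig:
          clientId: <client-id>
          clientSecret: <client-secret>
          tenantId: <tenant-id>
          accountName: <storage-account-name>
        dbtPrefixConfig:
          dbtBucketName: my-container     # container holding the artifacts
          dbtObjectPrefix: dbt/artifacts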

4. dbt Core: Local Storage

In this configuration we fetch dbt artifacts from the machine running the ingestion.
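
Here the dbtConfigSource block points at files on the local filesystem; per the requirements above, only the manifest is mandatory:

      dbtConfigSource:
        dbtConfigType: local
        dbtManifestFilePath: /path/to/manifest.json        # required
        dbtCatalogFilePath: /path/to/catalog.json          # optional
        dbtRunResultsFilePath: /path/to/run_results.json   # optional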

5. dbt Core: File Server

In this configuration we fetch dbt artifacts from an HTTP or file server.
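
The dbtConfigSource block takes one URL per artifact. The URLs are fetched directly, so they generally need to be reachable from the machine running the ingestion; hostnames below are placeholders:

      dbtConfigSource:
        dbtConfigType: http
        dbtManifestHttpPath: https://<host>/dbt/manifest.json         # required
        dbtCatalogHttpPath: https://<host>/dbt/catalog.json           # optional
        dbtRunResultsHttpPath: https://<host>/dbt/run_results.json    # optional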

6. dbt Cloud: API-Based Ingestion

In this configuration we fetch dbt artifacts from dbt Cloud APIs.
The dbt Cloud workflow uses the dbt Cloud v2 APIs to retrieve the latest successful run and download artifacts.
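
A sketch of the dbtConfigSource block, using the service token and IDs collected in the requirements above; the project and job IDs are optional narrowing filters, and the URL only changes for single-tenant instances:

      dbtConfigSource:
        dbtConfigType: cloud
        dbtCloudAuthToken: <service-token>      # Account Viewer service token
        dbtCloudAccountId: "<account-id>"
        dbtCloudProjectId: "<project-id>"       # optional: restrict to one project
        dbtCloudJobId: "<job-id>"               # optional: restrict to one job
        dbtCloudUrl: https://cloud.getdbt.com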

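Whichever source you pick, complete the file with the same sink and workflowConfig sections. A minimal sketch, assuming a local OpenMetadata server and a bot JWT token; adjust hostPort and securityConfig to your deployment:

sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: http://localhost:8585/api
    authProvider: openmetadata
    securityConfig:
      jwtToken: <bot-jwt-token>
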
2. Run with the CLI

First, save the YAML file. Then, with all requirements installed, run:
metadata ingest -c <path-to-yaml>
Note that from connector to connector, this recipe will always be the same. By updating the YAML configuration, you will be able to extract metadata from different sources.
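For example, if the S3 configuration above were saved as dbt_s3.yaml (a filename chosen here purely for illustration):
metadata ingest -c dbt_s3.yaml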