dbt Workflow

dbt Integration

Feature Status: PROD
Supported dbt Core Versions: v1.2, v1.3, v1.4, v1.5, v1.6, v1.7, v1.8, v1.9

Requirements

Collate supports ingestion from both dbt Core and dbt Cloud.
The requirements vary depending on how dbt is deployed and executed.

Why We Need dbt Artifacts

To bring your dbt project into Collate, we need to read the metadata that dbt generates about your transformations. dbt automatically creates JSON files (called “artifacts”) whenever you run commands like dbt run, dbt test, or dbt docs generate. These artifacts allow Collate to:
  • Build lineage graphs — See how your models connect to sources and each other
  • Sync documentation — Keep table and column descriptions in sync with your dbt project
  • Track data quality — Monitor test results and show pass/fail status
  • Import metadata — Bring over tags, ownership, domains, and custom properties
We read these pre-generated files rather than parsing your SQL and YAML directly, which means the integration works with any dbt setup—whether you run dbt in Airflow, Kubernetes, GitHub Actions, or locally.

Understanding dbt Artifact Files

dbt generates JSON files in the target/ directory. Here’s what each file provides:

manifest.json (Required)

The manifest is the heart of dbt metadata. This file is required for the integration to work. What it contains:
  • Model definitions and SQL code
  • ref() and source() dependencies for lineage
  • Descriptions from schema.yml files
  • dbt tags and meta properties
  • Test configurations
  • Column definitions
How Collate uses it:
  • Creates Data Model entities linked to your tables
  • Builds lineage graphs showing data flow between models
  • Syncs table and column descriptions
  • Creates classification tags
  • Assigns ownership and domains
  • Creates test cases for data quality monitoring
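As an illustration of the dependency information Collate reads from the manifest, here is a minimal sketch that maps each model to its upstream nodes. The sample dict is a tiny stand-in for a parsed target/manifest.json; real manifests carry many more fields per node.

```python
import json  # in practice: manifest = json.load(open("target/manifest.json"))

def model_lineage(manifest: dict) -> dict:
    """Map each model's unique_id to its upstream dependency ids.

    Lineage edges come from each node's depends_on.nodes list, which
    records the ref() and source() targets of the model's SQL.
    """
    return {
        uid: node.get("depends_on", {}).get("nodes", [])
        for uid, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model"
    }

# Tiny stand-in for a real manifest (hypothetical project ids)
sample = {
    "nodes": {
        "model.jaffle.orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.jaffle.stg_orders"]},
        },
        "model.jaffle.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.jaffle.raw_orders"]},
        },
    }
}

print(model_lineage(sample))
```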
Generated by: Any dbt command (dbt run, dbt build, dbt compile)

catalog.json (Recommended)

The catalog provides database-level details that the manifest doesn’t have. What it contains:
  • Actual column data types from the database
  • Database-level ownership information
  • Column ordering as it exists in the database
  • Statistics about tables and columns
How Collate uses it:
  • Provides accurate column data types (more reliable than schema.yml declarations)
  • Fallback owner information if not specified in meta properties
  • Maintains column position from your database
Generated by: dbt docs generate
Without catalog.json, you’ll still get lineage and model information, but column types will only include what’s declared in your schema.yml files.
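To show what the catalog adds on top of the manifest, a small sketch that pulls the database-reported column types for one node (the sample dict is a hypothetical stand-in for a parsed target/catalog.json):

```python
def catalog_column_types(catalog: dict, table_id: str) -> dict:
    """Return {column_name: database_type} for one node in catalog.json.

    These types come from the warehouse itself, so they are more
    reliable than the declarations in schema.yml.
    """
    cols = catalog.get("nodes", {}).get(table_id, {}).get("columns", {})
    return {name: col.get("type") for name, col in cols.items()}

# Hypothetical sample node; real catalogs include stats and ownership too
sample_catalog = {
    "nodes": {
        "model.jaffle.orders": {
            "columns": {
                "order_id": {"type": "INTEGER", "index": 1},
                "ordered_at": {"type": "TIMESTAMP", "index": 2},
            }
        }
    }
}

print(catalog_column_types(sample_catalog, "model.jaffle.orders"))
```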
run_results.json (Recommended)

Run results capture the outcome of your most recent dbt execution. What it contains:
  • Test pass/fail/warn status for each test
  • Execution timestamps
  • Error messages and stack traces
  • Model build success/failure status
How Collate uses it:
  • Updates test case results showing pass/fail status
  • Tracks when tests last ran
  • Shows failure details for debugging
Generated by: dbt run, dbt test, dbt build
Run dbt test before the Collate ingestion runs to capture the latest test results.
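As a sketch of how test outcomes can be read from this file, the snippet below tallies pass/fail status per test node (the sample dict stands in for a parsed target/run_results.json; ids are hypothetical):

```python
def summarize_tests(run_results: dict) -> dict:
    """Count test outcomes (pass/fail/warn/error) from run_results.json."""
    counts = {}
    for result in run_results.get("results", []):
        # Test nodes have unique_ids prefixed with "test."
        if result.get("unique_id", "").startswith("test."):
            status = result.get("status")
            counts[status] = counts.get(status, 0) + 1
    return counts

# Hypothetical sample; real run results also include timings and messages
sample_results = {
    "results": [
        {"unique_id": "test.not_null_orders_id", "status": "pass"},
        {"unique_id": "test.unique_orders_id", "status": "fail"},
    ]
}

print(summarize_tests(sample_results))
```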

Generating Your dbt Artifacts

Run these commands after your dbt models execute:
```shell
# Step 1: Run your models (generates manifest.json)
dbt run

# Step 2: Run your tests (updates run_results.json with test outcomes)
dbt test

# Step 3: Generate the catalog (creates catalog.json)
dbt docs generate
```

Verify your artifacts exist:

```shell
ls -la target/*.json

# Expected files:
# target/manifest.json    (required)
# target/catalog.json     (recommended)
# target/run_results.json (recommended)
```

dbt Core Artifact Storage

Artifact Accessibility

Since dbt Core runs within your infrastructure (for example, using Airflow or similar schedulers), Collate does not have direct access to the local file system where dbt executes. To enable ingestion, the dbt artifacts must be made accessible to Collate by storing them in a supported cloud storage service. Collate currently supports the following storage systems:
  • Amazon S3
  • Google Cloud Storage (GCS)
  • Azure Data Lake / Azure Blob Storage
  • HTTP/HTTPS servers
  • Local or shared filesystem
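For the local or shared filesystem option, the upload step can be as simple as copying the artifacts to a path Collate can read. A minimal sketch (directory names are hypothetical; adapt to your scheduler):

```python
import shutil
from pathlib import Path

def publish_artifacts(target_dir: str, dest_dir: str) -> list:
    """Copy the dbt artifacts Collate reads to an accessible location.

    manifest.json is required; catalog.json and run_results.json are
    copied only if present, since they are optional but recommended.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in ("manifest.json", "catalog.json", "run_results.json"):
        src = Path(target_dir) / name
        if src.exists():
            shutil.copy2(src, dest / name)
            copied.append(name)
    return copied

# Example call after a dbt run, e.g. from an Airflow task:
# publish_artifacts("target", "/mnt/shared/dbt-artifacts/my_project")
```

The same pattern applies to S3, GCS, or Azure; only the copy call changes to the corresponding SDK or CLI upload.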

Configuration Steps

To configure dbt Core artifact ingestion:
  1. Generate artifacts: Ensure dbt generates required files (manifest.json, catalog.json, run_results.json)
  2. Choose storage method: Select from S3, GCS, Azure, HTTP, or Local (see options below)
  3. Upload artifacts: Configure your workflow to upload artifacts to chosen storage
  4. Configure Collate: Provide storage path and credentials during ingestion setup
See the Storage Configuration Overview for complete implementation guides.

dbt Core Artifact Configuration

When using dbt Core, artifacts must be accessible to Collate. Choose one of the storage methods listed above (S3, GCS, Azure, HTTP, or local/shared filesystem) and follow its configuration guide.

dbt Cloud Requirements

When using dbt Cloud, Collate integrates directly with dbt Cloud using APIs to retrieve metadata and execution details. See the dbt Cloud API Configuration Guide for complete setup instructions.

Prerequisites

To configure dbt Cloud ingestion, you must have:
  • An active dbt Cloud account
  • At least one dbt Cloud job configured to generate dbt artifacts
  • A valid dbt Cloud API token with sufficient permissions

Supported Metadata

Using dbt Cloud integration, Collate can ingest:
  • dbt models and sources
  • Column-level metadata
  • Model and source lineage
  • dbt test results (when tests are executed as part of the job)
Note: No external cloud storage configuration is required for dbt Cloud ingestion. Ensure that your dbt Cloud job is configured to generate documentation artifacts.
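As an illustration of how run artifacts are retrieved over the dbt Cloud Administrative API, a sketch that builds the request for one artifact. The account id, run id, and token are placeholders, and the host may differ for your dbt Cloud region or plan, so verify against the dbt Cloud API reference:

```python
def artifact_request(account_id: int, run_id: int, path: str, token: str):
    """Build the URL and headers to fetch a run artifact from dbt Cloud.

    Follows the shape of the dbt Cloud Administrative API (v2); the
    response body is the artifact itself (e.g. manifest.json).
    """
    url = (
        f"https://cloud.getdbt.com/api/v2/accounts/{account_id}"
        f"/runs/{run_id}/artifacts/{path}"
    )
    headers = {"Authorization": f"Token {token}", "Accept": "application/json"}
    return url, headers

# Placeholder ids and token for illustration only
url, headers = artifact_request(1234, 5678, "manifest.json", "<API_TOKEN>")
print(url)
```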

Collate ingests the following metadata from dbt

1. dbt Queries

Queries used to create the dbt models can be viewed in the dbt tab.

2. dbt Lineage

Lineage from dbt models can be viewed in the Lineage tab. For more information on how lineage is extracted from dbt, see the dbt Lineage documentation.
To capture lineage, the compiled_code field must be present in the manifest.json file.
  • If compiled_code is missing, lineage will not be captured for that node.
  • To ensure compiled_code is populated in your dbt manifest, run the following commands in your dbt project:
    • dbt compile
    • dbt docs generate
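The compiled_code requirement above can be verified programmatically before ingestion; a small sketch that flags manifest model nodes missing it (the sample dict stands in for a parsed target/manifest.json with hypothetical ids):

```python
def nodes_missing_compiled_code(manifest: dict) -> list:
    """List model unique_ids whose compiled_code is absent or empty.

    Lineage will not be captured for these nodes, so they should be
    recompiled (dbt compile / dbt docs generate) before ingestion.
    """
    return [
        uid
        for uid, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model"
        and not node.get("compiled_code")
    ]

# Hypothetical sample: one node compiled, one not
sample = {
    "nodes": {
        "model.jaffle.orders": {
            "resource_type": "model",
            "compiled_code": "select * from stg_orders",
        },
        "model.jaffle.stg_orders": {"resource_type": "model"},
    }
}

print(nodes_missing_compiled_code(sample))
```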

3. dbt Tags

Table- and column-level tags can be imported from dbt. Please refer to the dbt Tags documentation for details.

4. dbt Owner

Owners from dbt models can be imported and assigned to the respective tables. Please refer to the dbt Owner documentation for details.

5. dbt Descriptions

Descriptions from the dbt manifest.json and catalog.json can be imported and assigned to the respective tables and columns. For more information, and to control how table and column descriptions are updated from dbt, please refer to the dbt Descriptions documentation.

6. dbt Tests and Test Results

Tests from dbt will only be imported if the run_results.json file is provided. See the dbt Tests documentation for details.

7. dbt Tiers

Table- and column-level tiers can be imported from dbt. Please refer to the dbt Tiers documentation for details.

8. dbt Glossary

Table- and column-level glossary terms can be imported from dbt. Please refer to the dbt Glossary documentation for details.

9. dbt Domain

Table-level domains can be imported from dbt to assign tables to organizational domains. Please refer to the dbt Domain documentation for details.

10. dbt Custom Properties

Custom property values can be imported from dbt to enrich table metadata with organization-specific attributes. Please refer to the dbt Custom Properties documentation for details.

Troubleshooting

For any issues, please refer to the troubleshooting documentation.