dbt


PROD
Feature List
  • Metadata
  • Queries
  • Lineage
  • Tags
  • Tiers
  • Domains
  • Custom Properties
  • Glossary
  • Owners
  • Descriptions
  • Tests
  • Exposures

Configure dbt workflow

Learn how to configure the dbt workflow to ingest dbt data from your data sources.
Prerequisites for dbt Core: Before configuring the workflow, ensure you have set up artifact storage. dbt Core requires its artifacts (manifest.json, catalog.json) to be accessible to Collate. See the Storage Configuration Overview for setup guides. This step is not required for dbt Cloud, where artifacts are managed automatically via the API.
  • Collate supports both dbt Core and dbt Cloud for databases. After metadata ingestion, Collate extracts model information from dbt and integrates it accordingly.
  • Additionally, dbt Cloud supports executing models directly. Collate enables ingestion of these executions as a Pipeline Service for enhanced tracking and visibility.

Configuration

Once the dbt metadata ingestion pipeline runs successfully and the service entities are available in Collate, dbt metadata is automatically ingested and associated with the corresponding data assets. As part of dbt ingestion, Collate can ingest and apply the following metadata from dbt:
  • dbt models and their relationships
  • Model and source lineage
  • dbt tests and test execution results
  • dbt tags
  • dbt owners
  • dbt descriptions
  • dbt tiers
  • dbt glossary terms
This ingestion enriches the Table Entity and populates the dbt tab on the Table Entity page, providing a consolidated view of dbt-related context for each table.
No additional manual configuration is required in the UI after a successful dbt ingestion run.
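To make the list above more concrete, the following is a minimal sketch of how model names, descriptions, tags and upstream dependencies can be read from a parsed manifest.json. It relies only on the standard dbt manifest fields `nodes`, `resource_type`, `description`, `tags` and `depends_on`. The function name `summarize_models` is hypothetical, and this is an illustration, not Collate's actual ingestion code.

```python
from typing import Any


def summarize_models(manifest: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect name, description, tags and upstream nodes for each dbt model.

    Hypothetical helper for illustration; not part of Collate.
    """
    models = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots and other node types
        models.append(
            {
                "unique_id": unique_id,
                "name": node.get("name"),
                "description": node.get("description", ""),
                "tags": node.get("tags", []),
                # unique ids of the models and sources this model reads from
                "upstream": node.get("depends_on", {}).get("nodes", []),
            }
        )
    return models
```

The `upstream` entries are the raw material for lineage: each one names a model or source that the current model depends on.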
We can create a workflow that obtains the dbt information from the dbt files and feeds it to Collate. The dbt Ingestion is in charge of obtaining this data.

1. Add a dbt Ingestion

From the Service Page, go to the Ingestions tab and click on Add dbt Ingestion to add a new ingestion.

2. Configure the dbt Ingestion

Here you can enter the configuration required for Collate to get the dbt files (manifest.json, catalog.json and run_results.json) needed to extract the dbt metadata. Select one of the sources below from which the dbt files can be fetched:
Only the manifest.json file is required for dbt ingestion.

dbt Core

AWS S3 Buckets

Collate connects to the AWS S3 bucket using the credentials provided and scans it for manifest.json, catalog.json and run_results.json files. You can provide the name of the S3 bucket and the prefix path to the folder in which the dbt files are stored. If these parameters are not provided, all buckets are scanned for the files. Follow the link here for instructions on setting up multiple dbt projects.
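The effect of the prefix path can be illustrated with a small sketch. Given a list of object keys from a bucket, it keeps only the keys under the prefix, groups the dbt files by folder, and drops folders that have no manifest.json. The function name `group_dbt_artifacts` and the example keys are hypothetical. This is an assumption about how such a scan could work, not Collate's implementation.

```python
from collections import defaultdict
from posixpath import basename, dirname

DBT_ARTIFACTS = {"manifest.json", "catalog.json", "run_results.json"}


def group_dbt_artifacts(keys, prefix=""):
    """Group dbt artifact keys by the folder that contains them.

    Hypothetical helper for illustration; not part of Collate.
    """
    folders = defaultdict(dict)
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the configured prefix path
        name = basename(key)
        if name in DBT_ARTIFACTS:
            folders[dirname(key)][name] = key
    # manifest.json is the only required file, so folders without it are dropped
    return {folder: files for folder, files in folders.items() if "manifest.json" in files}
```

With an empty prefix every key is considered, which mirrors the behavior described above when no bucket or prefix is configured. Each folder in the result corresponds to one dbt project.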

Google Cloud Storage Buckets

Collate connects to the GCS bucket using the credentials provided and scans the GCS buckets for manifest.json, catalog.json and run_results.json files. You can provide the name of the GCS bucket and the prefix path to the folder in which the dbt files are stored. If these parameters are not provided, all buckets are scanned for the files. Follow the link here for instructions on setting up multiple dbt projects.
GCS credentials can be provided in two ways:
  1. Entering the credentials directly into the form.
  2. Entering the path of the file in which the GCS bucket credentials are stored.
For more information on Google Cloud Storage authentication click here.

Azure Storage Buckets

Collate connects to Azure Storage using the credentials provided and scans the configured storage containers for manifest.json, catalog.json and run_results.json files. You can provide the Azure Storage account, the container name, and an optional folder (prefix) path where the dbt files are stored. If these parameters are not provided, all accessible containers in the storage account are scanned for the files. Follow the link here for instructions on setting up multiple dbt projects.

Local Storage

You can directly provide the paths of the manifest.json, catalog.json and run_results.json files stored on the local system or in the container in which the Collate server is running.
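Before pointing the ingestion at local paths, it can help to check that the files are readable. The sketch below loads the artifacts from disk, treating manifest.json as required and the other two files as optional. The function name `load_dbt_artifacts` and its parameters are hypothetical and only illustrate the required/optional split.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_dbt_artifacts(
    manifest_path: str,
    catalog_path: Optional[str] = None,
    run_results_path: Optional[str] = None,
) -> dict[str, Any]:
    """Load dbt artifacts from local paths; only manifest.json is mandatory.

    Hypothetical helper for illustration; not part of Collate.
    """
    manifest_file = Path(manifest_path)
    if not manifest_file.is_file():
        raise FileNotFoundError(f"manifest.json not found at {manifest_file}")
    artifacts = {"manifest": json.loads(manifest_file.read_text(encoding="utf-8"))}

    optional = (("catalog", catalog_path), ("run_results", run_results_path))
    for name, path in optional:
        # optional files are skipped silently when absent
        if path and Path(path).is_file():
            artifacts[name] = json.loads(Path(path).read_text(encoding="utf-8"))
    return artifacts
```

If the call succeeds from inside the environment where the ingestion runs, the same paths should be usable in the Local Storage form.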

File Server

You can directly provide the file server paths of the manifest.json, catalog.json and run_results.json files stored on a file server.

dbt Cloud

Click on the link here to get started with dbt Cloud account setup if you have not done so already. The APIs need to be authenticated using an Authentication Token. Follow the link here to generate an authentication token for your dbt Cloud account. The Account Viewer permission is the minimum requirement for the dbt Cloud token.
The dbt Cloud workflow leverages the dbt Cloud v2 APIs to retrieve dbt run artifacts (manifest.json, catalog.json, and run_results.json) and ingest the dbt metadata. It uses the /runs API to obtain the most recent successful dbt run, filtering by account_id, project_id and job_id if specified. The artifacts from this run are then collected using the /artifacts API. Refer to the code here.
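The two-step flow described above can be sketched as follows. This is a minimal illustration, not Collate's code. It assumes the default host `https://cloud.getdbt.com`, the query parameters `order_by`, `limit`, `status`, `project_id` and `job_definition_id` on the runs endpoint, the numeric status `10` for a successful run, and a `Token` authorization header. Verify each of these against the dbt Cloud API reference for your account. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://cloud.getdbt.com/api/v2"  # assumption: default dbt Cloud host
SUCCESS_STATUS = 10  # assumption: numeric code dbt Cloud uses for a successful run


def latest_run_url(account_id, project_id=None, job_id=None):
    """Build the /runs URL that asks for the most recent successful run."""
    params = {"order_by": "-finished_at", "limit": 1, "status": SUCCESS_STATUS}
    if project_id is not None:
        params["project_id"] = project_id
    if job_id is not None:
        params["job_definition_id"] = job_id
    query = urllib.parse.urlencode(params)
    return f"{BASE_URL}/accounts/{account_id}/runs/?{query}"


def artifact_url(account_id, run_id, artifact="manifest.json"):
    """Build the /artifacts URL for one file produced by a run."""
    return f"{BASE_URL}/accounts/{account_id}/runs/{run_id}/artifacts/{artifact}"


def get_json(url, token):
    """Send an authenticated GET request and decode the JSON response."""
    headers = {"Authorization": f"Token {token}", "Accept": "application/json"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_latest_manifest(account_id, token, project_id=None, job_id=None):
    """Return manifest.json of the latest successful run, or None if there is none."""
    runs = get_json(latest_run_url(account_id, project_id, job_id), token)["data"]
    if not runs:
        return None
    return get_json(artifact_url(account_id, runs[0]["id"]), token)
```

Calling `fetch_latest_manifest(account_id, token, job_id=job_id)` with real values performs two network requests: one to find the run and one to download its manifest.json. The same `artifact_url` helper can be pointed at catalog.json or run_results.json.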
The fields for Dbt Cloud Account Id, Dbt Cloud Project Id and Dbt Cloud Job Id should be numeric values. To learn how to get the values for these fields, check here.
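If the IDs are supplied by a script instead of being typed into the form, a small check like the one below can catch non-numeric values early. The function name `parse_dbt_cloud_id` is hypothetical, and the check is only an illustration of the numeric requirement stated above.

```python
import re


def parse_dbt_cloud_id(value: str, field: str) -> int:
    """Return the ID as an integer, or raise if it is not purely numeric.

    Hypothetical helper for illustration; not part of Collate.
    """
    text = value.strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"{field} must be a numeric value, got {value!r}")
    return int(text)
```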

3. Schedule and Deploy

After clicking Next, you will be redirected to the Scheduling form. This is the same form as for the Metadata Ingestion. Select your desired schedule and click on Deploy. The dbt ingestion pipeline will then be listed under the Service Ingestions.