dbt Artifact Storage: Azure Blob Storage Configuration

This guide walks you through configuring Azure Blob Storage as the artifact storage layer for the dbt Core + Collate integration. It is the natural choice when your stack already runs on Microsoft Azure.

Prerequisites Checklist

| Requirement | Details | How to Verify |
| --- | --- | --- |
| Azure Account | Permissions to create Storage Accounts | az account show |
| Azure CLI | Installed and configured | az --version |
| dbt Project | Existing dbt project | dbt debug |
| Orchestration | Airflow or Azure Data Factory (ADF) | Access to pipeline configuration |
| Database Service | Data warehouse already ingested | Check Settings → Services |

Step 1: Azure Blob Storage Setup

1.1 Create Storage Account and Container

# Set your variables
export RESOURCE_GROUP="dbt-metadata-rg"
export LOCATION="eastus"
export STORAGE_ACCOUNT="dbtartifacts${RANDOM}"  # Must be globally unique
export CONTAINER_NAME="dbt-artifacts"

# Login to Azure
az login

# Create resource group
az group create \
    --name ${RESOURCE_GROUP} \
    --location ${LOCATION}

# Create storage account
az storage account create \
    --name ${STORAGE_ACCOUNT} \
    --resource-group ${RESOURCE_GROUP} \
    --location ${LOCATION} \
    --sku Standard_LRS \
    --kind StorageV2

# Verify creation
az storage account show \
    --name ${STORAGE_ACCOUNT} \
    --resource-group ${RESOURCE_GROUP} \
    --query "name" -o tsv
Expected output (your random suffix will differ):
dbtartifacts12345

1.2 Create Blob Container

# Get storage account key
export STORAGE_KEY=$(az storage account keys list \
    --resource-group ${RESOURCE_GROUP} \
    --account-name ${STORAGE_ACCOUNT} \
    --query '[0].value' -o tsv)

# Create container
az storage container create \
    --name ${CONTAINER_NAME} \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY}

# Verify container
az storage container show \
    --name ${CONTAINER_NAME} \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY}

1.3 Configure Access (Choose One Option)

Option A: Using Storage Account Key (Simplest)
# Save the storage key (provides full access)
echo "Storage Account: ${STORAGE_ACCOUNT}"
echo "Storage Key: ${STORAGE_KEY}"

# Or get connection string
az storage account show-connection-string \
    --name ${STORAGE_ACCOUNT} \
    --resource-group ${RESOURCE_GROUP} \
    --query connectionString -o tsv
Option B: Using SAS Token (Read-only for Collate)
# Create SAS token with read + list permissions (valid for 1 year)
# Note: -d "1 year" is GNU date syntax; on macOS/BSD use: date -u -v+1y '+%Y-%m-%dT%H:%MZ'
export END_DATE=$(date -u -d "1 year" '+%Y-%m-%dT%H:%MZ')

az storage container generate-sas \
    --name ${CONTAINER_NAME} \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY} \
    --permissions rl \
    --expiry ${END_DATE} \
    --https-only \
    -o tsv
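Before handing the token to Collate, you can confirm it actually grants read and list access with a quick check using the azure-storage-blob SDK (a minimal sketch; SAS_TOKEN is assumed to hold the output of the generate-sas command above):

# Sketch: verify the SAS token can list and read blobs (read-only access)
import os
from azure.storage.blob import ContainerClient

account = os.environ["STORAGE_ACCOUNT"]
container = os.environ["CONTAINER_NAME"]
sas_token = os.environ["SAS_TOKEN"]  # output of az storage container generate-sas

# A container-scoped SAS is passed as the credential; only the granted
# permissions (read + list) will succeed.
client = ContainerClient(
    account_url=f"https://{account}.blob.core.windows.net",
    container_name=container,
    credential=sas_token,
)

for blob in client.list_blobs():
    print(blob.name)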
Option C: Using Managed Identity (Recommended for AKS)
# Enable managed identity on AKS
az aks update \
    --resource-group ${RESOURCE_GROUP} \
    --name your-aks-cluster \
    --enable-managed-identity

# Get the managed identity
export PRINCIPAL_ID=$(az aks show \
    --resource-group ${RESOURCE_GROUP} \
    --name your-aks-cluster \
    --query identityProfile.kubeletidentity.clientId -o tsv)

# Assign Storage Blob Data Contributor role (write access for dbt)
az role assignment create \
    --role "Storage Blob Data Contributor" \
    --assignee ${PRINCIPAL_ID} \
    --scope "/subscriptions/$(az account show --query id -o tsv)/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.Storage/storageAccounts/${STORAGE_ACCOUNT}"
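With the role assigned, code running on the cluster can reach Blob Storage without any stored key. Below is a minimal sketch using the azure-identity package (pip install azure-identity azure-storage-blob), assuming the pod can obtain a token for the cluster's managed identity via IMDS or workload identity:

# Sketch: authenticate to Blob Storage via managed identity (no account key)
import os
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account = os.environ["STORAGE_ACCOUNT"]

# DefaultAzureCredential falls back to the managed identity available to the
# pod when no other credentials are configured in the environment.
service_client = BlobServiceClient(
    account_url=f"https://{account}.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

for container in service_client.list_containers():
    print(container.name)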

1.4 Verify Blob Storage Access

# Create test file
echo "test" > /tmp/test.txt

# Upload
az storage blob upload \
    --container-name ${CONTAINER_NAME} \
    --name test.txt \
    --file /tmp/test.txt \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY}

# List blobs
az storage blob list \
    --container-name ${CONTAINER_NAME} \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY} \
    --output table

# Clean up
az storage blob delete \
    --container-name ${CONTAINER_NAME} \
    --name test.txt \
    --account-name ${STORAGE_ACCOUNT} \
    --account-key ${STORAGE_KEY}
rm /tmp/test.txt
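The upload task in Step 2 uses the azure-storage-blob Python SDK rather than the CLI, so it is worth confirming the same round trip works from Python in the environment where Airflow will run. A minimal sketch, assuming the account key is exported in the environment (pip install azure-storage-blob):

# Sketch: upload/read/delete round trip via the azure-storage-blob SDK
import os
from azure.storage.blob import BlobServiceClient

client = BlobServiceClient(
    account_url=f"https://{os.environ['STORAGE_ACCOUNT']}.blob.core.windows.net",
    credential=os.environ["STORAGE_KEY"],
)
blob = client.get_blob_client(container=os.environ["CONTAINER_NAME"], blob="test.txt")

blob.upload_blob(b"test", overwrite=True)     # upload
print(blob.download_blob().readall())         # read back: b'test'
blob.delete_blob()                            # clean up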

Step 2: Upload Artifacts from dbt

2.1 Understanding dbt Artifacts

Collate requires these dbt-generated files:
| File | Generated By | Required? | What It Contains |
| --- | --- | --- | --- |
| manifest.json | dbt run, dbt compile, dbt build | Yes | Models, sources, lineage, descriptions, tests |
| catalog.json | dbt docs generate | Recommended | Column names, types, descriptions |
| run_results.json | dbt run, dbt test, dbt build | Optional | Test pass/fail results, timing |
Generate all artifacts:
dbt run           # Generates manifest.json
dbt test          # Updates run_results.json
dbt docs generate # Generates catalog.json
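Before wiring up the upload, it can help to sanity-check the artifacts that landed in target/. A short sketch that reads manifest.json and prints a few fields from the standard dbt manifest schema (run it from the dbt project root; the checks are illustrative, not part of the Collate integration):

# Sketch: sanity-check dbt artifacts in the target/ directory
import json
from pathlib import Path

target = Path("target")

manifest = json.loads((target / "manifest.json").read_text())
print("dbt version:", manifest["metadata"]["dbt_version"])
print("nodes (models, tests, ...):", len(manifest["nodes"]))
print("sources:", len(manifest["sources"]))

# catalog.json only exists after `dbt docs generate`
if (target / "catalog.json").exists():
    print("catalog.json present - column-level metadata will be synced")
else:
    print("catalog.json missing - run `dbt docs generate` for column metadata")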

2.2 Complete Airflow DAG

This is a complete, working DAG for Azure deployments. Save as dbt_with_azure.py in your Airflow DAGs folder:
"""
dbt + Collate Integration DAG (Azure Blob Method)

This DAG:
1. Runs dbt models
2. Runs dbt tests
3. Generates dbt documentation (catalog.json)
4. Uploads all artifacts to Azure Blob Storage

Perfect for AKS, Azure VMs, or Container Instances.
"""

import os
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup


# =============================================================================
# CONFIGURATION
# =============================================================================

# dbt Configuration
DBT_PROJECT_DIR = os.getenv("DBT_PROJECT_DIR", "/opt/airflow/dbt/my_project")
DBT_PROFILES_DIR = os.getenv("DBT_PROFILES_DIR", "/opt/airflow/dbt")

# Azure Blob Storage Configuration
AZURE_STORAGE_ACCOUNT = os.getenv("AZURE_STORAGE_ACCOUNT", "dbtartifacts12345")
AZURE_CONTAINER_NAME = os.getenv("AZURE_CONTAINER_NAME", "dbt-artifacts")
AZURE_STORAGE_KEY = os.getenv("AZURE_STORAGE_KEY", "")
AZURE_CONNECTION_STRING = os.getenv("AZURE_STORAGE_CONNECTION_STRING", "")

# =============================================================================
# DAG DEFAULT ARGUMENTS
# =============================================================================

default_args = {
    "owner": "data-engineering",
    "depends_on_past": False,
    "email": ["data-team@yourcompany.com"],
    "email_on_failure": True,
    "email_on_retry": False,
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(hours=2),
}

# =============================================================================
# PYTHON FUNCTIONS
# =============================================================================

def upload_artifacts_to_azure(**context):
    """
    Upload dbt artifacts to Azure Blob Storage.

    Uses azure-storage-blob library.
    Install with: pip install azure-storage-blob
    """
    from azure.storage.blob import BlobServiceClient

    target_dir = os.path.join(DBT_PROJECT_DIR, "target")

    # Initialize Azure Blob Service Client
    if AZURE_CONNECTION_STRING:
        blob_service_client = BlobServiceClient.from_connection_string(
            AZURE_CONNECTION_STRING
        )
    else:
        account_url = f"https://{AZURE_STORAGE_ACCOUNT}.blob.core.windows.net"
        blob_service_client = BlobServiceClient(
            account_url=account_url,
            credential=AZURE_STORAGE_KEY
        )

    container_client = blob_service_client.get_container_client(AZURE_CONTAINER_NAME)

    # Files to upload
    artifacts = [
        ("manifest.json", True),      # Required
        ("catalog.json", False),      # Optional but recommended
        ("run_results.json", False),  # Optional
        ("sources.json", False),      # Optional
    ]

    uploaded = []
    failed = []

    for filename, required in artifacts:
        local_path = os.path.join(target_dir, filename)

        if os.path.exists(local_path):
            try:
                blob_client = container_client.get_blob_client(filename)
                with open(local_path, "rb") as data:
                    blob_client.upload_blob(data, overwrite=True)

                uploaded.append(filename)
                print(f"✓ Uploaded {filename} to Azure Blob Storage")
            except Exception as e:
                error_msg = f"✗ Failed to upload {filename}: {e}"
                print(error_msg)
                if required:
                    raise Exception(error_msg)
                failed.append(filename)
        else:
            if required:
                raise FileNotFoundError(
                    f"Required artifact not found: {local_path}\n"
                    f"Make sure 'dbt run' completed successfully."
                )
            else:
                print(f"⊘ Skipping {filename} (not found - optional)")

    # Log summary
    print(f"\n{'='*50}")
    print(f"Upload Summary:")
    print(f"  Uploaded: {', '.join(uploaded) or 'None'}")
    print(f"  Skipped:  {', '.join(failed) or 'None'}")
    print(f"  Azure Location: {AZURE_STORAGE_ACCOUNT}/{AZURE_CONTAINER_NAME}/")
    print(f"{'='*50}")

    return {
        "uploaded": uploaded,
        "storage_account": AZURE_STORAGE_ACCOUNT,
        "container": AZURE_CONTAINER_NAME
    }


# =============================================================================
# DAG DEFINITION
# =============================================================================

with DAG(
    dag_id="dbt_with_azure",
    default_args=default_args,
    description="Run dbt models and sync metadata to Collate via Azure Blob",
    schedule_interval="0 6 * * *",  # Daily at 6 AM UTC
    start_date=datetime(2024, 1, 1),
    catchup=False,
    max_active_runs=1,
    tags=["dbt", "collate", "azure", "data-pipeline"],
) as dag:

    # Task Group: dbt Execution
    with TaskGroup(group_id="dbt_execution") as dbt_tasks:

        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt run --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt test --profiles-dir {DBT_PROFILES_DIR}
            """,
            trigger_rule="all_done",
        )

        dbt_docs = BashOperator(
            task_id="dbt_docs_generate",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt docs generate --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_run >> dbt_test >> dbt_docs

    # Upload to Azure Blob
    upload_to_azure = PythonOperator(
        task_id="upload_artifacts_to_azure",
        python_callable=upload_artifacts_to_azure,
        # Airflow 2 passes the task context automatically; provide_context is no longer needed
    )

    # DAG Dependencies
    dbt_tasks >> upload_to_azure

2.3 Alternative: Azure CLI Upload

For simpler setups, call the Azure CLI directly. Note that the trailing || true keeps upload failures from failing the task; remove it if you want errors to surface:
upload_with_az_cli = BashOperator(
    task_id="upload_to_azure",
    bash_command=f"""
        cd {DBT_PROJECT_DIR}/target && \
        az storage blob upload-batch \
            --account-name {AZURE_STORAGE_ACCOUNT} \
            --destination {AZURE_CONTAINER_NAME} \
            --source . \
            --pattern "*.json" \
            --overwrite || true
    """,
)

Step 3: Configure Collate

Configuration

  1. Go to Settings → Services → Database Services
  2. Click on your database service (e.g., “production-synapse”)
  3. Go to the Ingestion tab
  4. Click Add Ingestion
  5. Select dbt from the dropdown
Configure dbt Source (Azure):
| Field | Value | Notes |
| --- | --- | --- |
| dbt Configuration Source | Azure | Select from dropdown |
| Azure Account Name | dbtartifacts12345 | Your storage account name |
| Azure Container Name | dbt-artifacts | Your container name |
| Azure Blob Prefix |  | Leave empty or specify a folder |

Azure Credentials (choose one):

Option A: Using Account Key

| Field | Value |
| --- | --- |
| Azure Account Key | abc123... (your storage account key) |

Option B: Using Connection String

| Field | Value |
| --- | --- |
| Azure Connection String | DefaultEndpointsProtocol=https;AccountName=... (full connection string) |

Configure dbt Options:

| Field | Recommended Value |
| --- | --- |
| Update Descriptions | Enabled |
| Update Owners | Enabled |
| Include Tags | Enabled |
| Classification Name | dbtTags |
Test & Deploy:
  1. Click Test Connection
  2. If successful, click Deploy
  3. Click Run to trigger immediately

Verification

After running the full pipeline, verify:
| Check | How to Verify | Expected Result |
| --- | --- | --- |
| Azure blobs exist | az storage blob list --container-name X | manifest.json, catalog.json listed |
| Ingestion completed | Collate UI → Service → Ingestion tab | Green status, no errors |
| Lineage appears | Click on a dbt model → Lineage tab | Upstream/downstream connections |
| Descriptions synced | Click on a table → Schema tab | Column descriptions visible |
| Tags appear | Click on a table → Tags section | dbt tags shown |
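The first and last checks can also be scripted. A small sketch using azure-storage-blob that lists the JSON artifacts and their last-modified timestamps (an old timestamp points at the Stale data issue in the next section; the environment variable names are placeholders):

# Sketch: confirm fresh artifacts are present in the container
import os
from azure.storage.blob import ContainerClient

client = ContainerClient(
    account_url=f"https://{os.environ['STORAGE_ACCOUNT']}.blob.core.windows.net",
    container_name=os.environ["CONTAINER_NAME"],
    credential=os.environ["STORAGE_KEY"],
)

for blob in client.list_blobs():
    if blob.name.endswith(".json"):
        print(f"{blob.name:20} last modified {blob.last_modified}")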

Troubleshooting

| Issue | Symptom | Cause | Solution |
| --- | --- | --- | --- |
| Access Denied | "403 Forbidden" error | Insufficient permissions | Verify the storage account key or SAS token is correct |
| Container Not Found | "404 Not Found" | Container name incorrect | Check that the container name matches the actual container |
| Invalid Credentials | "Authentication failed" | Wrong credentials | Verify the account key, connection string, or SAS token |
| No blobs found | Artifacts not appearing | Wrong upload path or failed upload | Check the container and verify the upload succeeded |
| Stale data | Old lineage/descriptions | Old artifacts in blob storage | Verify the dbt DAG uploads fresh artifacts |

Next Steps

See other storage options: S3 | GCS | HTTP | Local | dbt Cloud