dbt Artifact Storage: AWS S3 Configuration

This guide walks you through configuring AWS S3 as the artifact storage layer for dbt Core + Collate integration. After completing this guide, your dbt artifacts will automatically sync to Collate for metadata extraction and lineage tracking.

Prerequisites Checklist

| Requirement | Details | How to Verify |
| --- | --- | --- |
| AWS Account | With permissions to create S3 buckets and IAM policies | aws sts get-caller-identity |
| AWS CLI | Installed and configured | aws --version |
| dbt Project | Existing dbt project | dbt debug |
| Orchestration | Airflow or similar scheduler | Access to DAG configuration |
| Database Service | Data warehouse already ingested | Check Settings → Services |

Step 1: AWS S3 Setup

1.1 Create S3 Bucket

# Set your variables
export AWS_REGION="us-east-1"
export BUCKET_NAME="your-company-dbt-artifacts"

# Create the bucket
aws s3 mb s3://${BUCKET_NAME} --region ${AWS_REGION}

# Verify bucket creation
aws s3 ls | grep ${BUCKET_NAME}
Expected output:
2026-02-10 10:30:00 your-company-dbt-artifacts
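
If you prefer to script bucket creation rather than use the CLI, a minimal boto3 sketch of the same step looks like this (the bucket name and region are the same placeholders as above; note that us-east-1 must not be passed as a location constraint):

# create_bucket.py -- boto3 equivalent of `aws s3 mb` (sketch, same placeholders as above)
import boto3

BUCKET_NAME = "your-company-dbt-artifacts"
AWS_REGION = "us-east-1"

s3 = boto3.client("s3", region_name=AWS_REGION)

# us-east-1 is the default region and must not be sent as a LocationConstraint
if AWS_REGION == "us-east-1":
    s3.create_bucket(Bucket=BUCKET_NAME)
else:
    s3.create_bucket(
        Bucket=BUCKET_NAME,
        CreateBucketConfiguration={"LocationConstraint": AWS_REGION},
    )

# head_bucket raises a ClientError if the bucket does not exist or is not yours
s3.head_bucket(Bucket=BUCKET_NAME)
print(f"Bucket ready: s3://{BUCKET_NAME}")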

1.2 Create IAM Policy for dbt (Write Access)

Your Airflow/dbt environment needs permission to write to S3. Save the following as dbt-s3-write-policy.json, replacing your-company-dbt-artifacts with your bucket name:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDBTArtifactUpload",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::your-company-dbt-artifacts/dbt-artifacts/*"
        },
        {
            "Sid": "AllowBucketListing",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::your-company-dbt-artifacts"
        }
    ]
}
Create and attach the policy:
# Create the IAM policy
aws iam create-policy \
    --policy-name dbt-s3-write-policy \
    --policy-document file://dbt-s3-write-policy.json

# Attach to your Airflow/ECS role
export AIRFLOW_ROLE_NAME="your-airflow-task-role"

aws iam attach-role-policy \
    --role-name ${AIRFLOW_ROLE_NAME} \
    --policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/dbt-s3-write-policy
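
The attach-role-policy command above needs your AWS account ID in the policy ARN. If you would rather not hard-code it, the following boto3 sketch resolves the account ID via STS and performs the same attachment (the role and policy names are the placeholders used above):

# attach_write_policy.py -- sketch: resolve the account ID and attach the write policy
import boto3

AIRFLOW_ROLE_NAME = "your-airflow-task-role"  # same placeholder as above
POLICY_NAME = "dbt-s3-write-policy"

# Look up the current account ID instead of hard-coding YOUR_ACCOUNT_ID
account_id = boto3.client("sts").get_caller_identity()["Account"]
policy_arn = f"arn:aws:iam::{account_id}:policy/{POLICY_NAME}"

iam = boto3.client("iam")
iam.attach_role_policy(RoleName=AIRFLOW_ROLE_NAME, PolicyArn=policy_arn)
print(f"Attached {policy_arn} to {AIRFLOW_ROLE_NAME}")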

1.3 Create IAM Policy for Collate (Read Access)

Collate needs permission to read from S3. Save the following as collate-s3-read-policy.json, again substituting your bucket name:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCollateRead",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::your-company-dbt-artifacts",
                "arn:aws:s3:::your-company-dbt-artifacts/dbt-artifacts/*"
            ]
        }
    ]
}
Create the policy:
# Create the policy
aws iam create-policy \
    --policy-name collate-s3-read-policy \
    --policy-document file://collate-s3-read-policy.json

# Attach to Collate's role or create access keys for Collate user
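
If Collate does not run on AWS and you plan to use access keys (Option A in Step 3), one option is a dedicated read-only IAM user. Below is a sketch, assuming you are allowed to create users; the user name is illustrative, and the returned secret should go straight into a secrets manager:

# create_collate_user.py -- sketch: dedicated read-only IAM user for Collate (name is illustrative)
import boto3

USER_NAME = "collate-dbt-reader"  # hypothetical user name

iam = boto3.client("iam")
account_id = boto3.client("sts").get_caller_identity()["Account"]
policy_arn = f"arn:aws:iam::{account_id}:policy/collate-s3-read-policy"

iam.create_user(UserName=USER_NAME)
iam.attach_user_policy(UserName=USER_NAME, PolicyArn=policy_arn)

# The secret is returned only once; store it securely and paste it into Collate in Step 3.
keys = iam.create_access_key(UserName=USER_NAME)["AccessKey"]
print("AWS Access Key ID:    ", keys["AccessKeyId"])
print("AWS Secret Access Key: <returned in keys['SecretAccessKey'] -- store securely>")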

1.4 Verify S3 Access

# Create test file
echo "test" > /tmp/test.txt

# Upload it
aws s3 cp /tmp/test.txt s3://${BUCKET_NAME}/dbt-artifacts/test.txt

# Verify it exists
aws s3 ls s3://${BUCKET_NAME}/dbt-artifacts/

# Clean up
aws s3 rm s3://${BUCKET_NAME}/dbt-artifacts/test.txt
rm /tmp/test.txt
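
As an additional check, you can ask IAM directly whether the Airflow role holds the permissions the write policy is supposed to grant, without touching the bucket. A sketch using the IAM policy simulator (the role name is the placeholder from Step 1.2, and the caller needs iam:SimulatePrincipalPolicy):

# simulate_permissions.py -- sketch: confirm the Airflow role can write dbt artifacts
import boto3

BUCKET = "your-company-dbt-artifacts"
ROLE_NAME = "your-airflow-task-role"  # placeholder from Step 1.2

account_id = boto3.client("sts").get_caller_identity()["Account"]
iam = boto3.client("iam")

result = iam.simulate_principal_policy(
    PolicySourceArn=f"arn:aws:iam::{account_id}:role/{ROLE_NAME}",
    ActionNames=["s3:PutObject", "s3:ListBucket"],
    ResourceArns=[
        f"arn:aws:s3:::{BUCKET}/dbt-artifacts/manifest.json",
        f"arn:aws:s3:::{BUCKET}",
    ],
)

# Every action should come back as 'allowed'
for evaluation in result["EvaluationResults"]:
    print(f"{evaluation['EvalActionName']}: {evaluation['EvalDecision']}")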

Step 2: Upload Artifacts from dbt

2.1 Understanding dbt Artifacts

Collate requires these dbt-generated files:
| File | Generated By | Required? | What It Contains |
| --- | --- | --- | --- |
| manifest.json | dbt run, dbt compile, dbt build | Yes | Models, sources, lineage, descriptions, tests |
| catalog.json | dbt docs generate | Recommended | Column names, types, descriptions |
| run_results.json | dbt run, dbt test, dbt build | Optional | Test pass/fail results, timing |
Generate all artifacts:
dbt run          # Generates manifest.json
dbt test         # Updates run_results.json
dbt docs generate # Generates catalog.json
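
Before wiring this into Airflow, it's worth sanity-checking the artifacts locally; manifest.json carries a metadata block with the dbt version plus the nodes and sources that drive lineage. A quick inspection sketch (assumes you run it from the dbt project root, so target/ is the default artifact path):

# inspect_artifacts.py -- quick local sanity check of the generated artifacts
import json
from pathlib import Path

target = Path("target")  # default artifact path when run from the dbt project root

manifest = json.loads((target / "manifest.json").read_text())
print("dbt version:", manifest["metadata"]["dbt_version"])
print("nodes      :", len(manifest["nodes"]), "(models, tests, seeds, ...)")
print("sources    :", len(manifest["sources"]))

catalog_path = target / "catalog.json"
if catalog_path.exists():
    catalog = json.loads(catalog_path.read_text())
    print("catalog    :", len(catalog["nodes"]), "documented relations")
else:
    print("catalog.json not found -- run `dbt docs generate` to get column metadata")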

2.2 Complete Airflow DAG Example

This is a complete, working DAG for uploading dbt artifacts to S3. Save as dbt_with_collate.py in your Airflow DAGs folder:
"""
dbt + Collate Integration DAG (S3 Method)

This DAG:
1. Runs dbt models
2. Runs dbt tests
3. Generates dbt documentation (catalog.json)
4. Uploads all artifacts to S3

No Collate packages are installed in this Airflow environment.
Collate pulls the artifacts from S3 independently.
"""

import os
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.utils.task_group import TaskGroup


# =============================================================================
# CONFIGURATION
# =============================================================================

# dbt Configuration
DBT_PROJECT_DIR = os.getenv("DBT_PROJECT_DIR", "/opt/airflow/dbt/my_project")
DBT_PROFILES_DIR = os.getenv("DBT_PROFILES_DIR", "/opt/airflow/dbt")

# S3 Configuration
S3_BUCKET = os.getenv("S3_BUCKET", "your-company-dbt-artifacts")
S3_PREFIX = os.getenv("S3_PREFIX", "dbt-artifacts")
AWS_REGION = os.getenv("AWS_DEFAULT_REGION", "us-east-1")

# =============================================================================
# DAG DEFAULT ARGUMENTS
# =============================================================================

default_args = {
    "owner": "data-engineering",
    "depends_on_past": False,
    "email": ["data-team@yourcompany.com"],
    "email_on_failure": True,
    "email_on_retry": False,
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
    "execution_timeout": timedelta(hours=2),
}

# =============================================================================
# PYTHON FUNCTIONS
# =============================================================================

def upload_artifacts_to_s3(**context):
    """
    Upload dbt artifacts to S3.

    Uses boto3 (AWS SDK) which is typically available in Airflow.
    If not: pip install boto3
    """
    import boto3
    from botocore.exceptions import ClientError

    s3_client = boto3.client("s3", region_name=AWS_REGION)
    target_dir = os.path.join(DBT_PROJECT_DIR, "target")

    # Files to upload
    artifacts = [
        ("manifest.json", True),      # Required
        ("catalog.json", False),      # Optional but recommended
        ("run_results.json", False),  # Optional
        ("sources.json", False),      # Optional
    ]

    uploaded = []
    failed = []

    for filename, required in artifacts:
        local_path = os.path.join(target_dir, filename)
        s3_key = f"{S3_PREFIX}/{filename}"

        if os.path.exists(local_path):
            try:
                s3_client.upload_file(local_path, S3_BUCKET, s3_key)
                uploaded.append(filename)
                print(f"✓ Uploaded {filename} to s3://{S3_BUCKET}/{s3_key}")
            except ClientError as e:
                error_msg = f"✗ Failed to upload {filename}: {e}"
                print(error_msg)
                if required:
                    raise Exception(error_msg)
                failed.append(filename)
        else:
            if required:
                raise FileNotFoundError(
                    f"Required artifact not found: {local_path}\n"
                    f"Make sure 'dbt run' completed successfully."
                )
            else:
                print(f"⊘ Skipping {filename} (not found - optional)")

    # Log summary
    print(f"\n{'='*50}")
    print(f"Upload Summary:")
    print(f"  Uploaded: {', '.join(uploaded) or 'None'}")
    print(f"  Skipped:  {', '.join(failed) or 'None'}")
    print(f"  S3 Location: s3://{S3_BUCKET}/{S3_PREFIX}/")
    print(f"{'='*50}")

    return {"uploaded": uploaded, "bucket": S3_BUCKET, "prefix": S3_PREFIX}


# =============================================================================
# DAG DEFINITION
# =============================================================================

with DAG(
    dag_id="dbt_with_collate",
    default_args=default_args,
    description="Run dbt models and sync metadata to Collate via S3",
    schedule_interval="0 6 * * *",  # Daily at 6 AM UTC
    start_date=datetime(2024, 1, 1),
    catchup=False,
    max_active_runs=1,
    tags=["dbt", "collate", "data-pipeline"],
) as dag:

    # Task Group: dbt Execution
    with TaskGroup(group_id="dbt_execution") as dbt_tasks:

        dbt_run = BashOperator(
            task_id="dbt_run",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt run --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_test = BashOperator(
            task_id="dbt_test",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt test --profiles-dir {DBT_PROFILES_DIR}
            """,
            trigger_rule="all_done",  # Run even if dbt_run fails
        )

        dbt_docs = BashOperator(
            task_id="dbt_docs_generate",
            bash_command=f"""
                cd {DBT_PROJECT_DIR} && \
                dbt docs generate --profiles-dir {DBT_PROFILES_DIR}
            """,
        )

        dbt_run >> dbt_test >> dbt_docs

    # Upload to S3
    upload_to_s3 = PythonOperator(
        task_id="upload_artifacts_to_s3",
        python_callable=upload_artifacts_to_s3,  # context is passed automatically in Airflow 2+
    )

    # DAG Dependencies
    dbt_tasks >> upload_to_s3

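If your Airflow image already ships the Amazon provider package, you can swap the raw boto3 upload for Airflow's S3Hook, which reads credentials from an Airflow connection instead of the task environment. Here is a sketch of an equivalent callable, meant to live in the same DAG module so it reuses the os import and the configuration constants defined above (the aws_default connection ID is an assumption about your setup):

# Alternative upload step using the Amazon provider (apache-airflow-providers-amazon)
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def upload_artifacts_with_hook(**context):
    hook = S3Hook(aws_conn_id="aws_default")  # assumes an 'aws_default' Airflow connection
    for filename in ["manifest.json", "catalog.json", "run_results.json"]:
        local_path = os.path.join(DBT_PROJECT_DIR, "target", filename)
        if os.path.exists(local_path):
            hook.load_file(
                filename=local_path,
                key=f"{S3_PREFIX}/{filename}",
                bucket_name=S3_BUCKET,
                replace=True,  # overwrite the previous run's artifacts
            )
            print(f"Uploaded {filename} to s3://{S3_BUCKET}/{S3_PREFIX}/{filename}")
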
2.3 Verify DAG Deployment

# Check DAG is visible in Airflow
airflow dags list | grep dbt

# Trigger manual run
airflow dags trigger dbt_with_collate

# Check S3 after DAG completes
aws s3 ls s3://your-company-dbt-artifacts/dbt-artifacts/
Expected S3 output:
2026-02-10 10:30:00   5242880 manifest.json
2026-02-10 10:30:01   1048576 catalog.json
2026-02-10 10:30:01    102400 run_results.json

Step 3: Configure Collate

Configuration

  1. Go to Settings → Services → Database Services
  2. Click on your database service (e.g., “production-snowflake”)
  3. Go to the Ingestion tab
  4. Click Add Ingestion
  5. Select dbt from the dropdown
Configure dbt Source (S3):
| Field | Value | Notes |
| --- | --- | --- |
| dbt Configuration Source | S3 | Select from dropdown |
| S3 Bucket Name | your-company-dbt-artifacts | Your bucket name |
| S3 Object Prefix | dbt-artifacts | Folder path (no leading /) |
| AWS Region | us-east-1 | Your region |

AWS Credentials (choose one):

Option A: Using Access Keys

| Field | Value |
| --- | --- |
| AWS Access Key ID | AKIA... |
| AWS Secret Access Key | wJalrXUtn... |

Option B: Using IAM Role (if Collate runs on AWS)

| Field | Value |
| --- | --- |
| AWS Access Key ID | Leave empty |
| AWS Secret Access Key | Leave empty |

Configure dbt Options:

| Field | Recommended Value |
| --- | --- |
| Update Descriptions | Enabled |
| Update Owners | Enabled |
| Include Tags | Enabled |
| Classification Name | dbtTags |
Test & Deploy:
  1. Click Test Connection
  2. If successful, click Deploy
  3. Click Run to trigger immediately

Verification

After running the full pipeline, verify:
| Check | How to Verify | Expected Result |
| --- | --- | --- |
| S3 artifacts exist | aws s3 ls s3://bucket/dbt-artifacts/ | manifest.json, catalog.json listed |
| Ingestion completed | Collate UI → Service → Ingestion tab | Green status, no errors |
| Lineage appears | Click on a dbt model → Lineage tab | Upstream/downstream connections |
| Descriptions synced | Click on a table → Schema tab | Column descriptions visible |
| Tags appear | Click on a table → Tags section | dbt tags shown |

Troubleshooting

| Issue | Symptom | Cause | Solution |
| --- | --- | --- | --- |
| Access Denied | "403 Forbidden" error | IAM permissions insufficient | Verify the IAM policy grants s3:GetObject and s3:ListBucket (see the check below) |
| Manifest not found | "dbtManifestFilePath not found" | S3 path incorrect | Check that dbtObjectPrefix matches your S3 structure |
| No lineage | Tables exist but no lineage | Database metadata not ingested first | Run database metadata ingestion before dbt ingestion |
| Stale data | Old lineage/descriptions | Old artifacts in S3 | Verify the dbt DAG uploads fresh artifacts (see the check below) |
| Missing columns | No column descriptions | Missing catalog.json | Ensure dbt docs generate runs and uploads |
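
For the Access Denied, Manifest not found, and Stale data rows, the quickest way to narrow things down is to reproduce exactly the reads Collate performs, using the credentials you gave Collate. A sketch (the collate-reader profile name is an assumption; use whatever credentials you configured in Step 3):

# collate_read_check.py -- sketch: reproduce Collate's S3 reads with Collate's credentials
from datetime import datetime, timezone
import boto3

BUCKET = "your-company-dbt-artifacts"
PREFIX = "dbt-artifacts"

# Assumption: Collate's access keys are stored locally under a named profile
session = boto3.Session(profile_name="collate-reader")
s3 = session.client("s3", region_name="us-east-1")

# ListBucket -- a wrong prefix here is the usual cause of "Manifest not found"
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=f"{PREFIX}/")
for obj in listing.get("Contents", []):
    print(f"{obj['LastModified']}  {obj['Size']:>10}  {obj['Key']}")

# GetObject + freshness -- a manifest older than your dbt schedule means stale uploads
head = s3.head_object(Bucket=BUCKET, Key=f"{PREFIX}/manifest.json")
age = datetime.now(timezone.utc) - head["LastModified"]
print(f"manifest.json last modified {head['LastModified']} ({age} ago)")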

Next Steps

See other storage options: GCS | Azure | HTTP | Local | dbt Cloud