Local Ingestion Agent

The Local Ingestion Agent lets you securely push metadata from your own infrastructure into Collate without exposing your internal systems or relying on additional server infrastructure. It’s the right choice when you need to run connectors on-premises and want to get started from your local machine. Unlike the Hybrid Ingestion Runner, the Local Ingestion Agent doesn’t require a Kubernetes cluster or any cloud infrastructure, only Docker.

With the Local Ingestion Agent, you can:
  • Set up ingestion workflows from a UI.
  • Run end-to-end metadata extraction workflows: metadata, profiling, lineage, usage, dbt, auto-classification, and data quality.

Before you start: Contact Collate support to get your Docker credentials. You need them to set up the Local Ingestion Agent.

Setting Up the Collate Ingestion Agent

Follow these steps to install, connect, and configure the Local Ingestion Agent on your machine.

Step 1: Install and Start the Agent

You need Docker installed on your machine. Run the following commands:
  1. Authenticate with the Docker registry using the credentials provided by Collate support. For example, the following command uses the AWS-hosted registry URL provided by Collate:
    docker login --username AWS -p eyJwY... 118146679784.dkr.ecr.eu-west-1.amazonaws.com
    
  2. Run the Docker image to start the Local Agent:
    docker run -it --rm -p 8001:8001 -e CL_BASE_DIR='/collate/collate-local-webserver/' -v ./.collate:/collate/collate-local-webserver/.collate 118146679784.dkr.ecr.eu-west-1.amazonaws.com/collate-customers-local-ingestion:<collate-version>
    
    Replace <collate-version> with the version of your Collate instance, for example 1.11.1.
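
The two commands above can also be wrapped in a small script. The following is a minimal sketch, not official Collate tooling: the function names and the `COLLATE_REGISTRY_TOKEN` environment variable are assumptions made for illustration. It passes the registry token through `--password-stdin` instead of `-p`, and mounts `.collate` using an absolute path built from the current directory.

```shell
#!/bin/sh
# Sketch only: function names and COLLATE_REGISTRY_TOKEN are illustrative,
# not part of any Collate tooling.
REGISTRY="118146679784.dkr.ecr.eu-west-1.amazonaws.com"

# Print the full image reference for a given Collate version, e.g. 1.11.1.
image_ref() {
  printf '%s/collate-customers-local-ingestion:%s\n' "$REGISTRY" "$1"
}

# Log in by piping the token on stdin so it is not passed as a CLI argument.
registry_login() {
  printf '%s' "$COLLATE_REGISTRY_TOKEN" |
    docker login --username AWS --password-stdin "$REGISTRY"
}

# Start the agent with the same options as the command above,
# using an absolute host path for the .collate volume.
start_agent() {
  docker run -it --rm -p 8001:8001 \
    -e CL_BASE_DIR='/collate/collate-local-webserver/' \
    -v "$PWD/.collate:/collate/collate-local-webserver/.collate" \
    "$(image_ref "$1")"
}
```

After sourcing the script, `registry_login && start_agent 1.11.1` would log in and start the agent for version 1.11.1.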

Step 2: Connect the Agent to Collate

Generate a JWT token from your Collate instance and use it to authenticate the agent.
  1. In your web browser, navigate to http://localhost:8001 to open the local agent UI.
  2. Enter your Collate platform URL (for example, https://<your-company>.getcollate.io/api).
  3. In your Collate instance, navigate to Settings > Bots and click Add Bot.
  4. Enter Email, Display Name, Token Expiration, and Description, then click Create.
  5. Assign the appropriate roles – typically DefaultBotPolicy and Ingestion Bot Policy.
  6. Copy and save the JWT token. You will use it to authenticate the agent.
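
Because the bot is created with a Token Expiration, it can help to check what the copied token actually carries before relying on it. A JWT consists of three base64url-encoded segments separated by dots, and the middle segment is a JSON payload. The helper below is a minimal sketch: `jwt_payload` is a hypothetical name, and it assumes a `base64` command that accepts `-d`.

```shell
#!/bin/sh
# Sketch only: jwt_payload is an illustrative helper, not a Collate tool.
# Assumes a base64 command that accepts -d (for example GNU coreutils).

# Print the decoded JSON payload (second dot-separated segment) of a JWT.
jwt_payload() {
  # base64url uses '-' and '_' where standard base64 uses '+' and '/'.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding normally strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Calling `jwt_payload "$COLLATE_JWT"` (with the token stored in a variable of your choice) prints the claims. If an `exp` claim is present, it is a Unix timestamp for the expiry. The helper only decodes the payload and does not verify the signature, and the token should still be treated as a secret.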

Step 3: Add a New Service

Add a data service in the Local Collate UI and verify the connection before running ingestion.
  1. Navigate to Settings > Services and select your service type (for example, Databases).
  2. Click + Add New Service and choose your connector type, then click Next.
  3. Enter a unique Name and a Description, then click Next.
  4. Fill in the connection details.
  5. Click Test Connection to verify the connection is successful.

Step 4: Verify Ingested Data

Confirm that metadata from your data source is now visible in the Collate platform.
  1. Return to the Collate platform at https://<your-company>.getcollate.io and refresh the database services page. Your new service should now appear there.
  2. Verify that the ingested metadata, including schemas, tables, and column details, is available.
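
The same check can be scripted from the command line. The sketch below assumes that your Collate instance exposes an OpenMetadata-style REST API under the platform URL from Step 2, and that the bot's JWT token is stored in a hypothetical `COLLATE_JWT` environment variable. The function names are illustrative, and the endpoint path should be confirmed against your instance's API documentation.

```shell
#!/bin/sh
# Sketch only: assumes an OpenMetadata-style REST API under the platform URL.
# COLLATE_JWT and the function names are illustrative.

# Build the lookup URL for a database service by name.
# $1 = platform API URL (a trailing slash is tolerated), $2 = service name.
service_url() {
  printf '%s/v1/services/databaseServices/name/%s\n' "${1%/}" "$2"
}

# Fetch the service definition; curl -f makes HTTP errors return non-zero.
check_service() {
  curl -fsS -H "Authorization: Bearer $COLLATE_JWT" "$(service_url "$1" "$2")"
}
```

For example, `check_service "https://<your-company>.getcollate.io/api" my_service` would print the service definition as JSON if the service exists, and exit with a non-zero status otherwise.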

Step 5: Configure Metadata Ingestion

After your agent is connected and your service is set up, proceed to the "Step 2 - Configure Metadata Ingestion" guide to schedule your first ingestion pipeline.

Additional Features

The Collate Ingestion Agent supports various ingestion workflows, allowing you to:
  • Generate YAML Configurations: Download YAML files for external scheduling.
  • Manage Ingestion Types: Run metadata, profiling, lineage, usage, and other workflows as needed.
  • Monitor Progress: View logs and monitor real-time ingestion activity.
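
As an illustration of external scheduling, a downloaded YAML file could be run from cron. This is a minimal sketch under a loudly stated assumption: that the YAML can be executed with the OpenMetadata `metadata` CLI (`metadata ingest -c <file>`), which may not match your deployment. The `cron_line` helper and the file paths are hypothetical, and other workflow types may need a different subcommand.

```shell
#!/bin/sh
# Sketch only: assumes the downloaded YAML can be run with the OpenMetadata
# `metadata` CLI; confirm the runner command for your deployment.

# Print a crontab entry that runs one workflow YAML on the given schedule
# and appends its output to a log file next to the YAML.
# $1 = cron schedule expression, $2 = path to the YAML file.
cron_line() {
  printf '%s metadata ingest -c %s >> %s 2>&1\n' "$1" "$2" "${2%.yaml}.log"
}
```

For example, `cron_line '0 2 * * *' /opt/collate/mysql.yaml` prints an entry that would run the workflow daily at 02:00. You can review the line before adding it with `crontab -e`.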