Overview

Collate 1.12 introduces Git Sink, a workflow capability that automatically synchronizes metadata changes made in Collate to a Git repository. Many organizations are adopting a metadata-as-code approach: like application code, metadata benefits from version control, review workflows, and historical tracking. Git Sink captures metadata updates in Collate and commits them to GitHub, so teams can maintain version history, review governance changes, and integrate metadata management with existing engineering workflows. Collate remains the main interface for managing metadata, while Git stores the version history.

Why Git Sink Matters

Metadata changes frequently as data platforms evolve: tables are documented, tags are added, test cases are created, and governance classifications change. Without version control, these updates are difficult to track. Git Sink helps organizations:
  • Track metadata history
  • Review governance updates through Git workflows
  • Integrate metadata with DevOps processes
  • Manage metadata using version control
This enables teams to treat metadata as code while continuing to manage it through the Collate interface.

How Git Sink Works

Git Sink is implemented through the Collate workflow engine. A workflow listens for metadata events and writes those changes to a Git repository. Examples of events include:
  • Creating a test case
  • Updating a table description
  • Adding tags or tiers
  • Updating glossary assignments
When these events occur, the workflow commits the updated metadata to Git.
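
The event-to-commit flow described above can be sketched in a few lines of Python. This is only an illustration: the `MetadataEvent` shape, the `InMemoryRepo` stand-in, and the `handle_event` function are all hypothetical names, not part of Collate's API, and the real Git Sink writes YAML files to an actual Git repository rather than to a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEvent:
    """Hypothetical event shape; Collate's real event payload may differ."""
    event_type: str   # e.g. "entityUpdated" (assumed label)
    entity_fqn: str   # fully qualified name of the changed asset
    changes: dict     # field name -> new value


@dataclass
class InMemoryRepo:
    """Stand-in for a Git repository: current file state plus a commit log."""
    files: dict = field(default_factory=dict)
    commits: list = field(default_factory=list)

    def commit(self, key: str, content: dict, message: str) -> None:
        self.files[key] = content
        self.commits.append(message)


def handle_event(repo: InMemoryRepo, event: MetadataEvent) -> None:
    """Merge the event's changes into the asset's record and add one commit."""
    # Files are keyed by FQN here for brevity; the on-disk layout is not modelled.
    current = dict(repo.files.get(event.entity_fqn, {}))
    current.update(event.changes)
    repo.commit(event.entity_fqn, current,
                f"{event.event_type}: {event.entity_fqn}")


repo = InMemoryRepo()
handle_event(repo, MetadataEvent("entityUpdated", "svc.db.schema.customer",
                                 {"description": "Customer master table"}))
handle_event(repo, MetadataEvent("entityUpdated", "svc.db.schema.customer",
                                 {"tags": ["PII.Sensitive"]}))
```

In this sketch, each event produces exactly one commit, and the stored record always reflects the latest state of the asset.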

Creating a Git Sink Workflow

Navigate to Governance → Workflows to create a workflow that captures metadata events and syncs them to Git.
Limitation: Git Sink workflows currently support only a Start node, the Git Sink node, and an End node. You cannot add other intermediate nodes.

Steps

  1. Open Governance
  2. Select Workflows
  3. Click Create Workflow
  4. Add a Start node by dragging and dropping it into the workflow
  5. Select the assets to monitor
  6. Choose an Event Based trigger or a Periodic Batch trigger. With Periodic Batch, you can schedule when the workflow runs.
  7. Add the Git Sink node
  8. Configure the GitHub connection details
  9. Add an End node
  10. Save and activate the workflow
Once activated, the workflow listens for metadata changes and pushes them to GitHub.

GitHub Configuration

The Git Sink node requires the following details:

Repository URL
Provide the GitHub repository where metadata files will be stored.

Access Token
Create a GitHub personal access token from the developer settings and use it for authentication.

Conflict Resolution
Choose how conflicts are handled:
  • Overwrite external changes
  • Preserve existing changes
  • Fail on conflict
Most organizations treat Collate as the primary metadata source.
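
One way to reason about the three conflict options is the sketch below. It assumes that a conflict means the file in the repository was edited outside Collate since the last sync, and it reads the option names literally. The function `resolve`, the `ConflictStrategy` enum, and `SyncConflictError` are hypothetical names used for illustration; the actual behavior of Git Sink may differ.

```python
from enum import Enum


class ConflictStrategy(Enum):
    OVERWRITE_EXTERNAL = "overwrite external changes"
    PRESERVE_EXISTING = "preserve existing changes"
    FAIL_ON_CONFLICT = "fail on conflict"


class SyncConflictError(Exception):
    """Raised when the repository copy diverged and the strategy is to fail."""


def resolve(last_synced: str, in_repo: str, from_collate: str,
            strategy: ConflictStrategy) -> str:
    """Return the content that should end up in the repository file."""
    if in_repo == last_synced:
        # Nobody touched the file outside Collate, so there is no conflict.
        return from_collate
    if strategy is ConflictStrategy.OVERWRITE_EXTERNAL:
        return from_collate   # Collate wins
    if strategy is ConflictStrategy.PRESERVE_EXISTING:
        return in_repo        # the external edit is kept
    raise SyncConflictError("repository file changed since the last sync")
```

Under this reading, a team that treats Collate as the primary metadata source would likely pick the overwrite option, while the fail option surfaces every divergence for manual review.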

Repository Structure

Metadata synced to GitHub is stored as YAML files. The structure reflects the hierarchy of the data platform.
Example:
service
  database
    schema
      tables
        customer.yaml
Each file includes metadata such as:
  • Fully qualified name
  • Columns
  • Tags and classifications
  • Descriptions
  • Timestamps and user information
This makes it easy to track metadata changes and maintain a clear history.
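
To make the layout concrete, the sketch below maps a table's fully qualified name to a file path that follows the example hierarchy above. The helper `table_file_path` is hypothetical, and it assumes a four-part name of the form service.database.schema.table with no dots inside any part; quoted names containing dots are not handled.

```python
from pathlib import PurePosixPath


def table_file_path(fqn: str) -> PurePosixPath:
    """Map 'service.database.schema.table' to its assumed path in the repo."""
    # Unpacking raises ValueError if the name does not have exactly four parts.
    service, database, schema, table = fqn.split(".")
    return PurePosixPath(service, database, schema, "tables", f"{table}.yaml")


path = table_file_path("service.database.schema.customer")
print(path)  # service/database/schema/tables/customer.yaml
```

Because each table lives in its own file, a Git diff or log scoped to that path shows only the changes to that one asset.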

Example Workflow

  1. A Git Sink workflow is active.
  2. A user updates a table in Collate.
  3. The update triggers a metadata event.
  4. The workflow commits the change to GitHub.
  5. The repository stores the updated metadata in YAML format.
Future updates create additional commits, allowing teams to track metadata changes over time.