Environment Variables Reference

The Hybrid Runner lets you pass environment variables to ingestion pods through its Helm values.

Setting Environment Variables

  • Via config.ingestionPods.customConfig.containerParams.env: Use containerParams to set environment variables on ingestion pods.
    config:
      ingestionPods:
        customConfig:
          enabled: true
          containerParams:
            env:
              - name: HTTP_PROXY
                value: http://corp-proxy.svc:8080
              - name: HTTPS_PROXY
                value: http://corp-proxy.svc:8080
              - name: NO_PROXY
                value: ".svc, .mycorp.internal"
    

Runner Environment Variables

  • DYNAMIC_INGESTION_VERSION_ENABLED: When enabled, the Runner automatically resolves the ingestion pod image tag to match your Collate server version. Only disable this if you mirror Collate images to your own private registry and manage versioning manually:
      extraEnvs:
        - name: DYNAMIC_INGESTION_VERSION_ENABLED
          value: 'false'
    
Note: We recommend keeping DYNAMIC_INGESTION_VERSION_ENABLED set to true. This ensures ingestion fixes and updates are applied automatically, without you having to manage image tags yourself.

Advanced Configuration

Use the options below to customise workflow behaviour, configure container settings, and run multiple Hybrid Runner instances in the same cluster.

Defining Workflow Fields

Override workflow-level and container-level fields using custom configuration.
  • Workflow Parameters (workflowParams): Use workflowParams to override workflow-level fields. A typical use case is defining tolerations or pod affinity/anti-affinity:
  config:
    ingestionPods:
      customConfig:
        enabled: true
        workflowParams:
          tolerations:
            - key: team
              effect: NoSchedule
              operator: Equal
              value: data-science
  • Container Parameters (containerParams): Use containerParams to override container-level fields such as environment variables:
  config:
    ingestionPods:
      customConfig:
        enabled: true
        containerParams:
          env:
            - name: HTTP_PROXY
              value: http://corp-proxy.svc:8080
            - name: HTTPS_PROXY
              value: http://corp-proxy.svc:8080
            - name: NO_PROXY
              value: ".svc, .mycorp.internal"
Note: Custom workflow fields require Hybrid Runner Helm chart version 1.12.5 or later.
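For the pod affinity/anti-affinity use case mentioned above, a minimal sketch might look like the following. It assumes workflowParams accepts the standard Kubernetes affinity schema, and that ingestion pods carry the app.kubernetes.io/part-of label, which is only true if you set it yourself (for example through podMetadata). Treat both as assumptions to verify against your chart version.

```yaml
config:
  ingestionPods:
    customConfig:
      enabled: true
      workflowParams:
        # Assumption: this block follows the standard Kubernetes affinity schema.
        affinity:
          podAntiAffinity:
            # "preferred" keeps scheduling possible even when no other node is free.
            preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 100
                podAffinityTerm:
                  # Spread ingestion pods across nodes.
                  topologyKey: kubernetes.io/hostname
                  labelSelector:
                    matchLabels:
                      # Assumption: this label has been applied to ingestion pods.
                      app.kubernetes.io/part-of: collate-hybrid-ingestion-runner
```

A preferred rule is used here rather than a required one so that ingestion pods still schedule on a single-node or fully occupied cluster.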

Adding Pod Labels

Set pod labels on ingestion pods for pod security policies, cost attribution, or workload identification.
Note: Pod labeling requires Hybrid Runner Helm chart version 1.12.9 or later.

Argo Workflows Executor

Set pod labels via config.ingestionPods.customConfig.workflowParams:
config:
  ingestionPods:
    customConfig:
      enabled: true
      workflowParams:
        podMetadata:
          labels:
            app.kubernetes.io/name: collate-hybrid-ingestion-runner-ingestion
            app.kubernetes.io/part-of: collate-hybrid-ingestion-runner

Simple Kubernetes Executor

Use a podSpecFilePath override file to set labels and annotations on ingestion pod templates. Create a YAML file with a metadata block alongside any other pod spec overrides:
metadata:
  labels:
    cost-allocation/team: data-platform
    my.test/label: verified
  annotations:
    my.test/annotation: "true"
tolerations:
  - key: dedicated
    operator: Equal
    value: ingestion
    effect: NoSchedule
Then reference the file in your Helm values:
config:
  ingestionPods:
    podSpecFilePath: /path/to/pod-spec-overrides.yaml
After you upgrade the Hybrid Runner Helm chart to version 1.12.9 or later, the configured labels appear on all ingestion pods automatically.

Running Multiple Instances in the Same Cluster

Argo Workflows is a cluster-wide application. When deploying multiple Hybrid Runner instances in a single cluster, only one instance should install Argo Workflows. For each additional instance, set:
installArgoWorkflows: false
config:
  argoWorkflows:
    endpoint: http://argo-workflows-server.argo-workflows.svc:2746

Configuring Node Scheduling for Ingestion Pods

By default, ingestion pods are scheduled on any available node in your cluster. If your cluster uses node taints to isolate workloads, you must configure tolerations and node affinity so ingestion pods can be scheduled on the correct nodes.
If every node in your cluster has a NoSchedule taint and no tolerations are configured, ingestion pods will fail to schedule and remain stuck in the Pending state. This is the most common cause of scheduling failures after deploying the Hybrid Runner.
To check the taints on your nodes:
kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Choosing the Right Configuration Key

The configuration key depends on which executor your Hybrid Runner uses:
Executor                 | Environment variable            | When to use
Argo Workflows (default) | ARGO_PIPELINE_TYPE_CONFIGS      | Hybrid Runner deployed with installArgoWorkflows: true
Simple Kubernetes        | SIMPLEK8S_PIPELINE_TYPE_CONFIGS | Hybrid Runner deployed without Argo Workflows
Do not set both environment variables. Each executor reads from its own configuration key. Setting SIMPLEK8S_PIPELINE_TYPE_CONFIGS when using Argo has no effect, and vice versa.

Argo Workflows Executor

Add the following to your values.yaml. Replace openmetadata-hybrid-runner with the taint value used in your cluster:
extraEnvs:
  - name: ARGO_PIPELINE_TYPE_CONFIGS
    value: >-
      {
        "automation": {
          "toleration": "openmetadata-hybrid-runner",
          "affinity": {
            "nodeAffinity": {
              "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                  {
                    "matchExpressions": [
                      {
                        "key": "nodetype",
                        "operator": "In",
                        "values": ["openmetadata-hybrid-runner"]
                      }
                    ]
                  }
                ]
              }
            }
          }
        },
        "metadata": {
          "toleration": "openmetadata-hybrid-runner"
        },
        "profiler": {
          "toleration": "openmetadata-hybrid-runner"
        },
        "lineage": {
          "toleration": "openmetadata-hybrid-runner"
        }
      }
The toleration field accepts a single string value. The runner creates a fixed dedicated=<value> Kubernetes toleration for all ingestion pods of that type. For full Kubernetes toleration objects (multiple taints, custom operators or effects), use config.ingestionPods.customConfig.workflowParams instead.
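As an illustration of the workflowParams alternative, the sketch below tolerates two taints at once. The dedicated key mirrors the key the runner uses for the toleration field. The NoSchedule effect and the second team taint are assumptions chosen for the example, so adjust them to match the taints reported by kubectl for your nodes.

```yaml
config:
  ingestionPods:
    customConfig:
      enabled: true
      workflowParams:
        tolerations:
          # Matches a node taint of dedicated=openmetadata-hybrid-runner.
          # Assumption: the taint effect in your cluster is NoSchedule.
          - key: dedicated
            operator: Equal
            value: openmetadata-hybrid-runner
            effect: NoSchedule
          # Hypothetical second taint, to show that several can be listed.
          - key: team
            operator: Equal
            value: data-science
            effect: NoSchedule
```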

Simple Kubernetes Executor

If you’re using the Simple Kubernetes executor without Argo Workflows, use SIMPLEK8S_PIPELINE_TYPE_CONFIGS instead:
extraEnvs:
  - name: SIMPLEK8S_PIPELINE_TYPE_CONFIGS
    value: >-
      {
        "automation": {
          "tolerations": [
            {
              "key": "nodetype",
              "operator": "Equal",
              "value": "openmetadata-hybrid-runner",
              "effect": "NoSchedule"
            }
          ],
          "nodeSelector": {
            "nodetype": "openmetadata-hybrid-runner"
          }
        }
      }

Supported Configuration Fields

Each pipeline type (automation, metadata, profiler, lineage) supports the following fields:
Field         | Description
toleration    | A single taint value string. The runner creates a dedicated=<value> Kubernetes toleration for ingestion pods of that type. For full toleration control, use config.ingestionPods.customConfig.workflowParams.
affinity      | Node and pod affinity rules. Supports nodeAffinity, podAffinity, and podAntiAffinity.
nodeSelector  | Key-value labels to target specific nodes.
priorityClass | Kubernetes priority class name for the pod.
resources     | CPU and memory requests and limits.
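To show how several of these fields might be combined for one pipeline type, here is a hedged sketch for the Argo Workflows executor. The exact shape of resources is an assumption (it is written here in the usual Kubernetes requests/limits form), and the priority class name ingestion-low and the sizing values are hypothetical. Confirm the accepted shapes against your chart version before relying on this.

```yaml
extraEnvs:
  - name: ARGO_PIPELINE_TYPE_CONFIGS
    # The value is a JSON document passed as a single string.
    value: >-
      {
        "profiler": {
          "toleration": "openmetadata-hybrid-runner",
          "nodeSelector": {
            "nodetype": "openmetadata-hybrid-runner"
          },
          "priorityClass": "ingestion-low",
          "resources": {
            "requests": { "cpu": "500m", "memory": "1Gi" },
            "limits": { "cpu": "2", "memory": "4Gi" }
          }
        }
      }
```

Because the value is JSON embedded in YAML, a stray trailing comma or unquoted key will break it, so it is worth validating the JSON separately before deploying.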

Prometheus Metrics

The Hybrid Runner exposes operational metrics in a Prometheus-compatible format via an HTTP endpoint. These metrics provide insight into agent state, activity, and performance.
The available metrics may evolve over time. Inspect the /metrics endpoint directly for the latest set of available metrics.

Configuration

Configure the metrics endpoint in your Helm values:
metricsServerConfiguration:
  port: ${METRICS_SERVER_PORT:-8989}
  path: ${METRICS_SERVER_PATH:-/metrics}
  • port: Port on which the metrics endpoint is served (default: 8989).
  • path: HTTP path for accessing metrics (default: /metrics).
Both parameters support environment variable overrides.
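For example, assuming these variables are read from the Runner's own environment and that extraEnvs is the way to set them in your deployment, an override could be sketched as follows. The port and path values are arbitrary illustrations.

```yaml
extraEnvs:
  # Serve metrics on port 9090 instead of the default 8989.
  - name: METRICS_SERVER_PORT
    value: '9090'
  # Serve metrics at /runner-metrics instead of the default /metrics.
  - name: METRICS_SERVER_PATH
    value: /runner-metrics
```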

Accessing Metrics

Once configured, access metrics at:
http://<agent-host>:<port><path>
With default settings:
http://localhost:8989/metrics
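If you collect these metrics with Prometheus, a static scrape job is one simple way to do it. The sketch below is a Prometheus configuration fragment, not a Hybrid Runner setting. The job name and the target host hybrid-runner.collate.svc are hypothetical placeholders for wherever the Runner is reachable in your cluster.

```yaml
scrape_configs:
  - job_name: collate-hybrid-runner        # hypothetical job name
    metrics_path: /metrics                 # default path from the Runner configuration
    scrape_interval: 30s
    static_configs:
      - targets:
          - hybrid-runner.collate.svc:8989 # hypothetical host, default port
```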

Example Metric

# HELP collate_hybrid_agent_connected Is the agent connected to the server? (0 = No, 1 = Yes)
# TYPE collate_hybrid_agent_connected gauge
collate_hybrid_agent_connected 1.0
This metric indicates whether the agent is currently connected to the Collate server: 1.0 means connected, and 0.0 means disconnected.
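One possible use of this gauge is an alert that fires when the agent stays disconnected. The following is a sketch of a Prometheus alerting rule. The group name, alert name, five-minute window, and severity label are all assumptions to adapt to your own alerting conventions.

```yaml
groups:
  - name: collate-hybrid-runner            # hypothetical rule group
    rules:
      - alert: HybridRunnerDisconnected    # hypothetical alert name
        # Fires when the gauge reports 0 (disconnected).
        expr: collate_hybrid_agent_connected == 0
        # Require the condition to hold for 5 minutes to avoid flapping.
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Hybrid Runner agent is disconnected from the Collate server
```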

Hybrid Runner Images

This section covers how to host Collate images in your own container registry and how to manage image tags when you prefer to control versioning yourself.

Hosting Your Own Docker Images

To mirror Collate images to your own container registry (for example, Google Artifact Registry), set the following values. Mirror both the Hybrid Runner image and the ingestion pod image, because the Runner needs both to function correctly.
  • Hybrid Runner image: Sets the repository, tag, and pull credentials for the main Runner pod:
    image:
      repository: my-repo.com/my-image
      tag: my-tag
    imagePullSecrets: my-credentials
    
  • Ingestion pod image: Sets the repository, tag, and pull credentials for the pods that execute ingestion jobs:
    config:
      ingestionPods:
        repository: my-repo.com/my-image
        tag: my-tag
        imagePullSecrets: my-credentials
    

Managing Your Own Image Tags

By default, the Hybrid Runner automatically resolves image tags to match your Collate server version (for example, om-1.11.1-cl-1.11.1). To manage your own tags, disable automatic resolution using the DYNAMIC_INGESTION_VERSION_ENABLED environment variable. For configuration details, see Runner Environment Variables.
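Putting the pieces from this section together, a values file for a fully self-managed setup might look like the sketch below. It only combines keys already shown on this page. The repository names and tags are placeholders, and you are responsible for keeping the ingestion tag compatible with your Collate server version once automatic resolution is off.

```yaml
# Main Runner pod image, mirrored to a private registry.
image:
  repository: my-repo.com/my-runner-image      # placeholder
  tag: my-tag
imagePullSecrets: my-credentials

config:
  ingestionPods:
    # Image used by the pods that execute ingestion jobs.
    repository: my-repo.com/my-ingestion-image # placeholder
    tag: my-tag
    imagePullSecrets: my-credentials

extraEnvs:
  # Stop the Runner from resolving the ingestion tag automatically.
  - name: DYNAMIC_INGESTION_VERSION_ENABLED
    value: 'false'
```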