Hybrid Runner Configuration References

Environment Variables Reference

The Hybrid Runner supports two ways to pass environment variables to ingestion pods.

Setting Environment Variables

Via config.ingestionPods.customConfig.containerParams.env: Use containerParams to set environment variables on ingestion pods.

config:
  ingestionPods:
    customConfig:
      enabled: true
      containerParams:
        env:
          - name: HTTP_PROXY
            value: http://corp-proxy.svc:8080
          - name: HTTPS_PROXY
            value: http://corp-proxy.svc:8080
          - name: NO_PROXY
            value: ".svc, .mycorp.internal"

Runner Environment Variables

DYNAMIC_INGESTION_VERSION_ENABLED: When enabled, the Runner automatically resolves the ingestion pod image tag to match your Collate server version. Only disable this if you mirror Collate images to your own private registry and manage versioning manually:
extraEnvs:
  - name: DYNAMIC_INGESTION_VERSION_ENABLED
    value: 'false'

Note: We recommend keeping DYNAMIC_INGESTION_VERSION_ENABLED set to true. This ensures ingestion fixes and updates are applied automatically, without you having to manage image tags yourself.
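Kubernetes expects every env entry's value to be a string, which is why the example above quotes 'false'. An unquoted false is loaded by YAML as a boolean and can be rejected when the pod spec is applied. The following is a minimal sketch that checks a list of env entries (the same shape as extraEnvs or containerParams.env) once the values are loaded into Python data. The function name check_env_entries is hypothetical and is not part of the Hybrid Runner.

```python
from typing import Any, Dict, List


def check_env_entries(entries: List[Dict[str, Any]]) -> List[str]:
    """Return problems found in Kubernetes-style env entries.

    Each entry is expected to look like {"name": ..., "value": ...}.
    Entries using valueFrom, or with no value at all (meaning an empty
    string), are not checked further.
    """
    problems: List[str] = []
    seen = set()
    for index, entry in enumerate(entries):
        name = entry.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"entry {index}: missing or empty 'name'")
            continue
        if name in seen:
            # Duplicates are easy to introduce when merging several values files.
            problems.append(f"{name}: defined more than once")
        seen.add(name)
        if "valueFrom" in entry or "value" not in entry:
            continue
        value = entry["value"]
        if not isinstance(value, str):
            # Unquoted YAML such as false or 8080 loads as bool/int, not str.
            problems.append(f"{name}: value must be a string, got {type(value).__name__}")
    return problems


if __name__ == "__main__":
    quoted = [{"name": "DYNAMIC_INGESTION_VERSION_ENABLED", "value": "false"}]
    unquoted = [{"name": "DYNAMIC_INGESTION_VERSION_ENABLED", "value": False}]
    print(check_env_entries(quoted))
    print(check_env_entries(unquoted))
```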

Advanced Configuration

Use the options below to customise workflow behaviour, configure container settings, and run multiple Hybrid Runner instances in the same cluster.

Defining Workflow Fields

Override workflow-level and container-level fields using custom configuration.

Workflow Parameters (workflowParams): Use workflowParams to override workflow-level fields. A typical use case is defining tolerations or pod affinity/anti-affinity:

config:
  ingestionPods:
    customConfig:
      enabled: true
      workflowParams:
        tolerations:
          - key: team
            effect: NoSchedule
            operator: Equal
            value: data-science

Container Parameters (containerParams): Use containerParams to override container-level fields such as environment variables:

config:
  ingestionPods:
    customConfig:
      enabled: true
      containerParams:
        env:
          - name: HTTP_PROXY
            value: http://corp-proxy.svc:8080
          - name: HTTPS_PROXY
            value: http://corp-proxy.svc:8080
          - name: NO_PROXY
            value: ".svc, .mycorp.internal"

Note: Custom workflow fields require Hybrid Runner Helm chart version 1.12.5 or later.

Adding Pod Labels

Set pod labels on ingestion pods for pod security policies, cost attribution, or workload identification.

Note: Pod labeling requires Hybrid Runner Helm chart version 1.12.9 or later.

Argo Workflows Executor

Set pod labels via config.ingestionPods.customConfig.workflowParams:

config:
  ingestionPods:
    customConfig:
      enabled: true
      workflowParams:
        podMetadata:
          labels:
            app.kubernetes.io/name: collate-hybrid-ingestion-runner-ingestion
            app.kubernetes.io/part-of: collate-hybrid-ingestion-runner

Simple Kubernetes Executor

Use a podSpecFilePath override file to set labels and annotations on ingestion pod templates. Create a YAML file with a metadata block alongside any other pod spec overrides:

metadata:
  labels:
    cost-allocation/team: data-platform
    my.test/label: verified
  annotations:
    my.test/annotation: "true"
tolerations:
  - key: dedicated
    operator: Equal
    value: ingestion
    effect: NoSchedule

Then reference the file in your Helm values:

config:
  ingestionPods:
    podSpecFilePath: /path/to/pod-spec-overrides.yaml

After you upgrade the Hybrid Runner Helm chart to 1.12.9 or later, these labels are applied automatically to every ingestion pod the runner creates.
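Kubernetes rejects pods whose label keys or values break its naming rules, so it can help to check labels before adding them to workflowParams.podMetadata or to a pod spec override file. The following is a simplified sketch of those rules: an optional lowercase DNS-subdomain prefix of at most 253 characters, and a name segment and value of at most 63 characters that begin and end with an alphanumeric character. The function names are hypothetical, and the Kubernetes documentation on labels remains the authority.

```python
import re

# Name segment (and non-empty values): alphanumeric at both ends, '-', '_', '.' inside.
_NAME = re.compile(r"[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?")
# Optional key prefix: a lowercase DNS subdomain such as app.kubernetes.io.
_PREFIX = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")


def label_key_ok(key: str) -> bool:
    """Check a label (or annotation) key such as 'cost-allocation/team'."""
    prefix, slash, name = key.rpartition("/")
    if slash and (len(prefix) > 253 or _PREFIX.fullmatch(prefix) is None):
        return False
    return len(name) <= 63 and _NAME.fullmatch(name) is not None


def label_value_ok(value: str) -> bool:
    """Check a label value; empty values are allowed."""
    return value == "" or (len(value) <= 63 and _NAME.fullmatch(value) is not None)


if __name__ == "__main__":
    labels = {
        "app.kubernetes.io/name": "collate-hybrid-ingestion-runner-ingestion",
        "cost-allocation/team": "data-platform",
        "Cost_Allocation/team": "data-platform",  # invalid prefix: uppercase and '_'
    }
    for key, value in labels.items():
        print(key, label_key_ok(key) and label_value_ok(value))
```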

Running Multiple Instances in the Same Cluster

Argo Workflows is a cluster-wide application. When deploying multiple Hybrid Runner instances in a single cluster, only one instance should install Argo Workflows. For each additional instance, disable the Argo Workflows installation and point the instance at the existing Argo Workflows server:

installArgoWorkflows: false
config:
  argoWorkflows:
    endpoint: http://argo-workflows-server.argo-workflows.svc:2746
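If you manage several instances from one script, generating their values ensures that only one installs Argo Workflows. The following is a minimal sketch that builds per-instance values and prints them as JSON, which is also valid YAML. The instance names are hypothetical. The endpoint assumes Argo Workflows runs as argo-workflows-server in the argo-workflows namespace, as in the example above.

```python
import json
from typing import Dict, List

ARGO_ENDPOINT = "http://argo-workflows-server.argo-workflows.svc:2746"


def instance_values(instance_names: List[str]) -> Dict[str, dict]:
    """Build Helm values per instance; only the first one installs Argo Workflows."""
    values: Dict[str, dict] = {}
    for index, name in enumerate(instance_names):
        if index == 0:
            # The first instance owns the cluster-wide Argo Workflows installation.
            values[name] = {"installArgoWorkflows": True}
        else:
            # Every other instance reuses the existing Argo Workflows server.
            values[name] = {
                "installArgoWorkflows": False,
                "config": {"argoWorkflows": {"endpoint": ARGO_ENDPOINT}},
            }
    return values


if __name__ == "__main__":
    for name, vals in instance_values(["runner-a", "runner-b", "runner-c"]).items():
        print(f"# values for {name}")
        print(json.dumps(vals, indent=2))
```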

Configuring Node Scheduling for Ingestion Pods

By default, ingestion pods are scheduled on any available node in your cluster. If your cluster uses node taints to isolate workloads, you must configure tolerations and node affinity so ingestion pods can be scheduled on the correct nodes.

If every node in your cluster has a NoSchedule taint and no tolerations are configured, ingestion pods fail to schedule and remain in the Pending state. This is the most common cause of scheduling failures after deploying the Hybrid Runner.

To check the taints on your nodes:

kubectl get nodes -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
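To see which nodes an ingestion pod could land on, you can compare node taints with the pod's tolerations. The following is a simplified sketch of the Kubernetes matching rules, applied to the output of kubectl get nodes -o json. A toleration matches a taint when the keys match (or the key is empty with operator Exists) and the effects match (or the toleration has no effect). For operator Equal, the values must also match. The sketch only considers hard NoSchedule and NoExecute taints and ignores node affinity. The function names are hypothetical.

```python
import json
import subprocess
from typing import Dict, List


def tolerates(toleration: Dict, taint: Dict) -> bool:
    """Simplified Kubernetes rule for whether one toleration matches one taint."""
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return key == "" or key == taint.get("key")
    return key == taint.get("key") and toleration.get("value", "") == taint.get("value", "")


def schedulable_nodes(nodes_json: Dict, tolerations: List[Dict]) -> List[str]:
    """Names of nodes whose NoSchedule/NoExecute taints are all tolerated."""
    result = []
    for node in nodes_json.get("items", []):
        taints = node.get("spec", {}).get("taints") or []
        hard = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
        if all(any(tolerates(tol, t) for tol in tolerations) for t in hard):
            result.append(node["metadata"]["name"])
    return result


def load_nodes() -> Dict:
    """Run kubectl (requires cluster access) and return the parsed node list."""
    raw = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(raw)
```

For example, schedulable_nodes(load_nodes(), [{"key": "dedicated", "operator": "Equal", "value": "ingestion", "effect": "NoSchedule"}]) lists the nodes that tolerate the dedicated=ingestion taint used in the pod spec override example. An empty result explains pods stuck in Pending.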

Choosing the Right Configuration Key

The configuration key depends on which executor your Hybrid Runner uses:

Executor	Environment variable	When to use
Argo Workflows (default)	`ARGO_PIPELINE_TYPE_CONFIGS`	Hybrid Runner uses Argo Workflows, either installed with `installArgoWorkflows: true` or shared from another instance
Simple Kubernetes	`SIMPLEK8S_PIPELINE_TYPE_CONFIGS`	Hybrid Runner deployed without Argo Workflows

Do not set both environment variables. Each executor reads from its own configuration key. Setting SIMPLEK8S_PIPELINE_TYPE_CONFIGS when using Argo has no effect, and vice versa.

Argo Workflows Executor

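Add the following to your values.yaml. Replace openmetadata-hybrid-runner with the value of the dedicated taint on your ingestion nodes, and adjust the nodetype label in the node affinity rule to match how those nodes are labelled: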

extraEnvs:
  - name: ARGO_PIPELINE_TYPE_CONFIGS
    value: >-
      {
        "automation": {
          "toleration": "openmetadata-hybrid-runner",
          "affinity": {
            "nodeAffinity": {
              "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                  {
                    "matchExpressions": [
                      {
                        "key": "nodetype",
                        "operator": "In",
                        "values": ["openmetadata-hybrid-runner"]
                      }
                    ]
                  }
                ]
              }
            }
          }
        },
        "metadata": {
          "toleration": "openmetadata-hybrid-runner"
        },
        "profiler": {
          "toleration": "openmetadata-hybrid-runner"
        },
        "lineage": {
          "toleration": "openmetadata-hybrid-runner"
        }
      }
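Hand-editing the nested JSON is error-prone, because a single missing comma leaves the variable unparseable. The following is a minimal sketch that builds the same structure as the example above with json.dumps. It assumes the openmetadata-hybrid-runner taint value and the nodetype node label used there, and the function name is hypothetical.

```python
import json
from typing import Optional

PIPELINE_TYPES = ("automation", "metadata", "profiler", "lineage")


def argo_pipeline_type_configs(
    taint_value: str,
    node_label_key: str = "nodetype",
    node_label_value: Optional[str] = None,
) -> str:
    """Return compact JSON for ARGO_PIPELINE_TYPE_CONFIGS.

    Every pipeline type gets the toleration; as in the example above,
    only automation pods also get a required node affinity rule.
    """
    label_value = node_label_value or taint_value
    configs: dict = {ptype: {"toleration": taint_value} for ptype in PIPELINE_TYPES}
    configs["automation"]["affinity"] = {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": node_label_key, "operator": "In", "values": [label_value]}
                        ]
                    }
                ]
            }
        }
    }
    return json.dumps(configs, separators=(",", ":"))


if __name__ == "__main__":
    print(argo_pipeline_type_configs("openmetadata-hybrid-runner"))
```

Because the output is a single line of JSON, you can paste it into the value field as a single-quoted YAML string instead of using the folded >- block.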

The toleration field accepts a single string value. The runner creates a fixed dedicated=<value> Kubernetes toleration for all ingestion pods of that type. For full Kubernetes toleration objects (multiple taints, custom operators or effects), use config.ingestionPods.customConfig.workflowParams instead.

Simple Kubernetes Executor

If you’re using the Simple Kubernetes executor without Argo Workflows, use SIMPLEK8S_PIPELINE_TYPE_CONFIGS instead:

extraEnvs:
  - name: SIMPLEK8S_PIPELINE_TYPE_CONFIGS
    value: >-
      {
        "automation": {
          "tolerations": [
            {
              "key": "nodetype",
              "operator": "Equal",
              "value": "openmetadata-hybrid-runner",
              "effect": "NoSchedule"
            }
          ],
          "nodeSelector": {
            "nodetype": "openmetadata-hybrid-runner"
          }
        }
      }
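The Simple Kubernetes value can be generated in the same way. The following sketch assumes, as the example above does, that the taint and the node label share the same key and value. It configures only automation by default, matching the example. Passing other pipeline types assumes they accept the same fields. The function name is hypothetical.

```python
import json
from typing import Tuple


def simplek8s_pipeline_type_configs(
    key: str, value: str, pipeline_types: Tuple[str, ...] = ("automation",)
) -> str:
    """Return compact JSON for SIMPLEK8S_PIPELINE_TYPE_CONFIGS.

    Each requested pipeline type gets a NoSchedule toleration for key=value
    and a nodeSelector on the same key and value.
    """
    configs = {}
    for ptype in pipeline_types:
        configs[ptype] = {
            "tolerations": [
                {"key": key, "operator": "Equal", "value": value, "effect": "NoSchedule"}
            ],
            "nodeSelector": {key: value},
        }
    return json.dumps(configs, separators=(",", ":"))


if __name__ == "__main__":
    print(simplek8s_pipeline_type_configs("nodetype", "openmetadata-hybrid-runner"))
```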

Supported Configuration Fields

Each pipeline type (automation, metadata, profiler, lineage) supports the following fields:

Field	Description
`toleration`	A single taint value string. The runner creates a `dedicated=<value>` Kubernetes toleration for ingestion pods of that type. For full toleration control, use `config.ingestionPods.customConfig.workflowParams`.
`affinity`	Node and pod affinity rules — supports `nodeAffinity`, `podAffinity`, and `podAntiAffinity`
`nodeSelector`	Key-value labels to target specific nodes
`priorityClass`	Kubernetes priority class name for the pod
`resources`	CPU and memory `requests` and `limits`
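Because the configuration is a JSON string inside an environment variable, a typo in a pipeline-type name is easy to miss. The following is a minimal sketch that parses the value and reports top-level keys other than the four pipeline types listed above. It checks only those top-level keys, not the fields inside each type. The function name is hypothetical.

```python
import json
from typing import List

KNOWN_PIPELINE_TYPES = {"automation", "metadata", "profiler", "lineage"}


def unknown_pipeline_types(env_value: str) -> List[str]:
    """Parse a *_PIPELINE_TYPE_CONFIGS value and return unrecognised top-level keys.

    Raises ValueError (json.JSONDecodeError is a subclass) if the value is not
    a JSON object.
    """
    parsed = json.loads(env_value)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object keyed by pipeline type")
    return sorted(key for key in parsed if key not in KNOWN_PIPELINE_TYPES)


if __name__ == "__main__":
    print(unknown_pipeline_types('{"automation": {}, "profilers": {}}'))
```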

Prometheus Metrics

The Hybrid Runner exposes operational metrics in a Prometheus-compatible format via an HTTP endpoint. These metrics provide insight into agent state, activity, and performance.

The available metrics may evolve over time. Inspect the /metrics endpoint directly for the latest set of available metrics.

Configuration

Configure the metrics endpoint in your Helm values:

metricsServerConfiguration:
  port: ${METRICS_SERVER_PORT:-8989}
  path: ${METRICS_SERVER_PATH:-/metrics}

port: Port on which the metrics endpoint is served (default: 8989).
path: HTTP path for accessing metrics (default: /metrics).

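Both parameters support environment variable overrides: set METRICS_SERVER_PORT or METRICS_SERVER_PATH to change the port or path. If a variable is not set, the default after :- is used.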
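The ${NAME:-default} placeholders follow the shell convention: the environment variable is used when it is set, and the default is used otherwise. The following is a minimal sketch of that substitution. It assumes the Runner treats an empty variable like an unset one, as the shell does. The resolve_placeholders name is hypothetical and is not the Runner's own code.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME:-default}, capturing the variable name and the default.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")


def resolve_placeholders(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME:-default} with env[NAME] if set and non-empty, else default."""
    source = os.environ if env is None else env

    def substitute(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        return source.get(name) or default

    return _PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    print(resolve_placeholders("port: ${METRICS_SERVER_PORT:-8989}", {"METRICS_SERVER_PORT": "9100"}))
```

For example, setting METRICS_SERVER_PORT to "9100" through extraEnvs would move the endpoint to port 9100.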

Accessing Metrics

Once configured, access metrics at:

http://<agent-host>:<port><path>

With default settings:

http://localhost:8989/metrics
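You can read the endpoint with any HTTP client. The following is a minimal sketch using Python's standard library. It joins host, port, and path so that the default /metrics path does not produce a double slash. When the Runner runs in Kubernetes, localhost only works from inside the pod or through a port-forward such as kubectl port-forward pod/<runner-pod> 8989:8989. The function names are hypothetical.

```python
import urllib.request


def metrics_url(host: str = "localhost", port: int = 8989, path: str = "/metrics") -> str:
    """Build the metrics URL, accepting a path with or without a leading slash."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Fetch the metrics page as text (raises URLError if it is unreachable)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```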

Example Metric

# HELP collate_hybrid_agent_connected Is the agent connected to the server? (0 = No, 1 = Yes)
# TYPE collate_hybrid_agent_connected gauge
collate_hybrid_agent_connected 1.0

This metric indicates whether the agent is currently connected to the Collate server — 1.0 means connected, 0 means disconnected.
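To use this gauge in a script or health check, you can parse it from the exposition text. The following is a minimal sketch that handles simple unlabelled samples like the one above. It does not implement the full Prometheus text format (labels, timestamps, escaping). The function name is hypothetical.

```python
from typing import Optional


def gauge_value(exposition: str, metric: str) -> Optional[float]:
    """Return the value of an unlabelled sample named `metric`, or None if absent."""
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, HELP, TYPE, and other comments
        name, _, rest = line.partition(" ")
        if name == metric:
            return float(rest.split()[0])
    return None


SAMPLE = """\
# HELP collate_hybrid_agent_connected Is the agent connected to the server? (0 = No, 1 = Yes)
# TYPE collate_hybrid_agent_connected gauge
collate_hybrid_agent_connected 1.0
"""

if __name__ == "__main__":
    connected = gauge_value(SAMPLE, "collate_hybrid_agent_connected")
    print("connected" if connected == 1.0 else "disconnected or missing")
```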

Hybrid Runner Images

This section covers how to host Collate images in your own container registry and how to manage image tags when you prefer to control versioning yourself.

Hosting Your Own Docker Images

To mirror Collate images to your own container registry (for example, Google Artifact Registry), set the following values. Mirror both the Hybrid Runner image and the ingestion pod image; the Runner needs both to function correctly.

Hybrid Runner image: Sets the repository, tag, and pull credentials for the main Runner pod:

image:
  repository: my-repo.com/my-image
  tag: my-tag
imagePullSecrets: my-credentials

Ingestion pod image: Sets the repository, tag, and pull credentials for the pods that execute ingestion jobs:

config:
  ingestionPods:
    repository: my-repo.com/my-image
    tag: my-tag
    imagePullSecrets: my-credentials
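If you mirror both images into one registry, generating both blocks from a single registry prefix keeps them in sync. The following is a minimal sketch. The registry, image names, tags, and secret name are placeholders like those above. The string form of imagePullSecrets mirrors the example, so keep whatever format your chart version expects. The output is printed as JSON, which is also valid YAML.

```python
import json


def mirrored_image_values(registry: str, runner_image: str, runner_tag: str,
                          ingestion_image: str, ingestion_tag: str,
                          pull_secret: str) -> dict:
    """Build Helm values pointing the Runner and ingestion pods at a mirror."""
    registry = registry.rstrip("/")
    return {
        # Main Hybrid Runner pod.
        "image": {"repository": f"{registry}/{runner_image}", "tag": runner_tag},
        "imagePullSecrets": pull_secret,
        # Pods that execute ingestion jobs.
        "config": {
            "ingestionPods": {
                "repository": f"{registry}/{ingestion_image}",
                "tag": ingestion_tag,
                "imagePullSecrets": pull_secret,
            }
        },
    }


if __name__ == "__main__":
    values = mirrored_image_values(
        registry="my-repo.com",
        runner_image="my-runner-image",
        runner_tag="my-runner-tag",
        ingestion_image="my-ingestion-image",
        ingestion_tag="my-ingestion-tag",
        pull_secret="my-credentials",
    )
    print(json.dumps(values, indent=2))
```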

Managing Your Own Image Tags

By default, the Hybrid Runner automatically resolves image tags to match your Collate server version (for example, om-1.11.1-cl-1.11.1). To manage your own tags, disable automatic resolution using the DYNAMIC_INGESTION_VERSION_ENABLED environment variable. For configuration details, see Runner Environment Variables.
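When you pin tags yourself, it can help to check that a pinned tag still matches your server version after an upgrade. The following is a minimal sketch that assumes tags follow the om-<OpenMetadata version>-cl-<Collate version> pattern from the example above. That pattern is inferred from a single example, so confirm it against the tags in your registry. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed tag layout, inferred from the example om-1.11.1-cl-1.11.1.
_TAG = re.compile(r"om-(?P<om>\d+(?:\.\d+)*)-cl-(?P<cl>\d+(?:\.\d+)*)")


def parse_tag(tag: str) -> Optional[Tuple[str, str]]:
    """Split a tag into (OpenMetadata version, Collate version), or None if it does not fit."""
    match = _TAG.fullmatch(tag)
    return (match["om"], match["cl"]) if match else None


def tag_matches_server(tag: str, collate_version: str) -> bool:
    """True if the tag's Collate version equals the given server version."""
    parsed = parse_tag(tag)
    return parsed is not None and parsed[1] == collate_version


if __name__ == "__main__":
    print(parse_tag("om-1.11.1-cl-1.11.1"))
```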
