Troubleshooting

Use this section to diagnose the most common issues after deployment. For each symptom, work through the steps in order.

Could Not Get the Secret Value or Forbidden

ERROR (metadata.utils.kubernetes_secrets_manager:159) - Could not get the secret
value of <path>
Reason: Forbidden

Your ingestion pod’s service account doesn’t have permission to read the secret. Before working through the causes below, confirm which secrets manager your Runner is using by reviewing the pod logs:

kubectl logs -l app.kubernetes.io/name=hybrid-ingestion-runner,app.kubernetes.io/instance=collate-prod | grep secretsManager

Cause 1 — Missing IAM or Workload Identity

Your ingestion service account isn’t bound to the correct IAM (Identity and Access Management) role or Workload Identity. Check the following:

Verify the annotation on the ingestion service account.
Confirm the cloud IAM binding is in place for your provider.
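
As a sketch of the first check, the helper below scans a service account manifest for the annotation keys commonly used for workload identity on EKS, GKE, and AKS. The function name identity_annotation is hypothetical, and which key applies to your deployment is an assumption to verify against your provider setup.

```shell
# Hypothetical helper: read a ServiceAccount manifest on stdin and print the
# first workload-identity annotation key it finds (prints nothing if none).
# The three keys are the standard ones for EKS IAM roles for service accounts,
# GKE Workload Identity, and Azure Workload Identity; your deployment will
# typically use only one of them.
identity_annotation() {
  grep -E -o 'eks\.amazonaws\.com/role-arn|iam\.gke\.io/gcp-service-account|azure\.workload\.identity/client-id' \
    | head -n 1
}

# Example usage (placeholders are yours to fill in):
#   kubectl get serviceaccount <service-account> -n <namespace> -o yaml | identity_annotation
```

An empty result suggests the annotation is missing. A printed key only shows that the annotation exists, so the cloud-side IAM binding still needs to be confirmed separately.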

Cause 2 — Secret Name Mismatch

The name you entered in the Collate UI doesn’t match the name under which the secret is stored in your secrets store. When you enter secret:my-db-password in the Collate UI, the runner strips the secret: prefix and looks up my-db-password directly in your secrets store. If the secret was stored under a different name (for example, with a path prefix such as /collate/hybrid-ingestion-runner/my-db-password), the lookup fails because the runner searches for my-db-password, not the full path. Check the following:

Open your secrets store (AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager) and confirm the exact name the secret is stored under.
In the Collate UI connection form, verify the masked field contains secret:<secret-name>, where <secret-name> matches the name in your secrets store character for character.
Check for typos, extra slashes, or path segments that aren’t part of the stored secret name.
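
The lookup behavior described above can be mimicked locally to compare a UI value against a stored name. This is a minimal sketch based only on the description in this section, not the runner’s actual code; lookup_name and names_match are hypothetical helper names.

```shell
# Hypothetical helper: drop the "secret:" prefix, as the runner is described
# to do before looking the name up in the secrets store.
lookup_name() {
  printf '%s\n' "${1#secret:}"
}

# Hypothetical helper: succeed only if the name derived from the UI value is
# identical to the name the secret is stored under.
names_match() {
  [ "$(lookup_name "$1")" = "$2" ]
}

# Example usage:
#   names_match 'secret:my-db-password' 'my-db-password' && echo "match"
#   names_match 'secret:my-db-password' '/collate/hybrid-ingestion-runner/my-db-password' || echo "mismatch"
```

The comparison is exact, so a leading slash or an extra path segment in the stored name counts as a mismatch.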

Cause 3 — Missing `secretsManager` Helm Value

To use a cloud secrets manager, set config.secretsManager explicitly in your values.yaml. Without it, the Runner falls back to Kubernetes Secrets and can’t resolve cloud secrets manager paths. Follow these steps:

Open your values.yaml.
Confirm config.secretsManager is set to the correct value for your provider (managed-aws, gcp, or managed-azure-kv).
Run helm upgrade to apply the change.
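
The second step can be sketched as a quick local check before running helm upgrade. Both helper names are hypothetical, and the extraction assumes secretsManager appears as a plain "secretsManager: <value>" line in values.yaml with no trailing comment.

```shell
# Hypothetical helper: read values.yaml on stdin and print the first
# secretsManager value, with surrounding quotes removed.
secrets_manager_value() {
  sed -n 's/^[[:space:]]*secretsManager:[[:space:]]*//p' | tr -d "\"'" | head -n 1
}

# Hypothetical helper: succeed only for the cloud provider values named in
# this section.
is_supported_secrets_manager() {
  case "$1" in
    managed-aws|gcp|managed-azure-kv) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   value="$(secrets_manager_value < values.yaml)"
#   is_supported_secrets_manager "$value" && echo "ok: $value" || echo "check config.secretsManager"
```

An empty value means the key is not set, which matches the fallback to Kubernetes Secrets described above.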

Runner Shows as `ERROR` in the Collate UI

The runner appears as ERROR (a red pill in the UI) when it cannot authenticate to Collate. A common cause is an incorrect IngestionBot JWT configured as config.authToken in your values.yaml. Check the runner logs for confirmation:

kubectl logs -n argo-workflows -l app.kubernetes.io/name=hybrid-ingestion-runner,app.kubernetes.io/instance=<release-name> | grep "Authentication rejected"

Fix: Update config.authToken with the correct IngestionBot JWT and restart the runner pod.
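
Before updating the value, a quick shape check can catch a truncated token or one pasted with stray spaces or quotes. The sketch below only verifies that the string has the three dot-separated base64url segments of a signed JWT. It does not validate the signature or expiry, and looks_like_jwt is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the argument consists of exactly three
# non-empty base64url segments separated by dots.
looks_like_jwt() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$'
}

# Example usage:
#   looks_like_jwt "$TOKEN" || echo "token is malformed; copy it again from the IngestionBot"
```

A token that passes this check can still be rejected by Collate, for example if it belongs to a different bot or instance.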

Runner Shows as `Inactive` in the Collate UI

Work through the following checks:

Check that config.authToken in values.yaml is the correct JWT from the IngestionBot.
Verify outbound TLS (port 443) is allowed from your cluster to <your-instance>.getcollate.io.
Confirm the pod is running: kubectl get pods.

Check the Runner pod logs for connection or authentication errors:

kubectl logs -l app.kubernetes.io/name=hybrid-ingestion-runner,app.kubernetes.io/instance=collate-prod
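
To make the pod status check easier to read on a busy namespace, the output of kubectl get pods can be filtered down to pods that are not running. This is a sketch that assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE); pods_not_running is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# name and status of every pod whose STATUS column is not "Running".
# Assumes the default columns; with -A the namespace column shifts the fields.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Example usage:
#   kubectl get pods -l app.kubernetes.io/name=hybrid-ingestion-runner | pods_not_running
```

No output means every listed pod reports Running, in which case the logs above are the next place to look.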

`ImagePullBackOff` on the Runner Pod

The ECR credentials cron job may not have run yet. Trigger it manually:

kubectl create job --from=cronjob/ecr-registry-helper manual
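
Job names must be unique within a namespace, so rerunning the command above fails while an earlier job named manual still exists. One way around this, sketched here with a hypothetical helper name, is to append a timestamp to the job name.

```shell
# Hypothetical helper: print a job name with an epoch-seconds suffix so that
# repeated manual runs do not collide with an existing job.
manual_job_name() {
  printf 'manual-%s\n' "$(date +%s)"
}

# Example usage:
#   name="$(manual_job_name)"
#   kubectl create job --from=cronjob/ecr-registry-helper "$name"
#   kubectl wait --for=condition=complete "job/$name" --timeout=120s
```

The 120-second timeout is an arbitrary illustration. Once the job completes, the runner pod should be able to pull the image on its next retry.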

Ingestion Pod Not Found — Diagnostics Unavailable

Issue: The ingestion job fails and the exit handler reports:

Could not retrieve pod diagnostics (pod may be deleted, missing RBAC
permissions, or other Kubernetes errors).

WARNING - No main pod found for workflow <workflow-id>
WARNING - Could not find main pod for workflow <workflow-id> - skipping diagnostics

Cause: The ingestion pod was removed before the exit handler could retrieve diagnostics. This is different from an application crash: a crashed or OOM-killed pod remains in the Error or OOMKilled state. An absent pod, or a pod in the ContainerStatusUnknown state, indicates that the pod was removed externally, typically by one of the following:

Cluster autoscaling scaled down the node running the ingestion pod.
A pod cleanup policy or TTL controller removed the pod.
The node was rotated or replaced during the ingestion run.

Resolution:

Check the pod state immediately after the next failure:

kubectl get pods -n argo-workflows
kubectl describe pod <ingestion-pod-name> -n argo-workflows

Review cluster events around the time of failure:

kubectl get events -n argo-workflows --sort-by='.lastTimestamp'
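
On a busy namespace the event list can be long. The sketch below narrows it to lines that mention common external removal reasons. The keyword list is an assumption and a starting point only, since event reason strings vary by cluster and autoscaler; removal_events is a hypothetical helper name.

```shell
# Hypothetical helper: keep only event lines that mention scale-down,
# eviction, preemption, node failure, or container kills (case-insensitive).
removal_events() {
  grep -E -i 'ScaleDown|Evict|Preempt|NodeNotReady|Killing'
}

# Example usage:
#   kubectl get events -n argo-workflows --sort-by='.lastTimestamp' | removal_events
```

If nothing matches, review the unfiltered events around the failure time before ruling out an external cause.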

Once you have identified what removed the pod, work with your infrastructure team to address it, for example by configuring scale-down protection for ingestion workloads or excluding the argo-workflows namespace from pod cleanup policies.

If the pod is absent, an external process removed it before Argo’s configured TTL elapsed. Check your ARGO_SECONDS_AFTER_COMPLETION_TTL setting to confirm the expected retention window.
