Troubleshooting
Use this section to diagnose the most common issues after deployment. For each symptom, work through the steps in order.
Could Not Get the Secret Value or Forbidden
Cause 1 — Missing IAM or Workload Identity
Your ingestion service account isn’t bound to the correct IAM (Identity and Access Management) role or Workload Identity. Do the following checks:
- Verify the annotation on the ingestion service account.
- Confirm the cloud IAM binding is in place for your provider.
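The first check can be run with kubectl. The sketch below assumes the service account is named ingestion, as stated above. The namespace argument is a placeholder for wherever your runner is installed, and show_sa_annotations is a hypothetical helper name.

```shell
# Print the annotations on the ingestion service account.
# The namespace is an assumption; pass the one your runner is installed in.
show_sa_annotations() {
  namespace="$1"
  kubectl get serviceaccount ingestion -n "$namespace" \
    -o jsonpath='{.metadata.annotations}'
}
```

Call it as show_sa_annotations <namespace>. The annotation key to look for depends on your provider: typical Workload Identity keys are eks.amazonaws.com/role-arn on EKS, iam.gke.io/gcp-service-account on GKE, and azure.workload.identity/client-id on AKS.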
Cause 2 — Secret Name Mismatch
The name you entered in the Collate UI doesn’t match the name under which the secret is stored in your secrets store. When you enter secret:my-db-password in the Collate UI, the runner strips the secret: prefix and looks up my-db-password directly in your secrets store. If the secret was stored under a different name (for example, with a path prefix like /collate/hybrid-ingestion-runner/my-db-password), the lookup fails because the runner is searching for my-db-password, not the full path.
Do the following checks:
- Open your secrets store (AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager) and confirm the exact name the secret is stored under.
- In the Collate UI connection form, verify the masked field contains secret:<secret-name>, where <secret-name> matches the name in your secrets store character for character.
- Check for typos, extra slashes, or path segments that aren’t part of the stored secret name.
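The lookup rule described above can be illustrated with a small shell sketch. This is not the runner's code: resolve_secret_name is a hypothetical helper that only mimics the documented behavior of stripping the secret: prefix, so you can see which name must exist in your secrets store.

```shell
# Hypothetical helper: mimic how the value entered in the UI maps to the
# name looked up in the secrets store (the secret: prefix is stripped).
resolve_secret_name() {
  printf '%s\n' "${1#secret:}"
}

ui_value='secret:my-db-password'
stored_name='/collate/hybrid-ingestion-runner/my-db-password'

# Compare the name the runner would look up with the name actually stored.
lookup_name="$(resolve_secret_name "$ui_value")"
if [ "$lookup_name" = "$stored_name" ]; then
  echo "match: $lookup_name"
else
  echo "mismatch: runner looks up '$lookup_name' but the secret is stored as '$stored_name'"
fi
```

With these example values the comparison reports a mismatch, which is the failure mode described in this cause. Either store the secret as my-db-password or enter the full stored name after the secret: prefix.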
Cause 3 — Missing secretsManager Helm Value
To use a cloud secrets manager, set config.secretsManager explicitly in your values.yaml. Without it, the Runner falls back to Kubernetes Secrets and can’t resolve cloud secrets manager paths.
Do the following steps:
- Open your values.yaml.
- Confirm config.secretsManager is set to the correct value for your provider (managed-aws, gcp, or managed-azure-kv).
- Run helm upgrade to apply the change.
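To confirm what the deployed release is actually using, you can inspect its values with Helm. This is a minimal sketch: the release name and namespace are placeholders for your own install, and show_secrets_manager is a hypothetical helper name.

```shell
# Show the secretsManager setting from the values supplied to a release.
# Release name and namespace are placeholders, not names from this guide.
show_secrets_manager() {
  release="$1"
  namespace="$2"
  helm get values "$release" -n "$namespace" | grep 'secretsManager'
}
```

If nothing is printed, the key was not supplied to the release. By default helm get values shows only user-supplied values; add --all to include chart defaults.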
Runner Shows as ERROR in the Collate UI
The runner can show as ERROR (shown as a red pill in the UI) when it cannot authenticate to Collate. A common cause is an incorrect IngestionBot JWT configured as config.authToken in your values.yaml.
Check the runner logs for confirmation. If authentication is failing, update config.authToken with the correct IngestionBot JWT and restart the runner pod.
Runner Shows as Inactive in the Collate UI
- Check that the authToken in values.yaml is the correct JWT from the IngestionBot.
- Verify outbound TLS (port 443) is allowed from your cluster to <your-instance>.getcollate.io.
- Confirm the pod is running: kubectl get pods.
- Check the Runner pod logs for connection or authentication errors.
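The last two checks can be run as follows. This is a sketch: the namespace and pod name are placeholders, the helper names are hypothetical, and the grep pattern is only a generic guess at what connection or authentication failures might look like, not the Runner's actual log format.

```shell
# List pods so you can find the Runner pod and see its status.
list_pods() {
  kubectl get pods -n "$1"
}

# Print recent log lines from a pod that mention common failure keywords.
# The keyword list is an assumption; adjust it to what your logs contain.
runner_log_errors() {
  namespace="$1"
  pod="$2"
  kubectl logs "$pod" -n "$namespace" --tail=200 \
    | grep -iE 'error|unauthorized|forbidden|refused|timeout'
}
```

Run list_pods <namespace> first to get the pod name, then runner_log_errors <namespace> <pod-name>. If the filter prints nothing, read the unfiltered output of kubectl logs before concluding the pod is healthy.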
ImagePullBackOff on the Runner Pod
The ECR credentials cron job may not have run yet. Trigger it manually rather than waiting for its next scheduled run.
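One way to trigger a CronJob manually is to create a one-off Job from it with kubectl. This is a sketch under assumptions: this guide does not name the CronJob, so the namespace, CronJob name, and Job name are all placeholders you need to fill in, and trigger_cronjob is a hypothetical helper.

```shell
# Create a one-off Job from an existing CronJob.
# All three arguments are placeholders; look up the real CronJob name first
# with: kubectl get cronjobs -n <namespace>
trigger_cronjob() {
  namespace="$1"
  cronjob="$2"
  job_name="$3"
  kubectl create job "$job_name" --from="cronjob/$cronjob" -n "$namespace"
}
```

After the Job completes, check the Runner pod again with kubectl get pods. It may take a moment for the next image pull attempt to pick up the new credentials.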
Ingestion Pod Not Found — Diagnostics Unavailable
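Issue: The ingestion job fails and the exit handler reports that the ingestion pod was not found, so no diagnostics are available. The exit handler expects to find the pod in an Errored or OOMKilled state. An absent pod or a ContainerStatusUnknown state indicates the pod was removed externally, typically by one of the following: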
- Cluster autoscaling scaled down the node running the ingestion pod.
- A pod cleanup policy or TTL controller removed the pod.
- The node was rotated or replaced during the ingestion run.
To identify the cause:
- Check the pod state immediately after the next failure.
- Review cluster events around the time of failure.
- Once identified, work with your infrastructure team to address the cause, for example by configuring scale-down protection for ingestion workloads or excluding the argo-workflows namespace from pod cleanup policies.
If the pod is absent, it was removed by an external process before Argo’s configured TTL. Check your ARGO_SECONDS_AFTER_COMPLETION_TTL setting to confirm the expected retention window.
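The pod state and cluster event checks above can be sketched with kubectl. The helper names are hypothetical, and the namespace is passed as an argument. argo-workflows is used in the usage note only because it is the namespace mentioned above, so substitute your own if ingestion pods run elsewhere.

```shell
# Show pods in the given namespace with their state and the node they ran on.
pod_state() {
  kubectl get pods -n "$1" -o wide
}

# List events in the namespace, oldest first, to spot evictions,
# node scale-downs, or deletions around the failure time.
recent_events() {
  kubectl get events -n "$1" --sort-by=.lastTimestamp
}
```

For example, run pod_state argo-workflows and then recent_events argo-workflows. Kubernetes retains events only for a limited time (one hour by default), so run these soon after the failure.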