Guide to Deploy Collate Binaries On-Premises
This guide will help you use the Collate Docker Images to run the OpenMetadata Application in an on-premises Kubernetes cluster, connecting to Argo Workflows to run ingestion from the OpenMetadata Application itself.
Architecture
Collate OpenMetadata requires 4 components:
- Collate Server
- Database — Collate Server stores the metadata in a relational database. We support MySQL or Postgres.
- MySQL version 8.0.0 or greater
- Postgres version 12.0 or greater
- Search Engine — We support:
- ElasticSearch 9.3.0
- OpenSearch 3.4
- Workflow Orchestration — We use Argo Workflows as the orchestrator for ingestion pipelines.
Sizing Requirements
Hardware Requirements
A Kubernetes Cluster with at least 1 Master Node and 3 Worker Nodes is the required configuration. Each Worker Node should have at least:
- 4 vCPUs
- 16 GiB Memory
- 128 GiB Storage capacity
If you want Collate workloads scheduled on dedicated nodes, use Kubernetes taints and tolerations. Collate OpenMetadata supports tolerations via custom Helm values.
Software Requirements
- Collate OpenMetadata supports Kubernetes Cluster version 1.24 or greater.
- Collate Docker Images are available via private AWS Elastic Container Registry (ECR). The Collate Team will share credentials and steps to configure Kubernetes to pull Docker Images from AWS ECR.
- For Argo Workflows, Collate OpenMetadata is currently compatible with application version 3.4+.
Database Sizing and Capacity
Our recommendation is to configure PostgreSQL. For 100,000 Data Assets and 1,000 Users:
- 8 vCPUs
- 64 GiB Memory
- 256 GiB Storage Capacity
- 3,500 IOPS storage
Search Client Sizing and Capacity
For 100,000 Data Assets and 1,000 Users:
- 8 vCPUs
- 64 GiB Memory
- 256 GiB Storage Capacity
Argo Workflows Ingestion Runners
The recommended resources are 4 vCPUs and 16 GiB of Memory.
On-Premises Prerequisites
Object Storage for Argo Workflows Artifacts
Argo Workflows requires object storage to archive ingestion logs. On-premises deployments can use MinIO as an S3-compatible object store.
Deploy MinIO (if you don’t have an existing object store)
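A minimal sketch of a standalone MinIO install using the official MinIO Helm chart; the release name, namespace, and storage size are assumptions to adapt to your environment.

```shell
# Add the official MinIO Helm repository and install a standalone instance
# (release name "minio", namespace "minio", and storage size are illustrative)
helm repo add minio https://charts.min.io/
helm repo update
kubectl create namespace minio
helm install minio minio/minio \
  --namespace minio \
  --set mode=standalone \
  --set persistence.size=100Gi
```

This exposes MinIO inside the cluster as a Service in the `minio` namespace, which the Argo Workflows artifact repository configuration can point at.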
Create Kubernetes Secret for MinIO Credentials
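A sketch of storing the MinIO access credentials in a Secret that Argo Workflows can reference; the secret name (`minio-credentials`) and key names are assumptions and must match what your Argo Helm values reference.

```shell
# Store MinIO credentials in the namespace where Argo Workflows will run
# (secret name and key names are illustrative; keep them in sync with
# the accessKeySecret/secretKeySecret entries in argo-workflows.values.yml)
kubectl create secret generic minio-credentials \
  --namespace argo \
  --from-literal=accessKey='<MINIO_ACCESS_KEY>' \
  --from-literal=secretKey='<MINIO_SECRET_KEY>'
```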
Setup AWS ECR
Collate will provide the credentials to pull Docker Images from a private registry located in AWS ECR.
Install AWS CLI
Follow the AWS CLI installation guide to install AWS CLI on your machine.
Configure AWS Credentials
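Configure the credentials shared by the Collate Team with a named profile; the profile name is an assumption, and the region matches the ECR registry referenced later in this guide.

```shell
# Configure AWS credentials under a dedicated profile (profile name illustrative)
aws configure --profile collate
# AWS Access Key ID [None]: <provided by Collate>
# AWS Secret Access Key [None]: <provided by Collate>
# Default region name [None]: eu-west-1
```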
Kubernetes Docker Registry Secrets for AWS ECR
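A sketch of creating the image-pull Secret from an ECR authorization token. The secret name `ecr-registry-creds` matches the name used in the Troubleshooting section; the registry host matches the ingestion image shown later in this guide.

```shell
# Create an image-pull Secret from a short-lived ECR token.
# ECR tokens expire after 12 hours, so refresh this Secret periodically in production.
kubectl create secret docker-registry ecr-registry-creds \
  --namespace <<NAMESPACE_NAME>> \
  --docker-server=118146679784.dkr.ecr.eu-west-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region eu-west-1)"
```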
Replace <<NAMESPACE_NAME>> with the namespace where you want to deploy the Collate OpenMetadata Server. If the namespace does not exist yet, create it with kubectl create namespace <<NAMESPACE_NAME>>.
Install Argo Workflows
Add Helm Repository
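Add the community-maintained Argo Helm repository:

```shell
# Register the Argo Helm repository and refresh the local chart index
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
```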
Create the Argo Namespace
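Create the namespace where Argo Workflows will run (the name `argo` matches the ARGO_NAMESPACE default in the environment variables table below):

```shell
kubectl create namespace argo
```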
Kubernetes Secret for Argo Workflows DB Credentials
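A sketch of storing the credentials for the Argo Workflows archive database; the secret name and key names are assumptions, to be referenced from the workflow archive (persistence) section of your Argo Helm values.

```shell
# Credentials for the Argo Workflows archive database
# (secret and key names are illustrative; reference them from the
# controller persistence settings in the Argo Helm values)
kubectl create secret generic argo-postgres-config \
  --namespace argo \
  --from-literal=username='<DB_USERNAME>' \
  --from-literal=password='<DB_PASSWORD>'
```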
Create Custom Helm Values for Argo Workflows
Create a file named argo-workflows.values.yml:
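A minimal sketch of the values file for a MinIO-backed artifact repository. The endpoint, bucket, and secret/key names are assumptions that must match the MinIO deployment and credentials Secret created earlier; verify the key layout against the argo-workflows chart version you install.

```shell
# Write custom Helm values pointing the artifact repository at in-cluster MinIO
cat > argo-workflows.values.yml <<'EOF'
artifactRepository:
  # Archive ingestion logs to the object store
  archiveLogs: true
  s3:
    endpoint: minio.minio.svc.cluster.local:9000
    bucket: argo-artifacts
    insecure: true
    accessKeySecret:
      name: minio-credentials
      key: accessKey
    secretKeySecret:
      name: minio-credentials
      key: secretKey
EOF
```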
If you are using an existing S3-compatible store (e.g., Ceph, NetApp StorageGRID) instead of MinIO, update endpoint, bucket, and the secret reference to match your environment. Set insecure: false and configure TLS if your store uses HTTPS.
Deploy Argo Workflows
We target application version 3.7.1 using Helm chart version 0.45.23 (Artifact Hub).
[Optional] Enable Prometheus Metrics
If you have a Prometheus Application running on your cluster, enable metrics through the chart's custom Helm values.
Install OpenMetadata/Collate
Create the Collate Namespace
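Create the namespace that will host the Collate OpenMetadata Server, using the same <<NAMESPACE_NAME>> chosen when creating the ECR registry Secret:

```shell
kubectl create namespace <<NAMESPACE_NAME>>
```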
Kubernetes Service Account for Ingestion
The OpenMetadata Application communicates with Argo Workflows to dynamically trigger ephemeral pods that run ingestion workloads. Create a dedicated Kubernetes Service Account.
Create Long-Lived API Token for the ServiceAccount
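A sketch of both steps: creating the service account and a long-lived token Secret bound to it. The name `om-role` matches the ARGO_WORKFLOW_EXECUTOR_SERVICE_ACCOUNT_NAME default in the table below; the Secret name is an assumption. Note that Kubernetes 1.24+ no longer auto-creates token Secrets for service accounts, hence the explicit Secret.

```shell
# Create the service account used by ingestion workflows
kubectl create serviceaccount om-role --namespace argo

# Create a long-lived token Secret bound to the service account
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: om-role-token
  namespace: argo
  annotations:
    kubernetes.io/service-account.name: om-role
type: kubernetes.io/service-account-token
EOF

# Read the generated token; use it as the ARGO_TOKEN value
kubectl get secret om-role-token --namespace argo \
  -o jsonpath='{.data.token}' | base64 -d
```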
Configure Kubernetes Roles for the Service Account
Create a file named om-argo-role.yml:
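A sketch of the Role and RoleBinding granting the `om-role` service account access to Argo Workflows resources; the exact rule list is an assumption and may need widening or narrowing for your Collate release.

```shell
# Write a Role/RoleBinding granting the ingestion service account
# access to Argo Workflows resources in the argo namespace
cat > om-argo-role.yml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: om-argo-role
  namespace: argo
rules:
  - apiGroups: ["argoproj.io"]
    resources: ["workflows", "workflowtemplates", "cronworkflows", "workflowtaskresults"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: om-argo-rolebinding
  namespace: argo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: om-argo-role
subjects:
  - kind: ServiceAccount
    name: om-role
    namespace: argo
EOF
```

Apply it with `kubectl apply -f om-argo-role.yml`.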
Install OpenMetadata Helm Chart
Create Kubernetes Secrets for the database connection. If you plan to use the DeltaLake connector, the ARGO_INGESTION_IMAGE value should be:
118146679784.dkr.ecr.eu-west-1.amazonaws.com/collate-customers-ingestion-eu-west-1:om-1.12.3-cl-1.12.3
Create a file named openmetadata.values.yml:
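The steps above can be sketched together: create the database-connection Secret, then write a minimal openmetadata.values.yml. The secret name, key names, and database settings are assumptions; verify the `openmetadata.config.database` key layout against the OpenMetadata Helm chart for your Collate release, and set the ARGO_* environment variables from the table at the end of this guide through the chart's extra-environment mechanism.

```shell
# Database-connection secret (name and key are illustrative)
kubectl create secret generic collate-db-secrets \
  --namespace <<NAMESPACE_NAME>> \
  --from-literal=openmetadata-db-password='<DB_PASSWORD>'

# Minimal sketch of openmetadata.values.yml for a PostgreSQL backend
cat > openmetadata.values.yml <<'EOF'
openmetadata:
  config:
    database:
      host: <DB_HOST>
      port: 5432
      driverClass: org.postgresql.Driver
      dbScheme: postgresql
      databaseName: openmetadata_db
      auth:
        username: openmetadata_user
        password:
          secretRef: collate-db-secrets
          secretKey: openmetadata-db-password
EOF
```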
[Optional] Enable Prometheus Metrics
Collate Application exposes Prometheus metrics on port 8586. Enable the integration using:
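One common approach, sketched here as an assumption: annotate the server pods so a Prometheus that honors the conventional `prometheus.io/*` annotations scrapes port 8586. This assumes your chart version exposes a `podAnnotations` value; adapt to a ServiceMonitor if you run the Prometheus Operator.

```shell
# Append scrape annotations to the values file (assumes podAnnotations is supported)
cat >> openmetadata.values.yml <<'EOF'
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8586"
EOF
```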
Post Installation/Upgrade Steps
Configure ReIndexing
After installation or upgrade, configure ReIndexing from the OpenMetadata UI. For detailed steps, refer to the OpenMetadata upgrade documentation.
Troubleshooting
Pods Stuck in Pending State
Check for resource constraints or missing secrets:

| Symptom | Cause | Fix |
|---|---|---|
| ImagePullBackOff | ECR secret missing or expired | Recreate ecr-registry-creds with a fresh ECR token |
| Insufficient cpu / memory | Cluster at capacity | Reduce resources.requests in openmetadata.values.yml or add nodes |
| Pending on PVC | No default StorageClass | Set a default StorageClass or pass an explicit storageClass in values |
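To diagnose which of the above applies, inspect the pending pod and recent events (pod name and namespace are placeholders):

```shell
# Show scheduling failures and pull errors for a specific pod
kubectl describe pod <POD_NAME> --namespace <<NAMESPACE_NAME>>

# List recent cluster events, newest last
kubectl get events --namespace <<NAMESPACE_NAME>> --sort-by=.lastTimestamp
```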
Argo Workflows Cannot Connect to Object Storage
Verify the MinIO service is reachable from the argo-workflows namespace:
The request should return 200 OK. If it fails, check that MinIO is running and that the endpoint in argo-workflows.values.yml is correct.
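A sketch of the check using a throwaway curl pod; the MinIO service DNS name assumes the `minio` release in the `minio` namespace from earlier in this guide, and `/minio/health/live` is MinIO's liveness endpoint.

```shell
# Probe MinIO's liveness endpoint from inside the argo namespace;
# prints the HTTP status code and removes the pod afterwards
kubectl run minio-check --namespace argo --rm -i --restart=Never \
  --image=curlimages/curl -- \
  curl -s -o /dev/null -w '%{http_code}\n' \
  http://minio.minio.svc.cluster.local:9000/minio/health/live
```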
Environment Variables for Collate OpenMetadata Argo
| Environment Name | Description | Default Value | Required |
|---|---|---|---|
| ARGO_IMAGE_PULL_SECRETS | Image Pull Secret Name to pull Docker Images for Ingestion from a Private Registry. Multiple secrets can be supplied comma-separated. | Empty String | False |
| ARGO_INGESTION_IMAGE | Docker Image and Tag for Ingestion Images | openmetadata/ingestion-base:1.4.3 | True |
| ARGO_NAMESPACE | Namespace in which Argo Workflows will be executed. Must match the namespace where OpenMetadata is deployed. | argo | True |
| ARGO_SERVER_CERTIFICATE_PATH | SSL Certificate Path to connect to Argo Server | Empty String | False |
| ARGO_TEST_CONNECTION_BACKOFF_TIME | Backoff retry time in seconds to test the connection | 5 | False |
| ARGO_TOKEN | JWT Token to authenticate with Argo Workflow API | Empty String | True |
| ARGO_WORKFLOW_CPU_LIMIT | Kubernetes CPU Limits for Argo Workflows created with Ingestion | 1000m | False |
| ARGO_WORKFLOW_CPU_REQUEST | Kubernetes CPU Requests for Argo Workflows created with Ingestion | 200m | False |
| ARGO_WORKFLOW_CUSTOMER_TOLERATION | Kubernetes Node Toleration to schedule Ingestion Workflow Pods to specific Nodes | argo | False |
| ARGO_WORKFLOW_EXECUTOR_SERVICE_ACCOUNT_NAME | Service Account Name to be used for Argo Workflows for Ingestion | om-role | True |
| ARGO_WORKFLOW_MEMORY_LIMIT | Kubernetes Memory Limits for Argo Workflows created with Ingestion | 4096Mi | False |
| ARGO_WORKFLOW_MEMORY_REQUEST | Kubernetes Memory Requests for Argo Workflows created with Ingestion | 256Mi | False |
| ASSET_UPLOADER_ENABLE | Enable Asset Upload Feature | True | False |
| ASSET_UPLOADER_PROVIDER | Asset Upload Provider Name. Can be s3 or azure. | s3 | False |
| ASSET_UPLOADER_MAX_FILE_SIZE | Max File Size to support for Asset Upload (in bytes) | 5242880 | False |
| ASSET_UPLOADER_S3_ENDPOINT | Custom S3-compatible endpoint (e.g. MinIO) | Empty String | False |
| ASSET_UPLOADER_S3_BUCKET_NAME | Asset Upload S3/MinIO Bucket Name | Empty String | False |
| ASSET_UPLOADER_S3_PREFIX_PATH | Asset Upload S3/MinIO Prefix Path | assets/default | False |