Skip to main content

Kubernetes Deployment

Deploy Collate on Kubernetes for production-grade scalability and reliability.

Prerequisites

  • Kubernetes cluster (1.19+)
  • kubectl configured
  • Helm 3.x installed
  • Minimum 8 CPU cores and 16GB RAM

Quick Installation

Add Helm Repository

helm repo add open-metadata https://helm.open-metadata.org/
helm repo update

Install with Default Values

helm install collate open-metadata/collate

Install with Custom Values

# Download values file
curl -sL https://raw.githubusercontent.com/open-metadata/collate-helm-charts/main/charts/collate/values.yaml > values.yaml

# Edit values.yaml as needed
vim values.yaml

# Install
helm install collate open-metadata/collate -f values.yaml

Configuration

Basic Configuration

# values.yaml
collate:
  config:
    database:
      host: mysql.default.svc.cluster.local
      port: 3306
      databaseName: collate_db
      username: collate_user
      password: secure_password

    elasticsearch:
      host: elasticsearch.default.svc.cluster.local
      port: 9200

    airflow:
      host: http://airflow.default.svc.cluster.local:8080
      username: admin
      password: admin

High Availability Setup

# High availability configuration
replicaCount: 3

resources:
  requests:
    memory: "4Gi"
    cpu: "2"
  limits:
    memory: "8Gi"
    cpu: "4"

# Database HA
postgresql:
  enabled: true
  auth:
    database: collate_db
    username: collate_user
    password: secure_password
  primary:
    persistence:
      size: 100Gi
  readReplicas:
    replicaCount: 2

# Elasticsearch HA
elasticsearch:
  enabled: true
  replicas: 3
  minimumMasterNodes: 2
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "2"

Storage Configuration

Persistent Volumes

# Storage configuration
persistence:
  enabled: true
  storageClass: "fast-ssd"
  size: 20Gi

# Database persistence
postgresql:
  primary:
    persistence:
      enabled: true
      storageClass: "fast-ssd"
      size: 100Gi

# Elasticsearch persistence
elasticsearch:
  persistence:
    enabled: true
    storageClass: "fast-ssd"
    size: 50Gi

Ingress Configuration

# Ingress for external access
ingress:
  enabled: true
  className: "nginx"
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
  hosts:
    - host: collate.company.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: collate-tls
      hosts:
        - collate.company.com

Monitoring and Observability

Prometheus Integration

# Enable Prometheus metrics
metrics:
  enabled: true
  port: 9464
  path: /metrics

serviceMonitor:
  enabled: true
  labels:
    prometheus: kube-prometheus

Health Checks

# Health check configuration
livenessProbe:
  httpGet:
    path: /health
    port: 8585
  initialDelaySeconds: 60
  periodSeconds: 30

readinessProbe:
  httpGet:
    path: /health
    port: 8585
  initialDelaySeconds: 60
  periodSeconds: 30

Troubleshooting

Check pod logs:
kubectl logs -f deployment/collate
kubectl describe pod <pod-name>
Verify database connectivity:
kubectl exec -it deployment/collate -- nc -zv mysql.default.svc.cluster.local 3306
Check ingress status:
kubectl get ingress
kubectl describe ingress collate