Guide to Deploy Collate Binaries in Azure

This guide will help you start using Collate Docker Images to run the OpenMetadata Application in Kubernetes, connecting with Argo Workflows for running ingestion from the OpenMetadata Application itself.

Architecture

Collate OpenMetadata requires 4 components:

Collate Server
Database — Collate Server stores the metadata in a relational database. We support MySQL or Postgres. Any Cloud provider SaaS Database service (AWS RDS, GCP Cloud SQL, Azure SQL) will also work.
- MySQL version 8.0.42 or greater
- Postgres version 17.6 or greater
Search Engine — OpenSearch 3.4. ElasticSearch is not supported in Collate BYOC because Collate AI relies on OpenSearch’s vector capabilities for Semantic and Hybrid Search.
Workflow Orchestration — OpenMetadata requires connectors to be scheduled periodically to fetch metadata, or you can use the OpenMetadata APIs to push metadata directly. We will use Argo Workflows as the orchestrator here.

If your team prefers to run on any other orchestrator such as Prefect, Dagster, or even GitHub Workflows, please refer to our documentation on how the Ingestion Framework works.

Sizing Requirements

Hardware Requirements

A Kubernetes Cluster with at least 1 Master Node and 3 Worker Nodes is the required configuration. Master Nodes should run all Kubernetes essential workloads (kube-apiserver, kube-scheduler, kube-controller-manager, external DNS, Cluster Auto Scaling, external logging and monitoring). Each Worker Node should have at least:

4 vCPUs
16 GiB Memory
128 GiB Storage capacity

If you want to make sure the Collate workloads are scheduled over dedicated worker nodes, Kubernetes provides a way to schedule pods to the right nodes using taints and tolerations. Collate OpenMetadata also supports usage of tolerations as a way to let Kubernetes schedule pods on desired nodes using custom Helm values and application configurations.

Software Requirements

Collate OpenMetadata supports Kubernetes Cluster version 1.24 or greater.
Collate Docker Images are available via private AWS Elastic Container Registry (ECR). Collate Team will share the credentials as well as steps to configure Kubernetes to pull Docker Images from AWS ECR.
For Argo Workflows compatibility, Collate OpenMetadata is currently compatible with application version 3.4. View the compatibility matrix for details.

Recommended Cloud Instances

	Collate Server	Argo Workflows
AWS	t4g.large / m6a.large	m7i.large
Azure	b2as v2	b2s v2
GCP	t2a-standard-2 / t2d-standard-2	t2d-standard-2

Database Sizing and Capacity

Our recommendation is to configure Postgres as your database. For 100,000 Data Assets and 1,000 Users, the recommended sizing is:

8 vCPUs
64 GiB Memory
256 GiB Storage Capacity
3,500 IOPS storage

Known Issues with Azure Flexible Server (MySQL)The default value for the Azure MySQL Flexible Server system variable sql_generate_invisible_primary_key is ON. When enabled, the MySQL server automatically adds a generated invisible primary key (GIPK) to any InnoDB table created without an explicit primary key.For Collate with MySQL as the database, you need to turn OFF this configuration. See the Azure documentation for reference.

Search Client Sizing and Capacity

For 100,000 Data Assets and 1,000 Users, we recommend OpenSearch to be:

8 vCPUs
64 GiB Memory
256 GiB Storage Capacity

The best practice is to use a managed OpenSearch offering. The customer can also choose to run OpenSearch directly inside Kubernetes; in that case, the Collate team will not maintain the search service.

Argo Workflows Ingestion Runners

The recommended resources for Argo Workflows to run Collate ingestions are 4 vCPUs and 16 GiB of Memory. Ingestion workloads can be scheduled on spot instances to reduce cloud expenses.

Azure Prerequisites

Make sure AKS is enabled for OIDC Issuer and Workload Identity.

Enable AKS OIDC Issuer

Use the below command to check if OIDC Issuer is enabled for the AKS Cluster:

az aks show --resource-group <RESOURCE_GROUP> --name <CLUSTER_NAME> --query "oidcIssuerProfile.issuerUrl" -o tsv

If you need to enable the OIDC issuer, update your resource with:

az aks update --resource-group <RESOURCE_GROUP> --name <CLUSTER_NAME> --enable-oidc-issuer

Enable AKS Workload Identity

Enable a workload identity on existing AKS Cluster:

az aks update \
  --resource-group <RESOURCE_GROUP> \
  --name <CLUSTER_NAME> \
  --enable-workload-identity

[Optional] Terraform for Azure Prerequisites

Terraform code for setting up Azure prerequisites is available in the openmetadata-deployment GitHub repository. You can skip the manual Azure CLI steps below if provisioning via Terraform.

Storage Account and Blob Store for Argo Workflows Artifacts

Create a new Azure Storage Account:

az storage account create --name collate --resource-group <RESOURCE_GROUP> --location <LOCATION>

Create a blob container inside the storage account:

az storage container create --name argo-workflows --account-name collate

Create Azure User Managed Identities

Create 4 User Managed Identities (2 for Argo Workflows, 1 for Collate Server, 1 for Collate Ingestion):

# For Collate Server Application
az identity create --name "collate-application" --resource-group "${RESOURCE_GROUP}" --location "${LOCATION}" --subscription "${SUBSCRIPTION_ID}"

# For Collate Ingestion
az identity create --name "collate-ingestion" --resource-group "${RESOURCE_GROUP}" --location "${LOCATION}" --subscription "${SUBSCRIPTION_ID}"

# For Argo Workflows Server Pod
az identity create --name "argo-workflows-server" --resource-group "${RESOURCE_GROUP}" --location "${LOCATION}" --subscription "${SUBSCRIPTION_ID}"

# For Argo Workflows Controller Pod
az identity create --name "argo-workflows-controller" --resource-group "${RESOURCE_GROUP}" --location "${LOCATION}" --subscription "${SUBSCRIPTION_ID}"

Create the Federated Identity Credential

Create the federated identity credential between the managed identity, the service account issuer, and the subject:

# For Collate Server Application
az identity federated-credential create --name collate-application --identity-name "collate-application" --resource-group "${RESOURCE_GROUP}" --issuer "${AKS_OIDC_ISSUER}" --subject system:serviceaccount:"collate":"openmetadata" --audience api://AzureADTokenExchange

# For Collate Ingestion
az identity federated-credential create --name collate-application --identity-name "collate-ingestion" --resource-group "${RESOURCE_GROUP}" --issuer "${AKS_OIDC_ISSUER}" --subject system:serviceaccount:"collate":"om-role" --audience api://AzureADTokenExchange

# For Argo Workflows Server
az identity federated-credential create --name argo-workflows-server --identity-name "argo-workflows-server" --resource-group "${RESOURCE_GROUP}" --issuer "${AKS_OIDC_ISSUER}" --subject system:serviceaccount:"argo-workflows":"argo-workflows-server-sa" --audience api://AzureADTokenExchange

# For Argo Workflows Controller
az identity federated-credential create --name argo-workflows-controller --identity-name "argo-workflows-controller" --resource-group "${RESOURCE_GROUP}" --issuer "${AKS_OIDC_ISSUER}" --subject system:serviceaccount:"argo-workflows":"argo-workflows-controller-sa" --audience api://AzureADTokenExchange

Grant User Managed Identity Access to Storage Account

# For Collate Server Application
az role assignment create --assignee-object-id "${COLLATE_SERVER_APPLICATION_IDENTITY_PRINCIPAL_ID}" --role "Storage Blob Data Contributor" --scope "${AZURE_CONTAINER_ARTIFACT_ID}" --assignee-principal-type ServicePrincipal

# For Collate Ingestion
az role assignment create --assignee-object-id "${COLLATE_INGESTION_IDENTITY_PRINCIPAL_ID}" --role "Storage Blob Data Contributor" --scope "${AZURE_CONTAINER_ARTIFACT_ID}" --assignee-principal-type ServicePrincipal

# For Argo Workflows Server
az role assignment create --assignee-object-id "${ARGO_WORKFLOWS_SERVER_IDENTITY_PRINCIPAL_ID}" --role "Storage Blob Data Reader" --scope "${AZURE_CONTAINER_ARTIFACT_ID}" --assignee-principal-type ServicePrincipal

# For Argo Workflows Controller
az role assignment create --assignee-object-id "${ARGO_WORKFLOWS_CONTROLLER_IDENTITY_PRINCIPAL_ID}" --role "Storage Blob Data Reader" --scope "${AZURE_CONTAINER_ARTIFACT_ID}" --assignee-principal-type ServicePrincipal

Setup AWS ECR

Collate will provide the credentials to pull Docker Images from a private registry located in AWS ECR.

Install AWS CLI

Follow the AWS CLI installation guide to install AWS CLI on your machine. This is required to connect to AWS ECR and configure Kubernetes Docker Registry Secrets.

Configure AWS Credentials

Run the following command to configure AWS CLI:

aws configure --profile ecr-collate

The command will prompt for credentials. The Collate team will securely share these via a 1Password link. Confirm the AWS credentials are correctly set:

aws configure --profile ecr-collate

Kubernetes Docker Registry Secrets for AWS ECR

Create a Docker Registry Kubernetes secret to pull images from AWS ECR:

kubectl create secret docker-registry ecr-registry-creds \
  --docker-server=118146679784.dkr.ecr.eu-west-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password=$(aws ecr get-login-password --profile ecr-collate) \
  --namespace <<NAMESPACE_NAME>>

Replace <<NAMESPACE_NAME>> with the namespace where you want to deploy Collate OpenMetadata Server. If the namespace does not exist yet, create it with kubectl create namespace <<NAMESPACE_NAME>>.

AWS ECR Token RefreshECR will reject stale tokens obtained more than 12 hours ago. If a pod is moved to another node after 12 hours, you will get an ImagePullBackOff error. In such cases, delete the secret and recreate it using the command above.

Install Argo Workflows

We will use the official community-maintained Helm Chart of Argo Workflows.

Add Helm Repository

helm repo add argo https://argoproj.github.io/argo-helm

Create a Kubernetes Namespace

kubectl create namespace argo-workflows

Kubernetes Secret for Argo Workflows DB Credentials

kubectl create secret generic argo-db-credentials \
  --from-literal=username=<DB_USERNAME> \
  --from-literal=password=<DB_PASSWORD> \
  --namespace argo-workflows

Create Custom Helm Values for Argo Workflows

Create a file named argo-workflows.values.yml:

# argo-workflows.values.yml
controller:
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "2"
  serviceAccount:
    create: true
    name: argo-workflows-controller-sa
    annotations:
      azure.workload.identity/client-id: "<ARGO_WORKFLOWS_CONTROLLER_AZURERM_USER_IDENTITY_CLIENT_ID>"
  podLabels:
    azure.workload.identity/use: "true"
  name: workflow-controller
  workflowDefaults:
    spec:
      podMetadata:
        labels:
          azure.workload.identity/use: "true"
  persistence:
    archive: true
    postgresql:
      host: <DATABASE_INSTANCE_ENDPOINT>
      database: <DATABASE_NAME>
      tableName: argo_workflows
      userNameSecret:
        name: argo-db-credentials
        key: username
      passwordSecret:
        name: argo-db-credentials
        key: password
      ssl: true
      sslMode: require
server:
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "2"
  serviceAccount:
    create: true
    name: argo-workflows-server-sa
    annotations:
      azure.workload.identity/client-id: "<ARGO_WORKFLOWS_SERVER_AZURERM_USER_IDENTITY_CLIENT_ID>"
  podLabels:
    azure.workload.identity/use: "true"
  extraArgs:
  - "--auth-mode=server"
  - "--request-timeout=5m"

useDefaultArtifactRepo: true
useStaticCredentials: false
artifactRepository:
  archiveLogs: true
  azure:
    endpoint: <AZURE_STORAGE_ACCOUNT_ENDPOINT>
    container: <AZURE_STORAGE_ACCOUNT_CONTAINER_ARTIFACT_NAME>
    blobNameFormat: 'workflows/{{workflow.namespace}}/{{workflow.name}}/{{pod.name}}'
    useSDKCreds: true

For further customisation, refer to the community Helm chart values.

Deploy Argo Workflows

We target application version 3.7.1 using Helm chart version 0.45.23 (Artifact Hub):

helm upgrade --install argo-workflows argo/argo-workflows \
  --version 0.45.23 \
  --namespace argo-workflows \
  --values argo-workflows.values.yml

[Optional] Enable Prometheus Metrics

If you have a Prometheus Application running on your cluster, enable Argo Workflows metrics using:

controller:
  serviceMonitor:
    enabled: true
server:
  serviceMonitor:
    enabled: true

Please refer to the official Argo Workflows documentation for further metric configuration options.

Setup Azure Container Registry

Collate will provide credentials to pull Docker Images from the private registry in AWS ECR.

Install Azure CLI

Follow the Azure CLI installation guide to install AZ CLI on your machine.

Configure Azure Credentials

Create a Service Principal and log in:

az ad sp create-for-rbac --name <SERVICE_PRINCIPAL_NAME> --role <ROLE> --scopes /subscriptions/<SUBSCRIPTION_ID>

az login --service-principal --username <APP_ID> --password <PASSWORD> --tenant <TENANT_ID>

The Collate team will securely share the required credentials via a 1Password link. Confirm the Azure credentials are correctly set:

az account show

Install OpenMetadata/Collate

Create a Kubernetes Namespace

kubectl create namespace collate

Kubernetes Service Account for Ingestion

The OpenMetadata Application communicates with Argo Workflows to dynamically trigger ephemeral pods that run ingestion workloads. Create a dedicated Kubernetes Service Account in the same namespace:

kubectl create serviceaccount om-role -n collate

Label and Annotate the Service Account for Azure Managed Identity

kubectl annotate serviceaccount -n collate om-role \
  azure.workload.identity/client-id=<<COLLATE_APPLICATION_MANAGED_IDENTITY_CLIENT_ID>>

kubectl label serviceaccount -n collate om-role azure.workload.identity/use=true

Replace <<COLLATE_APPLICATION_MANAGED_IDENTITY_CLIENT_ID>> with the Azure User Managed Identity Client ID for Collate OpenMetadata Server.

Create Long-Lived API Token for the ServiceAccount

kubectl apply -n collate -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: om-role.service-account-token
  annotations:
    kubernetes.io/service-account.name: om-role
type: kubernetes.io/service-account-token
EOF

Configure Kubernetes Roles for the Service Account

Create a file om-argo-role.yml:

# om-argo-role.yml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: om-argo-role
  namespace: collate
rules:
  - verbs: [list, watch, create, update, patch, get, delete]
    apiGroups:
      - argoproj.io
    resources:
      - workflows
  - verbs: [list, watch, patch, get]
    apiGroups:
      - ''
    resources:
      - pods/log
      - pods
  - verbs: [list, watch, create, update, patch, get, delete]
    apiGroups:
      - argoproj.io
    resources:
      - cronworkflows
  - verbs: [create, patch]
    apiGroups:
      - argoproj.io
    resources:
      - workflowtaskresults

Apply the role and create the role binding:

kubectl apply -f om-argo-role.yml

kubectl create rolebinding om-argo-role-binding \
  --role=om-argo-role \
  --serviceaccount=collate:om-role --namespace <<NAMESPACE_NAME>>

Install OpenMetadata Helm Chart

Create Kubernetes Secrets for the database connection:

kubectl create secret generic mysql-secrets \
  --from-literal=openmetadata-mysql-password=<<DATABASE_PASSWORD>> \
  --namespace collate

Replace <<DATABASE_PASSWORD>> with the password for your Azure SQL Database for Collate OpenMetadata Server.

Create a file openmetadata.values.yml:

If you plan to use the DeltaLake connector, the ARGO_INGESTION_IMAGE value should be: 118146679784.dkr.ecr.eu-west-1.amazonaws.com/collate-customers-ingestion-eu-west-1:om-1.12.8-cl-1.12.8

# openmetadata.values.yml
replicaCount: 1
resources:
  limits:
    cpu: 3000m
    memory: 12Gi
  requests:
    cpu: 1000m
    memory: 10Gi
openmetadata:
  config:
    elasticsearch:
      host: ${es_host}
      port: ${es_port}
      scheme: ${es_scheme}
      searchType: opensearch
      auth:
        enabled: true
        username: ${es_username}
        password:
          secretRef: es-credentials
          secretKey: password
    database:
      host: ${db_host}
      port: ${db_port}
      driverClass: org.postgresql.Driver
      dbScheme: postgresql
      auth:
        username: ${db_user}
        password:
          secretRef: db-credentials
          secretKey: password
      dbParams: "allowPublicKeyRetrieval=true&useSSL=true&serverTimezone=UTC"
    pipelineServiceClientConfig:
      enabled: true
      type: "argoWorkflows"
      metadataApiEndpoint: "http://openmetadata:8585/api"
      argoWorkflows:
        namespace: collate
        serviceAccountName: om-role
        ingestionImage: "118146679784.dkr.ecr.eu-west-1.amazonaws.com/collate-customers-ingestion-slim-eu-west-1:om-1.12.8-cl-1.12.8"
        imagePullPolicy: "IfNotPresent"
        imagePullSecrets: "ecr-registry-creds"
        apiEndpoint: "http://argo-workflows-server.argo-workflows:2746"
image:
  repository: 118146679784.dkr.ecr.eu-west-1.amazonaws.com/collate-customers-eu-west-1
  tag: om-1.12.3-cl-1.12.3
  imagePullPolicy: IfNotPresent
imagePullSecrets:
- name: ecr-registry-creds
extraEnvs:
- name: OPENMETADATA_HEAP_OPTS
  value: "-Xmx8G -Xms8G"
- name: ARGO_TOKEN
  valueFrom:
    secretKeyRef:
      name: "om-role.service-account-token"
      key: "token"
- name: ASSET_UPLOADER_PROVIDER
  value: "azure"
- name: ASSET_UPLOADER_MAX_FILE_SIZE
  value: "10485760"
- name: ASSET_UPLOADER_AZURE_CONTAINER_NAME
  value: "<AZURE_STORAGE_ACCOUNT_CONTAINER_NAME>"
- name: ASSET_UPLOADER_AZURE_BLOB_ENDPOINT
  value: "https://<AZURE_STORAGE_ACCOUNT_NAME>.blob.core.windows.net"
- name: ASSET_UPLOADER_AZURE_PREFIX_PATH
  value: "assets/collate"
serviceAccount:
  name: "openmetadata"
  annotations:
    azure.workload.identity/client-id: <COLLATE_SERVER_APPLICATION_IDENTITY_CLIENT_ID>
commonLabels:
  azure.workload.identity/use: "true"

Install the Collate OpenMetadata Application:

helm upgrade --install openmetadata open-metadata/openmetadata \
  --values openmetadata.values.yml \
  --namespace collate

[Optional] Enable Prometheus Metrics

Collate Application exposes Prometheus metrics on port 8586. Enable the integration using:

serviceMonitor:
  enabled: true

For more configurations, refer to the Helm chart values.

Post Installation/Upgrade Steps

Configure ReIndexing

After installation or upgrade, configure ReIndexing from the OpenMetadata UI. For detailed steps, refer to the OpenMetadata upgrade documentation.

Environment Variables for Collate OpenMetadata Argo

Environment Name	Description	Default Value	Required
`ARGO_SERVER_CERTIFICATE_PATH`	SSL Certificate Path to connect to Argo Server	Empty String	False
`ARGO_TEST_CONNECTION_BACKOFF_TIME`	Backoff retry time in seconds to test the connection	`5`	False
`ARGO_TOKEN`	JWT Token to authenticate with Argo Workflow API	Empty String	True
`ARGO_WORKFLOW_CPU_LIMIT`	Kubernetes CPU Limits for Argo Workflows created with Ingestion	`1000m`	False
`ARGO_WORKFLOW_CPU_REQUEST`	Kubernetes CPU Requests for Argo Workflows created with Ingestion	`200m`	False
`ARGO_WORKFLOW_CUSTOMER_TOLERATION`	Kubernetes Node Toleration to schedule Ingestion Workflow Pods to specific Nodes	`argo`	False
`ARGO_WORKFLOW_EXECUTOR_SERVICE_ACCOUNT_NAME`	Service Account Name to be used for Argo Workflows for Ingestion	`om-role`	True
`ARGO_WORKFLOW_MEMORY_LIMIT`	Kubernetes Memory Limits for Argo Workflows created with Ingestion	`4096Mi`	False
`ARGO_WORKFLOW_MEMORY_REQUEST`	Kubernetes Memory Requests for Argo Workflows created with Ingestion	`256Mi`	False
`ASSET_UPLOADER_ENABLE`	Enable Asset Upload Feature	`True`	False
`ASSET_UPLOADER_PROVIDER`	Asset Upload Provider Name. Can be `s3` or `azure`.	`s3`	False
`ASSET_UPLOADER_MAX_FILE_SIZE`	Max File Size to support for Asset Upload (in bytes)	`5242880`	False
`ASSET_UPLOADER_AZURE_CONTAINER_NAME`	Asset Upload Azure Container Name	`my-container`	False
`ASSET_UPLOADER_AZURE_CONNECTION_STRING`	Asset Upload Azure Account Connection String	Empty String	False
`ASSET_UPLOADER_AZURE_CLIENT_ID`	Asset Upload Azure Client ID	`clientId`	False
`ASSET_UPLOADER_AZURE_TENANT_ID`	Asset Upload Azure Tenant ID	`tenantId`	False
`ASSET_UPLOADER_AZURE_CLIENT_SECRET`	Asset Upload Azure Client Secret	`clientsecret`	False
`ASSET_UPLOADER_AZURE_BLOB_ENDPOINT`	Asset Upload Azure Storage Account Blob Endpoint	Empty String	False
`ASSET_UPLOADER_AZURE_PREFIX_PATH`	Asset Upload Azure Prefix Path	`assets/default`	False

Appendix: List of AWS ECR Public IPs

If your company policy blocks access to external resources, ensure the public IPs of AWS ECR are reachable from your cluster.

Using Terraform

data "aws_ip_ranges" "ip_ranges" {
  regions  = ["eu-west-1"]
  services = ["amazon"]
}

output "ireland_ip_ranges" {
  value = data.aws_ip_ranges.ip_ranges.cidr_blocks
}

Using curl and jq

curl -s https://ip-ranges.amazonaws.com/ip-ranges.json | jq '.prefixes[] | select(.region=="eu-west-1") | select(.service=="AMAZON")'

After running one of the above commands, you will see a list of IP ranges from Amazon.

Documentation Index

​Guide to Deploy Collate Binaries in Azure

​Architecture

​Sizing Requirements

​Hardware Requirements

​Software Requirements

​Recommended Cloud Instances

​Database Sizing and Capacity

​Search Client Sizing and Capacity

​Argo Workflows Ingestion Runners

​Azure Prerequisites

​Enable AKS OIDC Issuer

​Enable AKS Workload Identity

​[Optional] Terraform for Azure Prerequisites

​Storage Account and Blob Store for Argo Workflows Artifacts

​Create Azure User Managed Identities

​Create the Federated Identity Credential

​Grant User Managed Identity Access to Storage Account

​Setup AWS ECR

​Install AWS CLI

​Configure AWS Credentials

​Kubernetes Docker Registry Secrets for AWS ECR

​Install Argo Workflows

​Add Helm Repository

​Create a Kubernetes Namespace

​Kubernetes Secret for Argo Workflows DB Credentials

​Create Custom Helm Values for Argo Workflows

​Deploy Argo Workflows

​[Optional] Enable Prometheus Metrics

​Setup Azure Container Registry

​Install Azure CLI

​Configure Azure Credentials

​Install OpenMetadata/Collate

​Create a Kubernetes Namespace

​Kubernetes Service Account for Ingestion

​Label and Annotate the Service Account for Azure Managed Identity

​Create Long-Lived API Token for the ServiceAccount

​Configure Kubernetes Roles for the Service Account

​Install OpenMetadata Helm Chart

​[Optional] Enable Prometheus Metrics

​Post Installation/Upgrade Steps

​Configure ReIndexing

​Environment Variables for Collate OpenMetadata Argo

​Appendix: List of AWS ECR Public IPs

​Using Terraform

​Using curl and jq

Guide to Deploy Collate Binaries in Azure

Architecture

Sizing Requirements

Hardware Requirements

Software Requirements

Recommended Cloud Instances

Database Sizing and Capacity

Search Client Sizing and Capacity

Argo Workflows Ingestion Runners

Azure Prerequisites

Enable AKS OIDC Issuer

Enable AKS Workload Identity

[Optional] Terraform for Azure Prerequisites

Storage Account and Blob Store for Argo Workflows Artifacts

Create Azure User Managed Identities

Create the Federated Identity Credential

Grant User Managed Identity Access to Storage Account

Setup AWS ECR

Install AWS CLI

Configure AWS Credentials

Kubernetes Docker Registry Secrets for AWS ECR

Install Argo Workflows

Add Helm Repository

Create a Kubernetes Namespace

Kubernetes Secret for Argo Workflows DB Credentials

Create Custom Helm Values for Argo Workflows

Deploy Argo Workflows

[Optional] Enable Prometheus Metrics

Setup Azure Container Registry

Install Azure CLI

Configure Azure Credentials

Install OpenMetadata/Collate

Create a Kubernetes Namespace

Kubernetes Service Account for Ingestion

Label and Annotate the Service Account for Azure Managed Identity

Create Long-Lived API Token for the ServiceAccount

Configure Kubernetes Roles for the Service Account

Install OpenMetadata Helm Chart

[Optional] Enable Prometheus Metrics

Post Installation/Upgrade Steps

Configure ReIndexing

Environment Variables for Collate OpenMetadata Argo

Appendix: List of AWS ECR Public IPs

Using Terraform

Using curl and jq