> Configure deployment with Azure Kubernetes Service (AKS) using Collate-recommended Helm charts and scalable configuration templates.

# OpenMetadata Deployment on Azure Kubernetes Service

OpenMetadata can be deployed on Azure Kubernetes Service (AKS). However, it requires some cloud-specific configuration to provision storage for Airflow, one of its dependencies.

## Prerequisites

### Azure Services for Database and Search Engine as Elastic Cloud

For production deployments, we recommend using either [Azure Database for MySQL](https://azure.microsoft.com/en-in/products/mysql) or [Azure Database for PostgreSQL](https://azure.microsoft.com/en-in/products/postgresql) for the database, and [Elastic Cloud on Azure](https://www.elastic.co/partners/microsoft-azure) for the search engine.

We support:

* Azure Database for MySQL engine version 8.0.42 or higher
* Azure Database for PostgreSQL engine version 17.6 or higher
* Elastic Cloud (Elasticsearch version 9.x, minimum 9.0.0)

Once Azure Database for MySQL or Azure Database for PostgreSQL and Elastic Cloud on Azure are configured, update the Helm values below so that the OpenMetadata Kubernetes deployment connects to the database and Elasticsearch. The example uses MySQL. For PostgreSQL, use port `5432`, `driverClass: org.postgresql.Driver` and `dbScheme: postgresql` instead.

```yaml theme={null}
# openmetadata-values.prod.yaml
...
openmetadata:
  config:
    elasticsearch:
      host: <ELASTIC_CLOUD_ENDPOINT_WITHOUT_HTTPS>
      searchType: elasticsearch
      port: 443
      scheme: https
      connectionTimeoutSecs: 5
      socketTimeoutSecs: 60
      keepAliveTimeoutSecs: 600
      batchSize: 10
      auth:
        enabled: true
        username: <ELASTIC_CLOUD_USERNAME>
        password:
          secretRef: elasticsearch-secrets
          secretKey: openmetadata-elasticsearch-password
    database:
      host: <AZURE_SQL_ENDPOINT>
      port: 3306
      driverClass: com.mysql.cj.jdbc.Driver
      dbScheme: mysql
      dbUseSSL: true
      databaseName: <AZURE_SQL_DATABASE_NAME>
      auth:
        username: <AZURE_SQL_DATABASE_USERNAME>
        password:
          secretRef: mysql-secrets
          secretKey: openmetadata-mysql-password
  ...
```

We recommend:

* Azure Database for MySQL or Azure Database for PostgreSQL configured with multi-zone (zone-redundant) high availability and the Production workload environment
* An Elastic Cloud deployment spanning multiple availability zones with a minimum of 2 nodes

Make sure to create the database and Elastic Cloud credentials as Kubernetes Secrets, as described [here](/quick-start/local-kubernetes-deployment#2.-create-kubernetes-secrets-required-for-helm-charts).
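As a declarative alternative to `kubectl create secret`, the two secrets referenced in the values above could also be defined in a manifest. This is a sketch: the secret names and keys are taken from the `secretRef` and `secretKey` entries above, while the file name `openmetadata-secrets.yaml` and the password placeholders are assumptions. The `openmetadata` namespace must exist before you apply it (see Step 2), and real passwords should not be committed to version control.

```yaml theme={null}
# openmetadata-secrets.yaml (hypothetical file name)
# Names and keys must match secretRef/secretKey in openmetadata-values.prod.yaml
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secrets
  namespace: openmetadata
type: Opaque
stringData:
  # stringData takes plain text; Kubernetes stores it base64-encoded under data
  openmetadata-mysql-password: <AZURE_SQL_DATABASE_PASSWORD>
---
apiVersion: v1
kind: Secret
metadata:
  name: elasticsearch-secrets
  namespace: openmetadata
type: Opaque
stringData:
  openmetadata-elasticsearch-password: <ELASTIC_CLOUD_PASSWORD>
```

Apply it with `kubectl apply -f openmetadata-secrets.yaml`.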

Also, disable the bundled MySQL and Elasticsearch in the OpenMetadata Dependencies Helm chart, as described in the FAQ [here](#how-to-disable-mysql-and-elasticsearch-from-openmetadata-dependencies-helm-charts).

### Step 1 - Create an AKS cluster

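If you are deploying on a new cluster, pass the `EnableAzureDiskFileCSIDriver=true` custom header to enable the Container Storage Interface (CSI) storage drivers. On clusters running Kubernetes 1.21 or later, AKS enables the Azure Disk and Azure Files CSI drivers by default, so the header is only needed on older versions.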

```azure-cli theme={null}
az aks create --resource-group MyResourceGroup          \
              --name MyAKSClusterName                   \
              --nodepool-name agentpool                 \
              --outbound-type loadbalancer              \
              --location YourPreferredLocation          \
              --generate-ssh-keys                       \
              --enable-addons monitoring                \
              --aks-custom-headers EnableAzureDiskFileCSIDriver=true
```

For an existing cluster, make sure the CSI storage drivers are enabled:

```azure-cli theme={null}
az aks update -n MyAKSCluster -g MyResourceGroup --enable-disk-driver --enable-file-driver
```

### Step 2 - Create a Namespace (optional)

```azure-cli theme={null}
kubectl create namespace openmetadata
```

### Step 3 - Create Persistent Volumes

The OpenMetadata Helm chart depends on Airflow, and Airflow expects persistent storage that supports `ReadWriteMany` (the volume can be mounted read-write by many nodes). The Azure Files CSI driver enabled earlier (storage class `azurefile-csi`) can provision volumes in `ReadWriteMany` mode.

```yaml theme={null}
# logs_dags_pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: openmetadata-dependencies-dags-pvc
  namespace: openmetadata
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: azurefile-csi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: openmetadata-dependencies-logs-pvc
  namespace: openmetadata
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: azurefile-csi
```

Create the volume claims by applying the manifest.

```azure-cli theme={null}
kubectl apply -f logs_dags_pvc.yaml
```

### Step 4 - Change owner and update permissions for persistent volumes

Airflow pods run as a non-root user and therefore lack write access to the persistent volumes. To fix this, the Job defined in `permissions_pod.yaml` runs a pod that mounts both persistent volume claims and changes the owner of the mounted folders `/airflow-dags` and `/airflow-logs` to user ID 50000, the default Linux user ID of Airflow pods.

```yaml theme={null}
# permissions_pod.yaml
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    run: my-permission-pod
  name: my-permission-pod
  namespace: openmetadata
spec:
  template:
    spec:
      containers:
      - image: busybox
        name: my-permission-pod
        volumeMounts:
        - name: airflow-dags
          mountPath: /airflow-dags
        - name: airflow-logs
          mountPath: /airflow-logs
        # With "sh -c", only the first string after -c is executed, so both commands go in one script
        command: ["/bin/sh", "-c", "chown -R 50000 /airflow-dags /airflow-logs && chmod -R a+rwx /airflow-dags"]
      restartPolicy: Never
      volumes:
      - name: airflow-logs
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-logs-pvc
      - name: airflow-dags
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-dags-pvc
```

Start the job by applying the manifest in permissions\_pod.yaml.

```azure-cli theme={null}
kubectl apply -f permissions_pod.yaml
```
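On Azure Files (SMB) shares, file ownership is generally set by mount options rather than by `chown`. If the job above fails or does not give the Airflow user write access, one alternative sketch is a custom storage class that mounts the shares as UID 50000. The class name `azurefile-csi-airflow` is hypothetical, and the mount options are modelled on the Azure Files CSI examples in the AKS documentation, so verify them against your cluster before use. The PVCs from Step 3 would then reference this class in `storageClassName`.

```yaml theme={null}
# azurefile-airflow-sc.yaml (hypothetical file name)
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-csi-airflow
provisioner: file.csi.azure.com
allowVolumeExpansion: true
parameters:
  skuName: Standard_LRS   # assumption: pick the SKU that fits your workload
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=50000   # default Airflow user ID
  - gid=0       # Airflow images run this user in the root group
  - mfsymlinks
  - cache=strict
  - actimeo=30
  - nobrl
```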

### Step 5 - Add the OpenMetadata Helm repo and set up secrets

#### Add Helm Repo

```azure-cli theme={null}
helm repo add open-metadata https://helm.open-metadata.org/
```

#### Create secrets

An external database and search engine are recommended for production deployments. The following example uses an external PostgreSQL database from Azure Database for PostgreSQL; MySQL, which is the default, can be used as well. First, create the Airflow credentials secret:

```azure-cli theme={null}
kubectl create secret generic airflow-secrets                                    \
                    --namespace openmetadata                                     \
                    --from-literal=openmetadata-airflow-password=<AdminPassword>
```

For production deployments that connect to an external PostgreSQL database, create a secret holding the database password so that the Helm values can reference it:

```azure-cli theme={null}
kubectl create secret generic postgresql-secret                                       \
                                --namespace openmetadata                              \
                                --from-literal=postgresql-password=<MyPGDBPassword>

```

### Step 6 - Install OpenMetadata dependencies

The `values-dependencies.yaml` file overrides the default values in the official Helm chart and should be adapted to your deployment. For production deployments, fill in the `externalDatabase` section with real values to connect to the external database. If you prefer not to store details such as the host address, database name, and database username in the file, you can pass them at install time with Helm's `--set` flag instead.

```yaml theme={null}
# values-dependencies.yaml

airflow:
  airflow:
    extraVolumeMounts:
      - mountPath: /airflow-logs
        name: aks-airflow-logs
      - mountPath: /airflow-dags/dags
        name: aks-airflow-dags
    extraVolumes:
      - name: aks-airflow-logs
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-logs-pvc
      - name: aks-airflow-dags
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-dags-pvc
    config:
      AIRFLOW__OPENMETADATA_AIRFLOW_APIS__DAG_GENERATED_CONFIGS: "/airflow-dags/dags"
  dags:
    path: /airflow-dags/dags
    persistence:
      enabled: false
  logs:
    path: /airflow-logs
    persistence:
      enabled: false
  externalDatabase:
    type: postgres # default mysql
    host: Host_db_address
    database: Airflow_metastore_dbname
    user: db_userName
    port: 5432
    dbUseSSL: true
    passwordSecret: postgresql-secret
    passwordSecretKey: postgresql-password

```

The `values-dependencies.yaml` file overrides some of the default values in the official `openmetadata-dependencies` Helm chart to use an external PostgreSQL database. If you are not using the bundled MySQL database, it is important to set the `mysql.enabled` flag to `false`. You can do this either in the YAML file or on the command line, as in the `helm install` command below.

For more information on airflow helm chart values, please refer to [airflow-helm](https://artifacthub.io/packages/helm/airflow-helm/airflow/8.5.3)

```azure-cli theme={null}
helm install openmetadata-dependencies open-metadata/openmetadata-dependencies  \
                            --values values-dependencies.yaml                           \
                            --namespace openmetadata                                    \
                            --set mysql.enabled=false
```

It takes a few minutes for all the pods to be set up and running.

```azure-cli theme={null}
kubectl get pods -n openmetadata
```

```
NAME                                                       READY   STATUS    RESTARTS   AGE
openmetadata-dependencies-db-migrations-69fcf8c9d9-ctd2f   1/1     Running   0          4m51s
openmetadata-dependencies-pgbouncer-d9476f85-bwht9         1/1     Running   0          4m54s
openmetadata-dependencies-scheduler-5f785954cb-792ls       1/1     Running   0          4m54s
openmetadata-dependencies-sync-users-b58ccc589-ncb2d       1/1     Running   0          4m47s
openmetadata-dependencies-triggerer-684b8bb998-mbzvs       1/1     Running   0          4m53s
openmetadata-dependencies-web-9f6b4ff-5hfqj                1/1     Running   0          4m53s
opensearch-0                                               1/1     Running   0          42m

```

### Step 7 - Install OpenMetadata

Finally, install OpenMetadata, optionally customizing the values provided in the official chart [here](https://github.com/open-metadata/openmetadata-helm-charts/blob/main/charts/openmetadata/values.yaml) with a `values.yaml` file.

```yaml theme={null}
# values.yaml

global:
  pipelineServiceClientConfig:
    apiEndpoint: http://openmetadata-dependencies-web.<replace_with_your_namespace>.svc.cluster.local:8080
    metadataApiEndpoint: http://openmetadata.<replace_with_your_namespace>.svc.cluster.local:8585/api

openmetadata:
  config:
    database:
      host: <AZURE_POSTGRESQL_ENDPOINT>
      port: 5432
      driverClass: org.postgresql.Driver
      dbScheme: postgresql
      dbUseSSL: true
      databaseName: openmetadata_db
      auth:
        username: <AZURE_POSTGRESQL_USERNAME>
        password:
          secretRef: postgresql-secret  # referring to secret set in step 5 above
          secretKey: postgresql-password

image:
  tag: <image-tag>
```

```azure-cli theme={null}
helm install openmetadata open-metadata/openmetadata    \
                            --values values.yaml        \
                            --namespace openmetadata
```

Give the pod a few more seconds to become ready. Once it is ready, you can access the service by forwarding its port 8585 to the same port on your local machine:

```azure-cli theme={null}
kubectl port-forward service/openmetadata 8585:8585 -n openmetadata
```
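Port forwarding is convenient for a first check. For longer-lived access, one option is a standard Kubernetes Ingress in front of the `openmetadata` service on port 8585. The sketch below assumes an ingress controller is already installed in the cluster; the file name, the ingress class `nginx`, and the host `openmetadata.example.com` are placeholders to replace.

```yaml theme={null}
# openmetadata-ingress.yaml (hypothetical file name)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openmetadata
  namespace: openmetadata
spec:
  ingressClassName: nginx   # assumption: an NGINX ingress controller is installed
  rules:
    - host: openmetadata.example.com   # placeholder host name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openmetadata   # service used in the port-forward command above
                port:
                  number: 8585
```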

## Troubleshooting Airflow

### JSONDecodeError: Unterminated string starting

If you are using Airflow with Azure Blob Storage as `PersistentVolume` as explained in [Storage class using blobfuse](https://learn.microsoft.com/en-us/azure/aks/azure-csi-blob-storage-provision?tabs=mount-nfs%2Csecret),
you may encounter the following error after a few days:

```bash theme={null}
{dagbag.py:346} ERROR - Failed to import: /airflow-dags/dags/...py
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 3552
```

Moreover, the executor pods may keep using stale copies of the files. This behaviour is caused by the caching options recommended in that
documentation:

```yaml theme={null}
  - -o allow_other
  - --file-cache-timeout-in-seconds=120
  - --use-attr-cache=true
  - --cancel-list-on-mount-seconds=10  # prevent billing charges on mounting
  - -o attr_timeout=120
  - -o entry_timeout=120
  - -o negative_timeout=120
  - --log-level=LOG_WARNING  # LOG_WARNING, LOG_INFO, LOG_DEBUG
  - --cache-size-mb=1000  # Default will be 80% of available memory, eviction will happen beyond that.
```

**Disabling the cache** will help here. In this case it won't have any negative impact, since the `.py` and `.json`
files are small enough and not heavily used.

The same configuration without cache:

```yaml theme={null}
  - -o direct_io
  - --file-cache-timeout-in-seconds=0
  - --use-attr-cache=false
  - --cancel-list-on-mount-seconds=10
  - -o attr_timeout=0
  - -o entry_timeout=0
  - -o negative_timeout=0
  - --log-level=LOG_WARNING
  - --cache-size-mb=0
```

You can find more information about this error [here](https://github.com/open-metadata/OpenMetadata/issues/15321), and similar
discussions [here](https://github.com/Azure/azure-storage-fuse/issues/1171) and [here](https://github.com/Azure/azure-storage-fuse/issues/1139).
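For context, the cache-free options above belong in the `mountOptions` of the Blob CSI storage class that backs the Airflow volumes. A sketch of such a class is shown below; the class name is hypothetical, and `skuName` and the other parameters are assumptions modelled on the AKS blobfuse example, so keep whatever your existing class already uses. Kubernetes does not allow a StorageClass to be updated after creation, so you would typically create a new class and re-create the PVCs that use it.

```yaml theme={null}
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azureblob-fuse-nocache   # hypothetical name
provisioner: blob.csi.azure.com
parameters:
  skuName: Premium_LRS   # assumption: match your existing class
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
  # Caching disabled so every pod reads the latest DAG and JSON files
  - -o direct_io
  - --file-cache-timeout-in-seconds=0
  - --use-attr-cache=false
  - --cancel-list-on-mount-seconds=10
  - -o attr_timeout=0
  - -o entry_timeout=0
  - -o negative_timeout=0
  - --log-level=LOG_WARNING
  - --cache-size-mb=0
```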

# FAQs

## Java Memory Heap Issue

If your OpenMetadata pods are not in a ready state and the OpenMetadata pod logs show errors like the following:

```
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "AsyncAppender-Worker-async-file-appender"
Exception in thread "pool-5-thread-1" java.lang.OutOfMemoryError: Java heap space
Exception in thread "AsyncAppender-Worker-async-file-appender" java.lang.OutOfMemoryError: Java heap space
Exception in thread "dw-46" java.lang.OutOfMemoryError: Java heap space
Exception in thread "AsyncAppender-Worker-async-console-appender" java.lang.OutOfMemoryError: Java heap space
```

This happens when the default JVM heap size (1 GiB) is not enough for your workload. To resolve it, add the following environment variable to your custom OpenMetadata Helm values:

```yaml theme={null}
extraEnvs:
- name: OPENMETADATA_HEAP_OPTS
  value: "-Xmx2G -Xms2G"
```

The `-Xmx` flag specifies the maximum memory allocation pool (heap size) for the Java virtual machine (JVM), while `-Xms` specifies the initial memory allocation pool.
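When you raise the heap, the container's memory limit should leave room for non-heap JVM memory (metaspace, thread stacks, buffers) on top of `-Xmx`. The sketch below assumes the `openmetadata` chart exposes the conventional `resources` value; the 3Gi figure is only an illustrative headroom over a 2G heap, not a tested recommendation.

```yaml theme={null}
# Custom OpenMetadata Helm values (sketch)
extraEnvs:
- name: OPENMETADATA_HEAP_OPTS
  value: "-Xmx2G -Xms2G"
# Assumption: the chart passes this block to the container's resources field
resources:
  requests:
    memory: 3Gi
  limits:
    memory: 3Gi   # keep above -Xmx so the container is not OOM-killed
```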

Upgrade the Helm release with the above changes using `helm upgrade --install openmetadata open-metadata/openmetadata --values <values.yml> --namespace <namespaceName>`, replacing `<values.yml>` with your values filename and `<namespaceName>` with the namespace where you have deployed Collate in Kubernetes.

## PostgreSQL Issue permission denied to create extension "pgcrypto"

If you see the following error when using PostgreSQL as the database backend for the OpenMetadata application:

```
Message: ERROR: permission denied to create extension "pgcrypto"
Hint: Must be superuser to create this extension.
```

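This means the database user does not have sufficient privileges to create the `pgcrypto` extension. To resolve it, connect as an administrator, grant schema usage to the OpenMetadata user, and create the extension on its behalf:

```sql theme={null}
GRANT USAGE ON SCHEMA schema_name TO <openmetadata_psql_user>;
-- Run in the OpenMetadata database as an administrator.
-- PostgreSQL has no GRANT privilege on extensions, so create it directly.
CREATE EXTENSION IF NOT EXISTS pgcrypto;
```

On Azure Database for PostgreSQL flexible server, `pgcrypto` must also be allow-listed in the `azure.extensions` server parameter before it can be created.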

<Tip>
  In the above command, replace `<openmetadata_psql_user>` with the sql user used by OpenMetadata Application to connect to PostgreSQL Database.
</Tip>

## How to extend and use custom docker images with Collate Helm Charts ?

## Extending Collate Server Docker Image

### 1. Create a `Dockerfile` based on `docker.getcollate.io/openmetadata/server`

Collate Helm charts use the officially published Docker images from [DockerHub](https://hub.docker.com/u/openmetadata).
A typical reason to extend the server image is to install your organization's certificates for connecting to in-house systems.

For Example -

```
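FROM docker.getcollate.io/openmetadata/server:x.y.z
# update-ca-certificates picks up extra CA certificates (*.crt files) from this directory
COPY <my-organization-certs> /usr/local/share/ca-certificates/
RUN update-ca-certificates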
```

where `docker.getcollate.io/openmetadata/server:x.y.z` needs to point to the same version of the Collate server, for example `docker.getcollate.io/openmetadata/server:1.13.0`.
This image needs to be built and published to the container registry of your choice.

### 2. Update your openmetadata helm values yaml

The Collate application is installed by the `openmetadata` Helm chart. In this step, update your custom Helm values YAML file to point to the image created in the previous step. For example, create a Helm values file named `values.yaml` with the following contents:

```yaml theme={null}
...
image:
  repository: <your repository>
  # Overrides the image tag whose default is the chart appVersion.
  tag: <your tag>
...
```
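If the image is pushed to a private registry, the pods also need permission to pull it. On AKS, attaching an Azure Container Registry to the cluster (for example with `az aks update --attach-acr <acr-name>`) avoids pull secrets altogether. Otherwise, the sketch below references a docker-registry secret; it assumes the chart exposes the conventional `imagePullSecrets` value, and the secret name and repository path are placeholders.

```yaml theme={null}
image:
  repository: <your repository>   # e.g. <registry-name>.azurecr.io/<image-path>
  tag: <your tag>
  pullPolicy: IfNotPresent
# Assumption: standard Helm chart value; the secret must exist in the release namespace
imagePullSecrets:
  - name: <your-registry-pull-secret>
```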

### 3. Install / Upgrade your helm release

Upgrade/Install your openmetadata helm charts with the below single command:

```bash theme={null}
helm upgrade --install openmetadata open-metadata/openmetadata --values values.yaml
```

## Extending Collate Ingestion Docker Image

One possible use case where you would need to use a custom image for the ingestion is because you have developed your own custom connectors.
You can find a complete working example of this [here](https://github.com/open-metadata/openmetadata-demo/tree/main/custom-connector). After
you have your code ready, the steps would be the following:

### 1. Create a `Dockerfile` based on `docker.getcollate.io/openmetadata/ingestion`:

For example -

```
FROM docker.getcollate.io/openmetadata/ingestion:x.y.z

USER airflow
# Let's use the home directory of airflow user
WORKDIR /home/airflow

# Install our custom connector
COPY <your_package> <your_package>
COPY setup.py .
RUN pip install --no-deps .
```

where `docker.getcollate.io/openmetadata/ingestion:x.y.z` needs to point to the same version of the Collate server, for example `docker.getcollate.io/openmetadata/ingestion:1.13.0`.
This image needs to be built and published to the container registry of your choice.

### 2. Update the airflow in openmetadata dependencies values YAML

The ingestion container (the one shipping Airflow) is installed by the `openmetadata-dependencies` Helm chart. In this step, we use
our own custom values YAML file to point to the image created in the previous step. You can create a file named `values.deps.yaml` with the
following contents:

```yaml theme={null}
airflow:
  airflow:
    image:
      repository: <your repository>  # by default, openmetadata/ingestion
      tag: <your tag>  # by default, the version you are deploying, e.g., 1.13.0
      pullPolicy: "IfNotPresent"
```

### 3. Install / Upgrade helm release

Upgrade/Install your openmetadata-dependencies helm charts with the below single command:

```bash theme={null}
helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --values values.deps.yaml
```

## How to disable MySQL and ElasticSearch from OpenMetadata Dependencies Helm Charts?

If you are using MySQL and ElasticSearch externally, you would want to disable the local installation of mysql and elasticsearch while installing Collate Dependencies Helm Chart. You can disable the MySQL and ElasticSearch Helm Dependencies by setting `enabled: false` value for each dependency. Below is the command to set helm values from Helm CLI -

```commandline theme={null}
helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --set mysql.enabled=false --set elasticsearch.enabled=false
```

Alternatively, you can create a custom YAML file named `values.deps.yaml` to disable the installation of MySQL and Elasticsearch.

```yaml theme={null}
mysql:
    enabled: false
    ...
elasticsearch:
    enabled: false
    ...
...
```

## How to configure external database like PostgreSQL with Collate Helm Charts ?

Collate supports PostgreSQL as a database backend, but the Collate Helm charts do not include PostgreSQL as a database dependency by default. To configure the Helm charts with an external database such as PostgreSQL, follow the guide below to update your Helm values, then upgrade or install the Collate Helm charts.

## Upgrade Airflow Helm Dependencies Helm Charts to connect to External Database like PostgreSQL

We ship [airflow-helm](https://github.com/airflow-helm/charts/tree/main/charts/airflow) as one of OpenMetadata Dependencies with default values to connect to MySQL Database as part of `externalDatabase` configurations.

You can find more information on setting the `externalDatabase` as part of helm values [here](https://github.com/airflow-helm/charts/blob/main/charts/airflow/docs/faq/database/external-database.md).

With Collate Dependencies Helm Charts, your helm values would look something like below -

```yaml theme={null}
...
airflow:
  externalDatabase:
    type: postgres  # airflow-helm accepts "postgres" or "mysql"
    host: <postgresql_endpoint>
    port: 5432
    database: <airflow_database_name>
    user: <airflow_database_login_user>
    passwordSecret: airflow-postgresql-secrets
    passwordSecretKey: airflow-postgresql-password
...
```

For the above code, it is assumed you are creating a kubernetes secret for storing Airflow Database login Credentials. A sample command to create the secret will be `kubectl create secret generic airflow-postgresql-secrets --from-literal=airflow-postgresql-password=<password>`.

## Upgrade Collate Helm Charts to connect to External Database like PostgreSQL

Update the `openmetadata.config.database.*` helm values for Collate Application to connect to External Database like PostgreSQL.

With Collate Helm Charts, your helm values would look something like below -

```yaml theme={null}
openmetadata:
  config:
    ...
    database:
      host: <postgresql_endpoint>
      port: 5432
      driverClass: org.postgresql.Driver
      dbScheme: postgresql
      dbUseSSL: true
      databaseName: <openmetadata_database_name>
      auth:
        username: <database_login_user>
        password:
          secretRef: openmetadata-postgresql-secrets
          secretKey: openmetadata-postgresql-password
```

For the above code, it is assumed you are creating a kubernetes secret for storing Collate Database login Credentials. A sample command to create the secret will be `kubectl create secret generic openmetadata-postgresql-secrets --from-literal=openmetadata-postgresql-password=<password>`.

Once you make the above changes to your helm values, run the below command to install/upgrade helm charts -

```commandline theme={null}
helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --values <<path-to-values-file>> --namespace <kubernetes_namespace>
helm upgrade --install openmetadata open-metadata/openmetadata --values <<path-to-values-file>> --namespace <kubernetes_namespace>
```

## How to customize Collate Dependencies Helm Chart with custom helm values

The Collate Dependencies Helm chart internally depends on three sub-charts:

* [Bitnami MySQL](https://artifacthub.io/packages/helm/bitnami/mysql/9.7.2) (helm chart version 9.7.2)
* [OpenSearch](https://artifacthub.io/packages/helm/opensearch-project-helm-charts/opensearch/3.3.2) (helm chart version 3.3.2)
* [Airflow](https://artifacthub.io/packages/helm/airflow-helm/airflow/8.8.0) (helm chart version 8.8.0)

If you want to customize the deployment of any of these dependencies, refer to the links above for the Helm values each sub-chart supports.

By default, OpenMetadata Dependencies helm chart provides initial generic customization of these helm values in order to get you started quickly. You can refer to the openmetadata-dependencies helm charts default values [here](https://github.com/open-metadata/openmetadata-helm-charts/blob/main/charts/deps/values.yaml).
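In Helm, values for a sub-chart are nested under the sub-chart's name in the parent chart's values file. The sketch below assumes the OpenSearch sub-chart is keyed as `opensearch` (matching the `opensearch-0` pod shown earlier); `replicas`, `opensearchJavaOpts` and `resources` are values of the OpenSearch chart, and the sizes are illustrative only, so check them against the chart version linked above.

```yaml theme={null}
# values.deps.yaml (sketch): keys under "opensearch" are passed to the OpenSearch sub-chart
opensearch:
  replicas: 1
  opensearchJavaOpts: "-Xmx1g -Xms1g"
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 2Gi   # keep above the -Xmx value set in opensearchJavaOpts
```

Apply it with the same `helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --values values.deps.yaml` command used elsewhere on this page, adding `--namespace` for your deployment.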
