EKS on Amazon Web Services Deployment

OpenMetadata supports installing and running the application on Amazon Elastic Kubernetes Service (EKS) through Helm charts. However, some additional configuration is required as a prerequisite.
All the code snippets in this section assume the default Kubernetes namespace. This guide presumes you already have an AWS EKS cluster available.

Prerequisites

AWS Services for Database as RDS and Search Engine as ElasticSearch

It is recommended to use Amazon RDS and Amazon OpenSearch Service for Production Deployments. We support
  • Amazon RDS (MySQL) engine version 8 or higher
  • Amazon RDS (PostgreSQL) engine version 12 or higher
  • Amazon OpenSearch engine version 2.X (up to 2.19)
When using AWS services, set the searchType configuration to opensearch whether the service runs ElasticSearch or OpenSearch, as shown in the ElasticSearch configuration example below.
We recommend
  • Amazon RDS to be in Multiple Availability Zones.
  • Amazon OpenSearch (or ElasticSearch) Service in multiple Availability Zones with a minimum of 2 nodes.
Make sure to increase sort_buffer_size (for MySQL) or work_mem (for PostgreSQL) to the recommended value of 20MB or more using the database parameter group settings. This is especially important when running migrations, to prevent an Out of Sort Memory error. You can revert the setting once the migrations are complete.
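The parameter group change can be sketched as follows, assuming you manage RDS through AWS CloudFormation (this guide does not require it). The logical resource names and the mysql8.0 and postgres12 families are illustrative assumptions; only the group matching your engine is needed. Note the units: sort_buffer_size is set in bytes, while work_mem is set in kilobytes.

```yaml
# Hypothetical CloudFormation fragment; logical names and families are illustrative.
Resources:
  OpenMetadataMySQLParameterGroup:
    Type: AWS::RDS::DBParameterGroup
    Properties:
      Description: OpenMetadata MySQL parameters
      Family: mysql8.0
      Parameters:
        # 20 MB = 20 * 1024 * 1024 bytes
        sort_buffer_size: "20971520"
  OpenMetadataPostgresParameterGroup:
    Type: AWS::RDS::DBParameterGroup
    Properties:
      Description: OpenMetadata PostgreSQL parameters
      Family: postgres12
      Parameters:
        # work_mem is expressed in kB: 20 MB = 20 * 1024 kB
        work_mem: "20480"
```

The same values can be entered by hand in the parameter group editor of the RDS console.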
Once the RDS and OpenSearch services are set up, update the Helm values below so that the OpenMetadata Kubernetes deployment connects to the database and ElasticSearch.
# openmetadata-values.prod.yaml
...
openmetadata:
  config:
    elasticsearch:
      host: <AMAZON_OPENSEARCH_SERVICE_ENDPOINT_WITHOUT_HTTPS>
      searchType: opensearch
      port: 443
      scheme: https
      connectionTimeoutSecs: 5
      socketTimeoutSecs: 60
      keepAliveTimeoutSecs: 600
      batchSize: 10
      auth:
        enabled: true
        username: <AMAZON_OPENSEARCH_USERNAME>
        password:
          secretRef: elasticsearch-secrets
          secretKey: openmetadata-elasticsearch-password
    database:
      host: <AMAZON_RDS_ENDPOINT>
      port: 3306
      driverClass: com.mysql.cj.jdbc.Driver
      dbScheme: mysql
      dbUseSSL: true
      databaseName: <RDS_DATABASE_NAME>
      auth:
        username: <RDS_DATABASE_USERNAME>
        password:
          secretRef: mysql-secrets
          secretKey: openmetadata-mysql-password
  ...
Make sure to create the RDS and OpenSearch credentials as Kubernetes Secrets, as mentioned here. Also, disable MySQL and ElasticSearch in the OpenMetadata Dependencies Helm chart, as described in the FAQs here.
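As a minimal sketch, the two Secrets referenced by secretRef and secretKey in the values above could be declared like this. The file name and the password placeholders are assumptions; the Secret names and keys are the ones used in the example configuration.

```yaml
# rds_opensearch_secrets.yml (hypothetical file name)
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secrets
  namespace: default
type: Opaque
stringData:
  # Key must match openmetadata.config.database.auth.password.secretKey
  openmetadata-mysql-password: <RDS_DATABASE_PASSWORD>
---
apiVersion: v1
kind: Secret
metadata:
  name: elasticsearch-secrets
  namespace: default
type: Opaque
stringData:
  # Key must match openmetadata.config.elasticsearch.auth.password.secretKey
  openmetadata-elasticsearch-password: <AMAZON_OPENSEARCH_PASSWORD>
```

Using stringData lets you write the values in plain text; Kubernetes stores them base64-encoded. Avoid committing a file like this with real passwords to version control.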

Create Elastic File System in AWS

You can follow the official AWS guides here to provision an EFS file system in the same VPC that is associated with your EKS cluster.

Persistent Volumes with ReadWriteMany Access Modes

The OpenMetadata helm chart depends on Airflow, and Airflow expects a persistent disk that supports ReadWriteMany (the volume can be mounted as read-write by many nodes). In AWS, this is provided by the Elastic File System (EFS) service. AWS Elastic Block Store (EBS) does not provide the ReadWriteMany access mode, because an EBS volume can only be attached to one Kubernetes node at any given point in time. To provision persistent volumes from AWS EFS, you need to set up and install aws-efs-csi-driver. Note that this is required for Airflow, which is one of the OpenMetadata dependencies. Also, aws-ebs-csi-driver might be required for the persistent volumes used by MySQL and ElasticSearch as OpenMetadata dependencies. The guide below provisions persistent volumes as static volumes (meaning you are responsible for creating, maintaining, and destroying them).

Provision EFS backed PVs, PVCs for Airflow DAGs and Airflow Logs

Please note that this documentation uses a single AWS Elastic File System (EFS) file system with the subdirectories airflow-dags and airflow-logs. It is presumed that the airflow-dags and airflow-logs directories already exist on that file system. To create directories inside the EFS file system, follow these steps.
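If you would rather create the directories from inside the cluster, one possible approach is sketched below, assuming aws-efs-csi-driver is already installed. It mounts the root of the file system in a short-lived pod and creates both subdirectories. The file name, the resource names (efs-root-pv, efs-root-pvc, efs-init-dirs) and the busybox image are illustrative assumptions, not part of the OpenMetadata charts.

```yaml
# efs_init_dirs.yml (hypothetical file name; all resource names are illustrative)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-root-pv
spec:
  capacity:
    storage: 1Gi   # required by the API; EFS is elastic and does not enforce it
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <FileSystemId>   # no subpath: mounts the file system root
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-root-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: efs-root-pv   # bind to the root volume explicitly
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: efs-init-dirs
  namespace: default
spec:
  restartPolicy: Never   # one-shot pod
  containers:
    - name: efs-init-dirs
      image: busybox
      command: ["/bin/sh", "-c"]
      args:
        - mkdir -p /efs/airflow-dags /efs/airflow-logs
      volumeMounts:
        - name: efs-root
          mountPath: /efs
  volumes:
    - name: efs-root
      persistentVolumeClaim:
        claimName: efs-root-pvc
```

Once the pod has completed, the pod, claim, and volume can be deleted; the Retain policy leaves the directories on the file system.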

Code Samples for PV and PVC for Airflow DAGs

# dags_pv_pvc.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: openmetadata-dependencies-dags-pv
  labels:
    app: airflow-dags
spec:
  capacity:
    storage: 10Gi
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <FileSystemId>:/airflow-dags # Replace with EFS File System Id

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: airflow-dags
  name: openmetadata-dependencies-dags-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi
Create the Persistent Volume and Persistent Volume Claim with the command below.
kubectl create -f dags_pv_pvc.yml

Code Samples for PV and PVC for Airflow Logs

# logs_pv_pvc.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: openmetadata-dependencies-logs-pv
  labels:
    app: airflow-logs
spec:
  capacity:
    storage: 5Gi
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <FileSystemId>:/airflow-logs # Replace with EFS File System Id

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openmetadata-dependencies-logs-pvc
  namespace: default
  labels:
    app: airflow-logs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi
Create the Persistent Volume and Persistent Volume Claim with the command below.
kubectl create -f logs_pv_pvc.yml
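Both claims use an empty storageClassName and ReadWriteMany, so Kubernetes matches each claim to a volume only by access mode and requested size. If you want to rule out a claim binding to the wrong volume, a claim can name its volume explicitly through spec.volumeName. The following is a sketch of the logs claim with that field added, to be used in place of the claim definition above before it is created; the same pattern applies to the DAGs claim with openmetadata-dependencies-dags-pv.

```yaml
# Variant of the logs claim pinned to its volume (sketch)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openmetadata-dependencies-logs-pvc
  namespace: default
  labels:
    app: airflow-logs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  # Bind only to the logs volume defined in logs_pv_pvc.yml
  volumeName: openmetadata-dependencies-logs-pv
  resources:
    requests:
      storage: 5Gi
```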

Change owner and permission manually on disks

Since Airflow pods run as a non-root user, they do not have write access to the NFS server volumes. To fix the permissions, spin up a pod with the persistent volumes attached and run it once. You can find more information on AWS EFS permissions in the docs here.
# permissions_pod.yml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: my-permission-pod
  name: my-permission-pod
spec:
  containers:
  - image: nginx
    name: my-permission-pod
    volumeMounts:
    - name: airflow-dags
      mountPath: /airflow-dags
    - name: airflow-logs
      mountPath: /airflow-logs
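    # Run through a shell; a list item such as "chown -R ..." on its own
    # would be treated as the name of a single executable and fail to start.
    command: ["/bin/sh", "-c"]
    args:
    - |
      chown -R 50000 /airflow-dags /airflow-logs
      # if needed
      chmod -R a+rwx /airflow-dags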
  volumes:
  - name: airflow-logs
    persistentVolumeClaim:
      claimName: openmetadata-dependencies-logs-pvc
  - name: airflow-dags
    persistentVolumeClaim:
      claimName: openmetadata-dependencies-dags-pvc
  dnsPolicy: ClusterFirst
  restartPolicy: Never  # one-shot pod: do not restart once the commands have finished
Airflow runs its pods with the Linux user name airflow and the Linux user ID 50000.
Run the command below to create the pod and fix the permissions:
kubectl create -f permissions_pod.yml

Create OpenMetadata dependencies Values

Override the airflow helm values of the OpenMetadata dependencies chart to bind the EFS persistent volumes for DAGs and logs.
# values-dependencies.yml
airflow:
  airflow:
    extraVolumeMounts:
      - mountPath: /airflow-logs
        name: efs-airflow-logs
      - mountPath: /airflow-dags/dags
        name: efs-airflow-dags
    extraVolumes:
      - name: efs-airflow-logs
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-logs-pvc
      - name: efs-airflow-dags
        persistentVolumeClaim:
          claimName: openmetadata-dependencies-dags-pvc
    config:
      AIRFLOW__OPENMETADATA_AIRFLOW_APIS__DAG_GENERATED_CONFIGS: "/airflow-dags/dags"
  dags:
    path: /airflow-dags/dags
    persistence:
      enabled: false
  logs:
    path: /airflow-logs
    persistence:
      enabled: false
For more information on airflow helm chart values, please refer to airflow-helm. When deploying the openmetadata dependencies helm chart, use the command below -
helm install openmetadata-dependencies open-metadata/openmetadata-dependencies --values values-dependencies.yml
The above command uses configurations defined here. You can modify any configuration and deploy by passing your own values file:
helm install openmetadata-dependencies open-metadata/openmetadata-dependencies --values <path-to-values-file>
Once the openmetadata dependencies helm chart is deployed, run the command below to install the openmetadata helm chart -
helm install openmetadata open-metadata/openmetadata
Again, this uses the values defined here. Use the --values flag to point to your own YAML configuration if needed.

FAQs

Getting an error when installing OpenMetadata Dependencies Helm Charts on EKS with EFS

If you are facing the below issue -
MountVolume.SetUp failed for volume "openmetadata-dependencies-dags-pv" : rpc error: code = Internal desc = Could not mount "fs-012345abcdef:/airflow-dags" at "/var/lib/kubelet/pods/xyzabc-123-0062-44c3-b0e9-fa193c19f41c/volumes/kubernetes.io~csi/openmetadata-dependencies-dags-pv/mount": mount failed: exit status 1 Mounting command: mount Mounting arguments: -t efs -o tls fs-012345abcdef:/airflow-dags /var/lib/kubelet/pods/xyzabc-123-0062-44c3-b0e9-fa193c19f41c/volumes/kubernetes.io~csi/openmetadata-dependencies-dags-pv/mount Output: Failed to locate an available port in the range [20049, 20449], try specifying a different port range in /etc/amazon/efs/efs-utils.conf
This error typically means that the EKS cluster cannot reach the EFS file system. Check the security groups that control connectivity between EFS and EKS. Here is an article which further describes the steps required to create security group rules for EKS to use EFS over port 2049. The error can also occur if the mount targets are already available for the EKS nodes but the nodes have not picked them up. In such cases, you can do an AWS Auto Scaling group instance refresh so that the EKS nodes get the available mount targets.
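The security group rule for port 2049 can be sketched as follows, assuming you manage networking through AWS CloudFormation (an assumption; the same rule can be created in the console). The logical name and both security group placeholders are illustrative.

```yaml
# Hypothetical CloudFormation fragment: allow NFS traffic from EKS nodes to EFS
Resources:
  EfsIngressFromEksNodes:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      # Security group attached to the EFS mount targets
      GroupId: <EFS_MOUNT_TARGET_SECURITY_GROUP_ID>
      # Security group attached to the EKS worker nodes
      SourceSecurityGroupId: <EKS_NODE_SECURITY_GROUP_ID>
      IpProtocol: tcp
      FromPort: 2049
      ToPort: 2049
```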

Java Memory Heap Issue

If your openmetadata pods are not in a ready state and the openmetadata pod logs show the below issue -
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "AsyncAppender-Worker-async-file-appender"
Exception in thread "pool-5-thread-1" java.lang.OutOfMemoryError: Java heap space
Exception in thread "AsyncAppender-Worker-async-file-appender" java.lang.OutOfMemoryError: Java heap space
Exception in thread "dw-46" java.lang.OutOfMemoryError: Java heap space
Exception in thread "AsyncAppender-Worker-async-console-appender" java.lang.OutOfMemoryError: Java heap space
This is because the default JVM heap space configuration (1 GiB) is not enough for your workloads. To resolve this issue, add the environment variable below to your custom openmetadata helm values:
extraEnvs:
- name: OPENMETADATA_HEAP_OPTS
  value: "-Xmx2G -Xms2G"
The flag Xmx specifies the maximum memory allocation pool for a Java virtual machine (JVM), while Xms specifies the initial memory allocation pool. Upgrade the helm charts with the above changes using the following command:
helm upgrade --install openmetadata open-metadata/openmetadata --values <values.yml> --namespace <namespaceName>
Replace <values.yml> and <namespaceName> with your values file name and the namespace where you have deployed OpenMetadata in Kubernetes.
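If the pod has a memory limit, a larger heap may also call for a larger limit, since the JVM needs memory beyond the heap. The following is a sketch only: it assumes your openmetadata values file exposes the conventional top-level resources key for the server container, and the 3Gi figures are illustrative rather than recommended values.

```yaml
# values.yml fragment (sketch); the resources key and sizes are assumptions
extraEnvs:
- name: OPENMETADATA_HEAP_OPTS
  value: "-Xmx2G -Xms2G"
resources:
  requests:
    memory: 3Gi
  limits:
    memory: 3Gi   # keep headroom above -Xmx for non-heap JVM memory
```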

PostgreSQL Issue permission denied to create extension “pgcrypto”

If you are facing the below issue with PostgreSQL as Database Backend for OpenMetadata Application,
Message: ERROR: permission denied to create extension "pgcrypto"
Hint: Must be superuser to create this extension.
The database user does not have sufficient privileges to create the extension. To resolve this, connect to the OpenMetadata database as a privileged user (on Amazon RDS, for example, the master user), create the extension, and grant schema usage to the PostgreSQL user used by OpenMetadata.
CREATE EXTENSION IF NOT EXISTS pgcrypto;
GRANT USAGE ON SCHEMA schema_name TO <openmetadata_psql_user>;
In the above commands, replace schema_name with the schema used by OpenMetadata and <openmetadata_psql_user> with the SQL user that the OpenMetadata application uses to connect to the PostgreSQL database.

How to extend and use custom docker images with OpenMetadata Helm Charts?

Extending OpenMetadata Server Docker Image

1. Create a Dockerfile based on docker.getcollate.io/openmetadata/server

OpenMetadata helm charts use the officially published docker images from DockerHub. A typical scenario is installing organization certificates for connecting with in-house systems. For example -
FROM docker.getcollate.io/openmetadata/server:x.y.z
WORKDIR /home/
COPY <my-organization-certs> .
RUN update-ca-certificates
where docker.getcollate.io/openmetadata/server:x.y.z needs to point to the same version of the OpenMetadata server, for example docker.getcollate.io/openmetadata/server:1.3.1. This image needs to be built and published to the container registry of your choice.

2. Update your openmetadata helm values yaml

The OpenMetadata application is installed as part of the openmetadata helm chart. In this step, update your custom helm values YAML file to point to the image created in the previous step. For example, create a helm values file named values.yaml with the following contents -
...
image:
  repository: <your repository>
  # Overrides the image tag whose default is the chart appVersion.
  tag: <your tag>
...

3. Install / Upgrade your helm release

Upgrade/Install your openmetadata helm charts with the below single command:
helm upgrade --install openmetadata open-metadata/openmetadata --values values.yaml

Extending OpenMetadata Ingestion Docker Image

One possible reason to use a custom ingestion image is that you have developed your own custom connectors. You can find a complete working example of this here. After you have your code ready, the steps are the following:

1. Create a Dockerfile based on docker.getcollate.io/openmetadata/ingestion:

For example -
FROM docker.getcollate.io/openmetadata/ingestion:x.y.z

USER airflow
# Let's use the home directory of airflow user
WORKDIR /home/airflow

# Install our custom connector
COPY <your_package> <your_package>
COPY setup.py .
RUN pip install --no-deps .
where docker.getcollate.io/openmetadata/ingestion:x.y.z needs to point to the same version of the OpenMetadata server, for example docker.getcollate.io/openmetadata/ingestion:1.3.1. This image needs to be built and published to the container registry of your choice.

2. Update the airflow in openmetadata dependencies values YAML

The ingestion container (the one shipping Airflow) is installed by the openmetadata-dependencies helm chart. In this step, we use our own custom values YAML file to point to the image we just created in the previous step. You can create a file named values.deps.yaml with the following contents:
airflow:
  airflow:
    image:
      repository: <your repository>  # by default, openmetadata/ingestion
      tag: <your tag>  # by default, the version you are deploying, e.g., 1.1.0
      pullPolicy: "IfNotPresent"

3. Install / Upgrade helm release

Upgrade/Install your openmetadata-dependencies helm charts with the below single command:
helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --values values.deps.yaml

How to disable MySQL and ElasticSearch from OpenMetadata Dependencies Helm Charts?

If you are using external MySQL and ElasticSearch services, you will want to disable the local installation of MySQL and ElasticSearch while installing the OpenMetadata Dependencies Helm chart. You can disable these Helm dependencies by setting enabled: false for each of them. Below is the command to set these helm values from the Helm CLI -
helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --set mysql.enabled=false --set elasticsearch.enabled=false
Alternatively, you can create a custom YAML file named values.deps.yaml to disable the installation of MySQL and ElasticSearch.
mysql:
    enabled: false
    ...
elasticsearch:
    enabled: false
    ...
...

How to configure an external database like PostgreSQL with OpenMetadata Helm Charts?

OpenMetadata supports PostgreSQL as one of its database dependencies. The OpenMetadata Helm charts do not include PostgreSQL by default. To configure the Helm charts with an external database like PostgreSQL, follow the guide below to change the helm values and then upgrade / install the OpenMetadata helm charts.

Upgrade Airflow Helm Dependencies Helm Charts to connect to External Database like PostgreSQL

We ship airflow-helm as one of OpenMetadata Dependencies with default values to connect to MySQL Database as part of externalDatabase configurations. You can find more information on setting the externalDatabase as part of helm values here. With OpenMetadata Dependencies Helm Charts, your helm values would look something like below -
...
airflow:
  externalDatabase:
    type: postgresql
    host: <postgresql_endpoint>
    port: 5432
    database: <airflow_database_name>
    user: <airflow_database_login_user>
    passwordSecret: airflow-postgresql-secrets
    passwordSecretKey: airflow-postgresql-password
...
For the above code, it is assumed that you have created a Kubernetes secret to store the Airflow database login credentials. A sample command to create the secret is:
kubectl create secret generic airflow-postgresql-secrets --from-literal=airflow-postgresql-password=<password>

Upgrade OpenMetadata Helm Charts to connect to External Database like PostgreSQL

Update the openmetadata.config.database.* helm values for OpenMetadata Application to connect to External Database like PostgreSQL. With OpenMetadata Helm Charts, your helm values would look something like below -
openmetadata:
  config:
    ...
    database:
      host: <postgresql_endpoint>
      port: 5432
      driverClass: org.postgresql.Driver
      dbScheme: postgresql
      dbUseSSL: true
      databaseName: <openmetadata_database_name>
      auth:
        username: <database_login_user>
        password:
          secretRef: openmetadata-postgresql-secrets
          secretKey: openmetadata-postgresql-password
For the above code, it is assumed that you have created a Kubernetes secret to store the OpenMetadata database login credentials. A sample command to create the secret is:
kubectl create secret generic openmetadata-postgresql-secrets --from-literal=openmetadata-postgresql-password=<password>
Once you make the above changes to your helm values, run the commands below to install/upgrade the helm charts -
helm upgrade --install openmetadata-dependencies open-metadata/openmetadata-dependencies --values <path-to-values-file> --namespace <kubernetes_namespace>
helm upgrade --install openmetadata open-metadata/openmetadata --values <path-to-values-file> --namespace <kubernetes_namespace>

How to customize OpenMetadata Dependencies Helm Chart with custom helm values

The OpenMetadata Dependencies Helm chart internally depends on three sub-charts. If you are looking to customize the deployment of any of these dependencies, refer to the documentation of the corresponding sub-chart for its helm values. By default, the OpenMetadata Dependencies helm chart provides an initial generic customization of these helm values to get you started quickly. You can refer to the openmetadata-dependencies helm chart default values here.