# Enable RDF (Knowledge Graph)

> Enable the RDF Knowledge Graph for Collate BYOC by deploying Apache Jena Fuseki and pointing the Collate server at the triplestore.

Collate's **Knowledge Graph & Ontology Explorer** stores your metadata as an RDF (Resource Description Framework) graph. Collate uses [Apache Jena Fuseki](https://jena.apache.org/documentation/fuseki2/) as the SPARQL (SPARQL Protocol and RDF Query Language) store: the Collate server writes your entity metadata, relationships, and lineage into Fuseki, and the **RDF Indexing** application builds and maintains the graph.

<Note>
  This page applies to **BYOC** deployments only. On Collate **SaaS**, RDF and Fuseki are deployed and managed for you — no action is required. RDF is available in Collate **1.13** and later.
</Note>

To enable RDF on BYOC, follow these steps:

1. Deploy Apache Jena Fuseki into your cluster.
2. Configure the Collate Helm release to point at the Fuseki endpoint.
3. Run the **RDF Indexing** application to populate the graph.

## Prerequisites

Before you begin, make sure the following are in place.

* A running Collate BYOC deployment installed with the `open-metadata/openmetadata` Helm chart. See [EKS](/how-to-guides/deployment/byoc/kubernetes/eks), [AKS](/how-to-guides/deployment/byoc/kubernetes/aks), [GKE](/how-to-guides/deployment/byoc/kubernetes/gke), or [On-Prem](/how-to-guides/deployment/byoc/kubernetes/on-prem).
* **OpenSearch** as the search engine — already required for Collate BYOC.
* `kubectl` and `helm` access to the namespace where Collate is deployed (the examples below use `collate`).

## Step 1: Deploy Apache Jena Fuseki

Fuseki ships with the `open-metadata/openmetadata-dependencies` Helm chart (Collate **1.13.1** and later). Because Collate BYOC brings its own database and OpenSearch, you install this chart as a small, dedicated release with **only Fuseki enabled** — MySQL, OpenSearch, and Airflow stay off. The chart deploys Fuseki as a Deployment with a ClusterIP Service so the Collate server can reach it in-cluster.

1. **Create the Fuseki admin credentials.** Fuseki needs an admin password, and the Collate server uses the same credentials to manage its dataset. Create a Kubernetes secret holding the password — the Fuseki chart reads `fuseki-admin-credentials` / `admin-password` by default:

   ```bash theme={null}
   kubectl create secret generic fuseki-admin-credentials \
     --from-literal=admin-password='<STRONG_PASSWORD>' \
     --namespace collate
   ```

2. **Deploy Fuseki from the dependencies chart.** Add the Helm repository (skip if you already added it for the Collate install):

   ```bash theme={null}
   helm repo add open-metadata https://helm.open-metadata.org/
   helm repo update
   ```

   Create a values file that enables only Fuseki. The `fuseki` values shown are the chart defaults, which match the production BYOC sizing. At minimum, set `fuseki.enabled: true` and disable the other dependencies; tune the remaining values for your catalog size:

   ```yaml theme={null}
   # fuseki.values.yml — deploy only Fuseki from the openmetadata-dependencies chart
   mysql:
     enabled: false
   opensearch:
     enabled: false
   airflow:
     enabled: false

   fuseki:
     enabled: true
     # Persist the graph across pod restarts (enabled by default).
     persistence:
       enabled: true
       size: 100Gi
       storageClass: gp3   # set to a StorageClass in your cluster; leave "" for the cluster default
     jvmArgs: "-Xmx8g -Xms8g"
     resources:
       requests:
         cpu: "1500m"
         memory: "10Gi"
       limits:
         cpu: "2500m"
         memory: "12Gi"
   ```

   Install it as a dedicated release in the same namespace as Collate:

   ```bash theme={null}
   helm upgrade --install fuseki open-metadata/openmetadata-dependencies \
     --values fuseki.values.yml \
     --namespace collate
   ```

   This deploys a `fuseki` Deployment, a ClusterIP Service, and a PersistentVolumeClaim. The Collate server reaches Fuseki in-cluster at `http://fuseki:3030` (or `http://fuseki.collate.svc.cluster.local:3030` from other namespaces).

   <Note>
     You don't need to create the Fuseki dataset manually. On startup, the Collate server checks for the configured dataset and creates a writable TDB2 dataset through the Fuseki admin API using the credentials from step 1.
   </Note>

   <Warning>
     **Size Fuseki for your catalog.** Fuseki holds the whole graph in memory and can be `OOMKilled` during the initial RDF indexing of a large catalog. Keep the `jvmArgs` heap (`-Xmx`) below the container memory limit, because the JVM also needs memory outside the heap, and raise both together for larger deployments. The defaults above (8 GiB heap / 12 GiB limit, 100 GiB storage) follow the production BYOC recommendation; reduce them for smaller catalogs.
   </Warning>
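
The sizing rule in the warning above can be checked before you upgrade the Fuseki release. The following is a minimal sketch, not part of the chart: the function names `jvm_heap_mib`, `k8s_memory_mib`, and `heap_fits_limit` are hypothetical, and the sketch assumes the heap is given with a JVM `m` or `g` suffix and the limit with a Kubernetes `Mi` or `Gi` suffix.

```bash
# Print the -Xmx heap size from a JVM argument string, in MiB.
# The JVM reads g/G as GiB and m/M as MiB.
jvm_heap_mib() {
  if [[ "$1" =~ -Xmx([0-9]+)([gGmM]) ]]; then
    case "${BASH_REMATCH[2]}" in
      g|G) echo $(( ${BASH_REMATCH[1]} * 1024 )) ;;
      m|M) echo "${BASH_REMATCH[1]}" ;;
    esac
  else
    echo "no -Xmx<size>[m|g] flag found in: $1" >&2
    return 1
  fi
}

# Print a Kubernetes memory quantity in MiB.
# Assumption: only the binary suffixes Mi and Gi are handled in this sketch.
k8s_memory_mib() {
  if [[ "$1" =~ ^([0-9]+)(Gi|Mi)$ ]]; then
    case "${BASH_REMATCH[2]}" in
      Gi) echo $(( ${BASH_REMATCH[1]} * 1024 )) ;;
      Mi) echo "${BASH_REMATCH[1]}" ;;
    esac
  else
    echo "unsupported quantity: $1 (expected <n>Mi or <n>Gi)" >&2
    return 1
  fi
}

# Succeed only when the heap is strictly smaller than the container limit.
# Returns 2 when either value cannot be parsed.
heap_fits_limit() {
  local heap limit
  heap=$(jvm_heap_mib "$1") || return 2
  limit=$(k8s_memory_mib "$2") || return 2
  (( heap < limit ))
}
```

For example, `heap_fits_limit "-Xmx8g -Xms8g" "12Gi"` succeeds for the defaults shown above (8192 MiB of heap against a 12288 MiB limit), while setting both values to 12 GiB fails the check.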

## Step 2: Configure Collate to use Fuseki

Add the `rdf` block under `openmetadata.config` in your Collate Helm values (for example `openmetadata.values.yml`):

```yaml theme={null}
openmetadata:
  config:
    rdf:
      enabled: true
      storageType: "FUSEKI"
      remoteEndpoint: "http://fuseki:3030/openmetadata"
      username: "admin"
      password:
        secretRef: fuseki-admin-credentials
        secretKey: admin-password
      dataset: "openmetadata"
      baseUri: "https://open-metadata.org/"
```

| Key                                         | Description                                       | Notes                                                |
| ------------------------------------------- | ------------------------------------------------- | ---------------------------------------------------- |
| `enabled`                                   | Turns RDF support on.                             | Set to `true`.                                       |
| `storageType`                               | RDF storage backend.                              | `FUSEKI`.                                            |
| `remoteEndpoint`                            | SPARQL endpoint URL.                              | In-cluster Fuseki service, path is the dataset name. |
| `username`                                  | Fuseki admin user.                                | `admin` for the bundled Fuseki image.                |
| `password.secretRef` / `password.secretKey` | Secret and key holding the Fuseki admin password. | Reuse `fuseki-admin-credentials` from Step 1.        |
| `dataset`                                   | Dataset name in Fuseki.                           | Must match the path in `remoteEndpoint`.             |
| `baseUri`                                   | Base URI for RDF resources.                       | Leave as the default unless instructed otherwise.    |
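
Because `dataset` must match the path in `remoteEndpoint`, a quick pre-flight comparison can catch a mismatch before the upgrade. This is an illustrative sketch only; `endpoint_matches_dataset` is a hypothetical helper, and it assumes the dataset name is the last path segment of the endpoint URL.

```bash
# Succeed when the last path segment of the endpoint URL equals the dataset name.
endpoint_matches_dataset() {
  local endpoint="${1%/}"   # tolerate a single trailing slash
  local dataset="$2"
  [[ "${endpoint##*/}" == "$dataset" ]]
}
```

With the values from the example above, `endpoint_matches_dataset "http://fuseki:3030/openmetadata" "openmetadata"` succeeds; an endpoint without a dataset path, such as `http://fuseki:3030`, fails.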

When `rdf.enabled` is `true`, the chart renders these into the Collate server environment as `RDF_ENABLED`, `RDF_STORAGE_TYPE`, `RDF_ENDPOINT`, `RDF_REMOTE_USERNAME`, `RDF_DATASET`, `RDF_BASE_URI`, and `RDF_REMOTE_PASSWORD` (pulled from the secret).
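
To confirm that the variables listed above reached the server container, you could filter the container environment for missing names. The helper below is a hypothetical sketch (`missing_rdf_vars` is not a Collate tool); it reads `KEY=VALUE` lines on standard input and prints only variable names, never values.

```bash
# Read KEY=VALUE lines on stdin and print each expected RDF_* variable that is absent.
missing_rdf_vars() {
  local expected=(RDF_ENABLED RDF_STORAGE_TYPE RDF_ENDPOINT RDF_REMOTE_USERNAME
                  RDF_REMOTE_PASSWORD RDF_DATASET RDF_BASE_URI)
  local env_dump name
  env_dump=$(cat)
  for name in "${expected[@]}"; do
    # Print the name only when no line starts with "NAME=".
    grep -q "^${name}=" <<<"$env_dump" || echo "$name"
  done
}
```

Assuming the server Deployment is named `openmetadata`, as in the log command later in this step, one way to use it is `kubectl exec -n collate deploy/openmetadata -- env | missing_rdf_vars`. Empty output means all seven variables are set.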

Apply the change with a Helm upgrade against your existing release:

```bash theme={null}
helm upgrade --install openmetadata open-metadata/openmetadata \
  --values openmetadata.values.yml \
  --namespace collate
```

After the server pod restarts, confirm there are no Fuseki connection errors:

```bash theme={null}
kubectl logs -n collate deploy/openmetadata | grep -i rdf
```

## Step 3: Populate the Knowledge Graph

Enabling RDF wires up the store, but the graph is populated by the **RDF Indexing** application.

1. In Collate, go to **Settings** > **Applications**.
2. Open the **RDF Indexing** app.
3. Run the app.

After it completes, the **Knowledge Graph** tab appears on data assets (for example, on a table) and the Ontology Explorer reflects your glossary terms and relationships.
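
If you want a rough signal that indexing wrote data, you can count triples in the dataset over the standard SPARQL protocol. This is a sketch under assumptions rather than a documented Collate procedure: `count_triples` is a hypothetical helper, it assumes the dataset exposes the conventional `/sparql` query service, and it counts triples in both the default graph and any named graphs (a dataset configured with a union default graph would count named-graph triples twice).

```bash
# Print the number of triples in a Fuseki dataset.
# Arguments: dataset URL, user name, password.
count_triples() {
  local endpoint="$1" user="$2" password="$3"
  local query='SELECT (COUNT(*) AS ?triples) WHERE { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }'
  # Request CSV results: a header row followed by one row holding the count.
  curl --silent --fail --user "${user}:${password}" \
    --header 'Accept: text/csv' \
    --data-urlencode "query=${query}" \
    "${endpoint}/sparql" | tail -n 1 | tr -d '\r'
}
```

From a workstation, you could first run `kubectl port-forward -n collate svc/fuseki 3030:3030` and then call `count_triples "http://localhost:3030/openmetadata" admin '<STRONG_PASSWORD>'`. A count of zero after the app completes suggests the indexing run did not write to this dataset.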

## Troubleshooting

Use these checks if RDF indexing isn't working as expected after completing the setup.

* **The Knowledge Graph tab doesn't appear on data assets**

  RDF is enabled, but the graph hasn't been populated yet.

  Go to **Settings** > **Applications**, open the **RDF Indexing** app, and run it. The tab appears once indexing completes.

* **The RDF Indexing app fails or the Collate server shows a connection error**

  This usually means the Collate server can't connect to Fuseki, or the credentials are wrong. Check the following:

  * Confirm the Fuseki pod is running:
    ```bash theme={null}
    kubectl get pods -n collate -l app=fuseki
    ```
    The pod status should be `Running`. If it's `Pending` or `CrashLoopBackOff`, check its logs:
    ```bash theme={null}
    kubectl logs -n collate deploy/fuseki
    ```
  * Confirm the `remoteEndpoint` in your Helm values points to the correct in-cluster address (`http://fuseki:3030/<dataset-name>`) and that the dataset name in the URL matches the `dataset` field.
  * Confirm that `username` is `admin` and that `password.secretRef` and `password.secretKey` point to the secret and key you created in [Step 1: Deploy Apache Jena Fuseki](#step-1-deploy-apache-jena-fuseki).

* **Indexing starts but stops partway through, and the Fuseki pod restarts**

  Fuseki ran out of memory during indexing. This happens when the catalog is large. Raise both `fuseki.jvmArgs` (`-Xmx`) and `fuseki.resources.limits.memory` together in `fuseki.values.yml`, upgrade the Fuseki release, then re-run the **RDF Indexing** app:

  ```bash theme={null}
  helm upgrade --install fuseki open-metadata/openmetadata-dependencies \
    --values fuseki.values.yml \
    --namespace collate
  ```

  The chart defaults (8 GiB heap / 12 GiB limit) follow the production BYOC recommendation; larger catalogs may need more.
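
To confirm that a restart was caused by memory pressure rather than another failure, you can read the last termination reason that Kubernetes recorded for the Fuseki container. The helpers below are a hypothetical sketch; they reuse the `app=fuseki` label from the pod check above and default to the `collate` namespace.

```bash
# Print the last termination reason of every container in the Fuseki pods.
fuseki_last_termination() {
  local namespace="${1:-collate}"
  kubectl get pods -n "$namespace" -l app=fuseki \
    -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'
}

# Succeed when any Fuseki container was last terminated by the OOM killer.
was_oom_killed() {
  [[ "$(fuseki_last_termination "$1")" == *OOMKilled* ]]
}
```

For example, `was_oom_killed collate && echo "raise -Xmx and the memory limit together"` prints the hint only when an out-of-memory kill was recorded. An empty reason means the container has not been terminated since the pod was created.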
