
Enable RDF (Knowledge Graph)

Collate’s Knowledge Graph & Ontology Explorer stores your metadata in a graph database powered by RDF (Resource Description Framework). Collate uses Apache Jena Fuseki as the SPARQL (SPARQL Protocol and RDF Query Language) store — the Collate server writes your entity metadata, relationships, and lineage into Fuseki, and the RDF Indexing application builds and maintains the graph.
This page applies to BYOC deployments only. On Collate SaaS, RDF and Fuseki are deployed and managed for you — no action is required. RDF is available in Collate 1.13 and later.
To enable RDF on BYOC, follow these steps:
  1. Deploy Apache Jena Fuseki into your cluster.
  2. Configure the Collate Helm release to point at the Fuseki endpoint.
  3. Run the RDF Indexing application to populate the Knowledge Graph.

Prerequisites

Before you begin, make sure the following are in place.
  • A running Collate BYOC deployment installed with the open-metadata/openmetadata Helm chart. See the EKS, AKS, GKE, or On-Prem installation guide.
  • OpenSearch as the search engine — already required for Collate BYOC.
  • kubectl and helm access to the namespace where Collate is deployed (the examples below use collate).

Step 1: Deploy Apache Jena Fuseki

Fuseki ships with the open-metadata/openmetadata-dependencies Helm chart (Collate 1.13.1 and later). Because Collate BYOC brings its own database and OpenSearch, you install this chart as a small, dedicated release with only Fuseki enabled — MySQL, OpenSearch, and Airflow stay off. The chart deploys Fuseki as a Deployment with a ClusterIP Service so the Collate server can reach it in-cluster.
  1. Create the Fuseki admin credentials. Fuseki needs an admin password, and the Collate server uses the same credentials to manage its dataset. Create a Kubernetes secret that holds the password. By default, the Fuseki chart reads the admin-password key of the fuseki-admin-credentials secret:
    kubectl create secret generic fuseki-admin-credentials \
      --from-literal=admin-password=<STRONG_PASSWORD> \
      --namespace collate
    
  2. Deploy Fuseki from the dependencies chart. Add the Helm repository (skip if you already added it for the Collate install):
    helm repo add open-metadata https://helm.open-metadata.org/
    helm repo update
    
    Create a values file that enables only Fuseki. At minimum, set fuseki.enabled: true and disable the other dependencies. The remaining fuseki values shown are the chart defaults, which follow the production BYOC sizing; tune them for your catalog size:
    # fuseki.values.yml — deploy only Fuseki from the openmetadata-dependencies chart
    mysql:
      enabled: false
    opensearch:
      enabled: false
    airflow:
      enabled: false
    
    fuseki:
      enabled: true
      # Persist the graph across pod restarts (enabled by default).
      persistence:
        enabled: true
        size: 100Gi
        storageClass: gp3   # set to a StorageClass in your cluster; leave "" for the cluster default
      jvmArgs: "-Xmx8g -Xms8g"
      resources:
        requests:
          cpu: "1500m"
          memory: "10Gi"
        limits:
          cpu: "2500m"
          memory: "12Gi"
    
    Install it as a dedicated release in the same namespace as Collate:
    helm upgrade --install fuseki open-metadata/openmetadata-dependencies \
      --values fuseki.values.yml \
      --namespace collate
    
    This deploys a fuseki Deployment, a ClusterIP Service, and a PersistentVolumeClaim. The Collate server reaches Fuseki in-cluster at http://fuseki:3030 (or http://fuseki.collate.svc.cluster.local:3030 from other namespaces).
    You don’t need to create the Fuseki dataset manually. On startup, the Collate server checks for the configured dataset and creates a writable TDB2 dataset through the Fuseki admin API using the credentials from step 1.
    Size Fuseki for your catalog. Fuseki is memory-intensive and can be OOMKilled during the initial RDF indexing of a large catalog. Keep the heap set in jvmArgs (-Xmx) below the container memory limit, and raise both together for larger deployments. The defaults above (8 GiB heap, 12 GiB limit, 100 GiB storage) follow the production BYOC recommendation; reduce them for smaller catalogs.
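Because the -Xmx heap must stay below the container memory limit, it can help to check a pair of values before upgrading the release. The sketch below is a hypothetical Bash helper (check_fuseki_sizing is not part of the chart or of Collate) that parses the -Xmx value from a jvmArgs string and a memory limit, then reports the headroom left for non-heap JVM memory. It assumes Bash 4 or later and only understands the JVM suffixes g and m and the Kubernetes binary suffixes Gi and Mi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the JVM max heap (-Xmx) with the container memory limit.
# Supports -Xmx<N>g / -Xmx<N>m and limits written as <N>Gi / <N>Mi only.
check_fuseki_sizing() {
  local jvm_args="$1" mem_limit="$2"
  local heap_mib limit_mib headroom_mib

  # Extract the -Xmx value (e.g. "-Xmx8g" is 8 GiB) and convert it to MiB.
  if [[ "$jvm_args" =~ -Xmx([0-9]+)([gGmM]) ]]; then
    heap_mib=$(( 10#${BASH_REMATCH[1]} ))
    if [[ "${BASH_REMATCH[2]}" == [gG] ]]; then
      heap_mib=$(( heap_mib * 1024 ))
    fi
  else
    echo "no -Xmx<N>g or -Xmx<N>m found in: $jvm_args" >&2
    return 2
  fi

  # Convert the Kubernetes memory limit (binary units only) to MiB.
  if [[ "$mem_limit" =~ ^([0-9]+)(Gi|Mi)$ ]]; then
    limit_mib=$(( 10#${BASH_REMATCH[1]} ))
    if [[ "${BASH_REMATCH[2]}" == "Gi" ]]; then
      limit_mib=$(( limit_mib * 1024 ))
    fi
  else
    echo "expected a limit like 12Gi or 12288Mi, got: $mem_limit" >&2
    return 2
  fi

  # The JVM also uses memory outside the heap (metaspace, thread stacks,
  # direct buffers), so the heap must be strictly smaller than the limit.
  headroom_mib=$(( limit_mib - heap_mib ))
  if (( headroom_mib <= 0 )); then
    echo "heap ${heap_mib}Mi is not below limit ${limit_mib}Mi: raise the limit or lower -Xmx" >&2
    return 1
  fi
  echo "ok: heap ${heap_mib}Mi, limit ${limit_mib}Mi, headroom ${headroom_mib}Mi"
}
```

For the chart defaults, check_fuseki_sizing "-Xmx8g -Xms8g" "12Gi" prints ok: heap 8192Mi, limit 12288Mi, headroom 4096Mi. A non-zero exit status means the pair needs adjusting before you upgrade.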

Step 2: Configure Collate to use Fuseki

Add the rdf block under openmetadata.config in your Collate Helm values (for example openmetadata.values.yml):
openmetadata:
  config:
    rdf:
      enabled: true
      storageType: "FUSEKI"
      remoteEndpoint: "http://fuseki:3030/openmetadata"
      username: "admin"
      password:
        secretRef: fuseki-admin-credentials
        secretKey: admin-password
      dataset: "openmetadata"
      baseUri: "https://open-metadata.org/"
| Key | Description | Notes |
| --- | --- | --- |
| enabled | Turns RDF support on. | Set to true. |
| storageType | RDF storage backend. | FUSEKI. |
| remoteEndpoint | SPARQL endpoint URL. | In-cluster Fuseki service; the path is the dataset name. |
| username | Fuseki admin user. | admin for the bundled Fuseki image. |
| password.secretRef / password.secretKey | Secret and key holding the Fuseki admin password. | Reuse fuseki-admin-credentials from Step 1. |
| dataset | Dataset name in Fuseki. | Must match the path in remoteEndpoint. |
| baseUri | Base URI for RDF resources. | Leave as the default unless instructed otherwise. |
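Because dataset must match the last path segment of remoteEndpoint, a quick local check can catch a mismatch before you run the Helm upgrade. The following is a hypothetical Bash helper (check_rdf_endpoint is not provided by Collate); it only compares strings and does not contact Fuseki.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the last path segment of remoteEndpoint
# equals the configured dataset name. Pure string check, no network access.
check_rdf_endpoint() {
  local endpoint="$1" dataset="$2"
  local path="${endpoint%/}"   # drop one trailing slash, if any
  path="${path#*://}"          # drop the scheme (http:// or https://)

  # Nothing after host:port means the dataset segment is missing entirely.
  if [[ "$path" != */* ]]; then
    echo "mismatch: $endpoint has no dataset path (expected /$dataset)" >&2
    return 1
  fi

  local segment="${path##*/}"  # last path segment
  if [[ "$segment" == "$dataset" ]]; then
    echo "ok: $endpoint matches dataset '$dataset'"
  else
    echo "mismatch: path segment '$segment' != dataset '$dataset'" >&2
    return 1
  fi
}
```

With the values above, check_rdf_endpoint "http://fuseki:3030/openmetadata" "openmetadata" prints ok: http://fuseki:3030/openmetadata matches dataset 'openmetadata'.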
When rdf.enabled is true, the chart renders these into the Collate server environment as RDF_ENABLED, RDF_STORAGE_TYPE, RDF_ENDPOINT, RDF_REMOTE_USERNAME, RDF_DATASET, RDF_BASE_URI, and RDF_REMOTE_PASSWORD (pulled from the secret). Apply the change with a Helm upgrade against your existing release:
helm upgrade --install openmetadata open-metadata/openmetadata \
  --values openmetadata.values.yml \
  --namespace collate
After the server pod restarts, confirm there are no Fuseki connection errors:
kubectl logs -n collate deploy/openmetadata | grep -iE 'rdf|fuseki'
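To confirm that the chart rendered the RDF settings into the server container, you can list the container's environment. The sketch below is a hypothetical Bash filter (show_rdf_env is not a Collate tool) that keeps only the RDF_* variables listed earlier and masks RDF_REMOTE_PASSWORD so the secret is not echoed to your terminal. It assumes the server image includes the env binary.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read KEY=VALUE lines on stdin, keep RDF_* variables,
# and replace the value of RDF_REMOTE_PASSWORD with asterisks.
show_rdf_env() {
  grep -E '^RDF_[A-Z_]+=' \
    | sed -E 's/^(RDF_REMOTE_PASSWORD=).*/\1****/' \
    | LC_ALL=C sort
}
```

Usage: kubectl exec -n collate deploy/openmetadata -- env | show_rdf_env. All seven RDF_* variables should appear; if RDF_REMOTE_PASSWORD is missing, check the password.secretRef and password.secretKey values.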

Step 3: Populate the Knowledge Graph

Enabling RDF wires up the store, but the graph is populated by the RDF Indexing application.
  1. In Collate, go to Settings > Applications.
  2. Open the RDF Indexing app.
  3. Run the app.
After it completes, the Knowledge Graph tab appears on data assets (for example, on a table) and the Ontology Explorer reflects your glossary terms and relationships.
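To check that indexing actually wrote data, you can query Fuseki directly. The sketch below is a hypothetical Bash helper that sends a SPARQL COUNT query with curl. It assumes that the dataset exposes Fuseki's standard /query service, that this service accepts the admin credentials from Step 1, and that you have port-forwarded the Fuseki service to localhost:3030. Because this page doesn't say which graph Collate writes to, the query counts triples in the default graph plus all named graphs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count triples in a Fuseki dataset over the SPARQL protocol.
# Assumes the dataset exposes Fuseki's /query service and accepts the admin credentials.
count_graph_triples() {
  local endpoint="${1%/}"   # e.g. http://localhost:3030/openmetadata
  # Count triples in the default graph plus every named graph.
  local query='SELECT (COUNT(*) AS ?n) WHERE { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }'
  local csv

  # Request SPARQL CSV results: a header row "n" followed by the count.
  csv="$(curl -sS --fail \
    -u "admin:${FUSEKI_ADMIN_PASSWORD:?set FUSEKI_ADMIN_PASSWORD first}" \
    -H 'Accept: text/csv' \
    --data-urlencode "query=${query}" \
    "${endpoint}/query")" || return 1

  # Print only the last row, stripping the CRLF line ending used by CSV results.
  printf '%s\n' "$csv" | tail -n 1 | tr -d '\r'
}
```

For example, run kubectl port-forward -n collate svc/fuseki 3030:3030 in one terminal, then FUSEKI_ADMIN_PASSWORD=<STRONG_PASSWORD> count_graph_triples http://localhost:3030/openmetadata in another. A non-zero count indicates that the dataset has been populated.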

Troubleshooting

Use these checks if RDF indexing isn’t working as expected after completing the setup.
  • The Knowledge Graph tab doesn’t appear on data assets. RDF is enabled, but the graph hasn’t been populated yet. Go to Settings > Applications, open the RDF Indexing app, and run it. The tab appears once indexing completes.
  • The RDF Indexing app fails or the Collate server shows a connection error. This usually means the Collate server can’t connect to Fuseki, or the credentials are wrong. Check the following:
    • Confirm the Fuseki pod is running:
      kubectl get pods -n collate -l app=fuseki
      
      The pod status should be Running. If it’s Pending or CrashLoopBackOff, check its logs:
      kubectl logs -n collate deploy/fuseki
      
    • Confirm the remoteEndpoint in your Helm values points to the correct in-cluster address (http://fuseki:3030/<dataset-name>) and that the dataset name in the URL matches the dataset field.
    • Confirm that username is admin and that password.secretRef and password.secretKey point at the secret you created in Step 1: Deploy Apache Jena Fuseki.
  • Indexing starts but stops partway through, and the Fuseki pod restarts. Fuseki ran out of memory during indexing. This happens when the catalog is large. Raise both fuseki.jvmArgs (-Xmx) and fuseki.resources.limits.memory together in fuseki.values.yml, upgrade the Fuseki release, then re-run the RDF Indexing app:
    helm upgrade --install fuseki open-metadata/openmetadata-dependencies \
      --values fuseki.values.yml \
      --namespace collate
    
    The chart defaults (8 GiB heap, 12 GiB limit) follow the production BYOC recommendation; larger catalogs may need more.
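When you edit the heap and the memory limit by hand, it is easy to raise one and forget the other. The sketch below is a hypothetical Bash helper that prints a small override values file from a single heap size. It keeps the same gaps as the chart defaults: the memory request is 2 GiB above the heap and the limit is 4 GiB above it, as in 8g / 10Gi / 12Gi. Those gaps are an assumption inferred from the defaults, not a documented sizing rule; widen them if Fuseki is still OOMKilled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Helm values override that raises the Fuseki heap,
# memory request, and memory limit together.
# Assumption: request = heap + 2 GiB and limit = heap + 4 GiB, mirroring the chart defaults.
fuseki_memory_override() {
  local heap_gib="$1"
  if [[ ! "$heap_gib" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: fuseki_memory_override <heap size in GiB>" >&2
    return 1
  fi
  local request_gib=$(( heap_gib + 2 ))
  local limit_gib=$(( heap_gib + 4 ))
  cat <<EOF
fuseki:
  jvmArgs: "-Xmx${heap_gib}g -Xms${heap_gib}g"
  resources:
    requests:
      memory: "${request_gib}Gi"
    limits:
      memory: "${limit_gib}Gi"
EOF
}
```

For example, write the override with fuseki_memory_override 12 > fuseki.memory.values.yml, then pass it after the base file so that its values take precedence: helm upgrade --install fuseki open-metadata/openmetadata-dependencies --values fuseki.values.yml --values fuseki.memory.values.yml --namespace collate. Helm merges the two files, so the CPU settings and persistence from fuseki.values.yml are kept.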