Enable RDF (Knowledge Graph)
Collate’s Knowledge Graph & Ontology Explorer stores your metadata in a graph database powered by RDF (Resource Description Framework). Collate uses Apache Jena Fuseki as the SPARQL (SPARQL Protocol and RDF Query Language) store: the Collate server writes your entity metadata, relationships, and lineage into Fuseki, and the RDF Indexing application builds and maintains the graph.
This page applies to BYOC deployments only. On Collate SaaS, RDF and Fuseki are deployed and managed for you, so no action is required. RDF is available in Collate 1.13 and later.
The setup on a BYOC deployment has two parts:
- Deploy Apache Jena Fuseki into your cluster.
- Configure the Collate Helm release to point at the Fuseki endpoint.
Prerequisites
Before you begin, make sure the following are in place.
- A running Collate BYOC deployment installed with the `open-metadata/openmetadata` Helm chart. See EKS, AKS, GKE, or On-Prem.
- OpenSearch as the search engine, which is already required for Collate BYOC.
- `kubectl` and `helm` access to the namespace where Collate is deployed (the examples below use `collate`).
Step 1: Deploy Apache Jena Fuseki
Fuseki ships with the `open-metadata/openmetadata-dependencies` Helm chart (Collate 1.13.1 and later). Because Collate BYOC brings its own database and OpenSearch, you install this chart as a small, dedicated release with only Fuseki enabled; MySQL, OpenSearch, and Airflow stay off. The chart deploys Fuseki as a Deployment with a ClusterIP Service so the Collate server can reach it in-cluster.
- Create the Fuseki admin credentials. Fuseki needs an admin password, and the Collate server uses the same credentials to manage its dataset. Create a Kubernetes secret holding the password. By default, the Fuseki chart reads the `admin-password` key of the `fuseki-admin-credentials` secret.
- Deploy Fuseki from the dependencies chart. Add the Helm repository (skip if you already added it for the Collate install).
  Create a values file, for example `fuseki.values.yml`, that enables only Fuseki. The chart's default `fuseki` values follow the production BYOC sizing. The minimum required is `fuseki.enabled: true` plus disabling the other dependencies; tune the rest for your catalog size.
  Install it as a dedicated release in the same namespace as Collate. This deploys a `fuseki` Deployment, a ClusterIP Service, and a PersistentVolumeClaim. The Collate server reaches Fuseki in-cluster at `http://fuseki:3030` (or `http://fuseki.collate.svc.cluster.local:3030` from other namespaces).
You don’t need to create the Fuseki dataset manually. On startup, the Collate server checks for the configured dataset and creates a writable TDB2 dataset through the Fuseki admin API using the credentials from step 1.
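The two in-cluster addresses for Fuseki follow the usual Kubernetes Service DNS pattern: the short Service name resolves inside the same namespace, and the fully qualified name is needed from elsewhere. The sketch below illustrates that rule only. The function name and its parameters are hypothetical, and it assumes the default `cluster.local` cluster domain.

```python
def fuseki_base_url(service="fuseki", port=3030,
                    fuseki_namespace="collate", caller_namespace="collate"):
    """Return the in-cluster base URL a client would use to reach Fuseki.

    Assumes the default Kubernetes cluster domain (cluster.local).
    """
    if caller_namespace == fuseki_namespace:
        # Same namespace: the short Service name resolves.
        host = service
    else:
        # Other namespace: use the fully qualified Service name.
        host = f"{service}.{fuseki_namespace}.svc.cluster.local"
    return f"http://{host}:{port}"


print(fuseki_base_url())                              # same namespace as Collate
print(fuseki_base_url(caller_namespace="analytics"))  # hypothetical other namespace
```

If your cluster uses a custom cluster domain, the fully qualified form changes accordingly.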
Step 2: Configure Collate to use Fuseki
Add the `rdf` block under `openmetadata.config` in your Collate Helm values (for example `openmetadata.values.yml`). The block uses the following keys:
| Key | Description | Notes |
|---|---|---|
| `enabled` | Turns RDF support on. | Set to `true`. |
| `storageType` | RDF storage backend. | `FUSEKI`. |
| `remoteEndpoint` | SPARQL endpoint URL. | In-cluster Fuseki service; the path is the dataset name. |
| `username` | Fuseki admin user. | `admin` for the bundled Fuseki image. |
| `password.secretRef` / `password.secretKey` | Secret and key holding the Fuseki admin password. | Reuse `fuseki-admin-credentials` from Step 1. |
| `dataset` | Dataset name in Fuseki. | Must match the path in `remoteEndpoint`. |
| `baseUri` | Base URI for RDF resources. | Leave as the default unless instructed otherwise. |
When `rdf.enabled` is `true`, the chart renders these values into the Collate server environment as `RDF_ENABLED`, `RDF_STORAGE_TYPE`, `RDF_ENDPOINT`, `RDF_REMOTE_USERNAME`, `RDF_DATASET`, `RDF_BASE_URI`, and `RDF_REMOTE_PASSWORD` (pulled from the secret).
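The constraints in the table, in particular that `dataset` must match the path in `remoteEndpoint`, are easy to get wrong by hand. The sketch below checks an `rdf` block that has already been loaded into a Python dictionary. The function name is hypothetical, the dataset name `example-dataset` is a placeholder, and the checks mirror the table above; they are not the chart's own validation.

```python
from urllib.parse import urlparse


def check_rdf_config(rdf):
    """Return a list of problems found in an rdf values block (empty if none)."""
    problems = []
    if rdf.get("enabled") is not True:
        problems.append("enabled must be true")
    if rdf.get("storageType") != "FUSEKI":
        problems.append("storageType must be FUSEKI")

    endpoint = urlparse(rdf.get("remoteEndpoint", ""))
    if endpoint.scheme not in ("http", "https") or not endpoint.netloc:
        problems.append("remoteEndpoint must be an http(s) URL")

    # The path of the endpoint URL is the dataset name.
    path_dataset = endpoint.path.strip("/")
    if path_dataset != rdf.get("dataset"):
        problems.append(
            f"remoteEndpoint path {path_dataset!r} does not match "
            f"dataset {rdf.get('dataset')!r}"
        )

    password = rdf.get("password", {})
    if not password.get("secretRef") or not password.get("secretKey"):
        problems.append("password.secretRef and password.secretKey are required")
    return problems


# Placeholder values; "example-dataset" is hypothetical.
example = {
    "enabled": True,
    "storageType": "FUSEKI",
    "remoteEndpoint": "http://fuseki:3030/example-dataset",
    "username": "admin",
    "password": {"secretRef": "fuseki-admin-credentials", "secretKey": "admin-password"},
    "dataset": "example-dataset",
}
print(check_rdf_config(example))  # prints []
```

`baseUri` is left out of the example on purpose, since the table recommends keeping the default.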
Apply the change with a Helm upgrade against your existing release.
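Once the upgrade has rolled out, one way to confirm that the endpoint, dataset name, and credentials line up is to ask Fuseki's HTTP administration API about the dataset. The sketch below builds a `GET /$/datasets/{name}` request and interprets the response code. It assumes the bundled Fuseki image accepts HTTP Basic authentication for the admin user, which this page does not state; the function names and the status interpretations are illustrative.

```python
import base64
import urllib.error
import urllib.request


def dataset_request(base_url, dataset, username, password):
    """Build a GET request for Fuseki's admin dataset endpoint with Basic auth."""
    url = f"{base_url.rstrip('/')}/$/datasets/{dataset}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def diagnose(status):
    """Map an HTTP status code to a likely cause (a sketch, not exhaustive)."""
    if status == 200:
        return "ok: dataset exists and credentials were accepted"
    if status in (401, 403):
        return "credentials rejected: compare username/password with the secret"
    if status == 404:
        return "dataset not found: check the dataset name"
    return f"unexpected HTTP status {status}"


def probe(base_url, dataset, username, password, timeout=5):
    """Send the request and return a one-line diagnosis."""
    request = dataset_request(base_url, dataset, username, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return diagnose(response.status)
    except urllib.error.HTTPError as err:   # must come before URLError
        return diagnose(err.code)
    except (urllib.error.URLError, OSError) as err:
        return f"cannot reach Fuseki: {err}"


# Example call (placeholder dataset and password):
# print(probe("http://fuseki:3030", "example-dataset", "admin", "<admin-password>"))
```

The hostname `fuseki` only resolves inside the cluster, so run this from a pod in the Collate namespace or point `base_url` at a locally forwarded port.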
Step 3: Populate the Knowledge Graph
Enabling RDF wires up the store, but the graph is populated by the RDF Indexing application.
- In Collate, go to Settings > Applications.
- Open the RDF Indexing app.
- Run the app.
Troubleshooting
Use these checks if RDF indexing isn’t working as expected after completing the setup.
- The Knowledge Graph tab doesn’t appear on data assets
  RDF is enabled, but the graph hasn’t been populated yet. Go to Settings > Applications, open the RDF Indexing app, and run it. The tab appears once indexing completes.
- The RDF Indexing app fails or the Collate server shows a connection error
  This usually means the Collate server can’t connect to Fuseki, or the credentials are wrong. Check the following:
  - Confirm the Fuseki pod is running. The pod status should be `Running`. If it’s `Pending` or `CrashLoopBackOff`, check its logs.
  - Confirm the `remoteEndpoint` in your Helm values points to the correct in-cluster address (`http://fuseki:3030/<dataset-name>`) and that the dataset name in the URL matches the `dataset` field.
  - Confirm the `username` and `password` values match the secret you created in Step 1: Deploy Apache Jena Fuseki.
- Indexing starts but stops partway through, and the Fuseki pod restarts
  Fuseki ran out of memory during indexing. This typically happens when the catalog is large. Raise both `fuseki.jvmArgs` (`-Xmx`) and `fuseki.resources.limits.memory` together in `fuseki.values.yml`, upgrade the Fuseki release, then re-run the RDF Indexing app.
  The chart defaults (8 GB heap / 12 GiB limit) follow the production BYOC recommendation; larger catalogs may need more.
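The heap and the container limit are raised together because the JVM heap has to fit inside the container's memory limit with room left for non-heap memory. The sketch below parses a `-Xmx` flag and a Kubernetes memory quantity and checks that the heap leaves some headroom. The function names are hypothetical, and the 25% headroom threshold is an illustrative assumption loosely based on the default 8/12 pairing, not a documented rule. It also assumes `fuseki.jvmArgs` contains a flag of the form `-Xmx8g`.

```python
import re

# JVM size suffixes are binary multiples.
_JVM_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}
# Kubernetes quantities: binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes.
_K8S_UNITS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def xmx_bytes(jvm_args):
    """Extract the -Xmx heap size from a JVM argument string, in bytes."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", jvm_args)
    if match is None:
        raise ValueError("no -Xmx flag found")
    number, unit = match.groups()
    return int(number) * _JVM_UNITS.get(unit.lower(), 1)


def k8s_memory_bytes(quantity):
    """Convert a Kubernetes memory quantity such as '12Gi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, unit = match.groups()
    return int(number) * (_K8S_UNITS[unit] if unit else 1)


def heap_fits(jvm_args, memory_limit, min_headroom=0.25):
    """True if the heap leaves at least min_headroom of the limit free."""
    heap = xmx_bytes(jvm_args)
    limit = k8s_memory_bytes(memory_limit)
    return heap <= limit * (1 - min_headroom)


print(heap_fits("-Xmx8g", "12Gi"))   # chart defaults: True
print(heap_fits("-Xmx16g", "12Gi"))  # heap raised without the limit: False
print(heap_fits("-Xmx16g", "24Gi"))  # both raised together: True
```

Raising only `-Xmx` past the limit is the failure mode to avoid: the container would be killed for exceeding its memory limit before the JVM could use the extra heap.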