Collate Documentation

Spark Engine Prerequisites

  • Spark Connect available (versions 3.5.2 to 3.5.6 supported)
  • Network access from the pipeline execution environment to the Spark Connect endpoint
  • Network access from the pipeline execution environment to the OpenMetadata server
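The version requirement above can be checked programmatically. The following is a minimal sketch, not part of the product: the helper names (`parse_version`, `is_supported_spark_version`, `spark_connect_url`, `check_spark_connect`) are hypothetical, and `check_spark_connect` assumes a `pyspark` installation with Spark Connect support in the pipeline environment.

```python
import re

# Supported range, taken from the prerequisites listed above.
MIN_SUPPORTED = (3, 5, 2)
MAX_SUPPORTED = (3, 5, 6)


def parse_version(version: str) -> tuple:
    """Extract the leading major.minor.patch numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_supported_spark_version(version: str) -> bool:
    """Return True if the version lies within the supported range (inclusive)."""
    return MIN_SUPPORTED <= parse_version(version) <= MAX_SUPPORTED


def spark_connect_url(host: str, port: int = 15002) -> str:
    """Build a Spark Connect connection string (15002 is the typical port)."""
    return f"sc://{host}:{port}"


def check_spark_connect(host: str, port: int = 15002) -> str:
    """Open a Spark Connect session and return the server's Spark version.

    Assumption: pyspark with Spark Connect support is installed. It is
    imported lazily so the helpers above work without it.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote(spark_connect_url(host, port)).getOrCreate()
    try:
        version = spark.version
        if not is_supported_spark_version(version):
            raise RuntimeError(f"Spark {version} is outside the supported range")
        return version
    finally:
        spark.stop()
```

For example, `check_spark_connect("spark.example.internal")` (a placeholder hostname) would return the server version if the endpoint is reachable and within the supported range, and raise otherwise.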

Depending on your source database, ensure the appropriate driver is installed in your Spark cluster:

  • PostgreSQL: org.postgresql.Driver
  • MySQL: com.mysql.cj.jdbc.Driver

Choose a driver version that is compatible with both your Spark version and your database version.
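To illustrate where these driver class names are used, here is a rough sketch that builds the options for a generic Spark JDBC read. It is only an illustration of standard Spark JDBC usage, not necessarily how the Spark Engine configures its reads; the `jdbc_options` helper and the host, database, and table values are hypothetical.

```python
# Driver classes as listed above, keyed by the JDBC sub-protocol name.
JDBC_DRIVERS = {
    "postgresql": "org.postgresql.Driver",
    "mysql": "com.mysql.cj.jdbc.Driver",
}


def jdbc_options(db_type: str, host: str, port: int, database: str, table: str) -> dict:
    """Build the options for a Spark JDBC read against the given database."""
    try:
        driver = JDBC_DRIVERS[db_type]
    except KeyError:
        raise ValueError(f"No driver configured for database type {db_type!r}") from None
    return {
        "url": f"jdbc:{db_type}://{host}:{port}/{database}",
        "dbtable": table,
        "driver": driver,
    }
```

With an active Spark session, the result could be passed to a read such as `spark.read.format("jdbc").options(**opts).option("user", ...).option("password", ...).load()`. If the driver JAR is missing from the cluster, a read like this would be expected to fail with a class-not-found error naming the driver class.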

The following network paths must be open:

  • Outbound from the pipeline execution environment to your Spark Connect endpoint (typically port 15002)
  • Outbound from the pipeline execution environment to your OpenMetadata server (typically port 8585)
  • From the Spark workers to your source database (the database must accept inbound connections from them)

Before running a pipeline, verify the setup:

  1. Test Spark Connect: Verify connectivity from your pipeline environment to Spark Connect
  2. Test OpenMetadata: Ensure your pipeline environment can reach the OpenMetadata API
  3. Test Database: Confirm Spark workers can connect to your source database
  4. Verify Drivers: Check that the appropriate database driver is available in your Spark cluster
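
The connectivity checks above can be approximated with a plain TCP probe. This is a minimal sketch using only the Python standard library; the function names and the hostnames in the usage note are placeholders. It confirms only that a TCP connection can be opened, not that the service behind the port is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_endpoints(endpoints: dict, timeout: float = 3.0) -> dict:
    """Probe each named (host, port) pair and report whether it is reachable."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

From the pipeline environment, one might call `check_endpoints({"spark-connect": ("spark.example.internal", 15002), "openmetadata": ("openmetadata.example.internal", 8585)})`. The database check has to be run from a Spark worker host instead, since it is the workers that connect to the source database.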