
ADLS
PROD
How to Run the Connector Externally
To run the Ingestion via the UI, you’ll need to use the OpenMetadata Ingestion Container, which ships with custom Airflow plugins to handle the workflow deployment. If you would rather manage your workflows externally on your preferred orchestrator, check the following docs to run the Ingestion Framework anywhere.
Requirements
OpenMetadata 1.0 or later
To deploy OpenMetadata, check the Deployment guides.
ADLS Permissions
To extract metadata from Azure ADLS (Storage Account - StorageV2), you will need an App Registration with the following permissions on the Storage Account:
- Storage Blob Data Contributor
- Storage Queue Data Contributor
OpenMetadata Manifest
In other connectors, extracting metadata happens automatically. In this case, we can extract high-level metadata from buckets, but to understand their internal structure we need users to provide an openmetadata.json file at the bucket root.
Supported File Formats: [ "csv", "tsv", "avro", "parquet", "json", "json.gz", "json.zip" ]
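As a rough illustration of the format list above, the following sketch checks whether a file name ends in one of the supported formats before you reference it in a manifest. The helper name `detect_supported_format` and the sample paths are hypothetical, and this is only a local pre-check: it makes no claim about how the connector itself identifies formats.

```python
from typing import Optional

# Formats listed in the documentation above.
SUPPORTED_FORMATS = ["csv", "tsv", "avro", "parquet", "json", "json.gz", "json.zip"]


def detect_supported_format(path: str) -> Optional[str]:
    """Return the supported format matching the file name's suffix, or None.

    Hypothetical helper for a local sanity check; the connector may
    determine formats differently.
    """
    # Only look at the last path segment, case-insensitively.
    file_name = path.rsplit("/", 1)[-1].lower()
    for fmt in SUPPORTED_FORMATS:
        if file_name.endswith("." + fmt):
            return fmt
    return None


if __name__ == "__main__":
    # Sample paths are made up for illustration.
    for sample in ["transactions/2023/part-0.csv", "events/log.json.gz", "notes.txt"]:
        print(sample, "->", detect_supported_format(sample))
```

A check like this can catch unsupported files early, before an ingestion run is scheduled.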
You can learn more about this here. Keep reading for an example of the shape of the manifest file.
OpenMetadata Manifest
Our manifest file is defined as a JSON Schema, and can look like this:
Global Manifest
You can also manage a single manifest file, named openmetadata_storage_manifest.json, to centralize the ingestion process for any container.
You can also keep a local openmetadata.json manifest in each container but, if possible, we will always try to pick up the global manifest during ingestion.
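To make the two manifest shapes more concrete, here is a minimal sketch that builds a global manifest as a Python dictionary and derives the per-container local manifest from it. The field names (`entries`, `containerName`, `dataPath`, `structureFormat`, `separator`, `isPartitioned`) are assumptions about the manifest schema and should be checked against the published JSON Schema. The container names, data paths, and the `to_local_manifest` helper are hypothetical.

```python
import json

# ASSUMPTION: field names below reflect the manifest schema as commonly
# documented; verify them against the official JSON Schema before use.
# Container names and data paths are invented for illustration.
global_manifest = {
    "entries": [
        {
            "containerName": "sales-data",
            "dataPath": "transactions",
            "structureFormat": "csv",
            "separator": ",",
            "isPartitioned": False,
        },
        {
            "containerName": "sales-data",
            "dataPath": "cities",
            "structureFormat": "parquet",
            "isPartitioned": True,
        },
        {
            "containerName": "logs",
            "dataPath": "events",
            "structureFormat": "json",
            "isPartitioned": False,
        },
    ]
}


def to_local_manifest(manifest: dict, container_name: str) -> dict:
    """Hypothetical helper: keep only one container's entries and drop
    the container name, which a per-container manifest would not need."""
    entries = [
        {key: value for key, value in entry.items() if key != "containerName"}
        for entry in manifest["entries"]
        if entry.get("containerName") == container_name
    ]
    return {"entries": entries}


if __name__ == "__main__":
    # Content for openmetadata_storage_manifest.json (global manifest).
    print(json.dumps(global_manifest, indent=2))
    # Content for openmetadata.json inside the "sales-data" container.
    print(json.dumps(to_local_manifest(global_manifest, "sales-data"), indent=2))
```

Keeping the global manifest as the single source and generating local manifests from it, if you need them at all, avoids the two files drifting apart.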