Snowflake Cloud Data Warehouse Connector
The Snowflake connector provides the most comprehensive integration with Snowflake’s cloud data warehouse, offering complete metadata management, advanced lineage tracking, and enterprise-grade data governance capabilities.
Supported Features
- ✅ Metadata Discovery: Complete database, schema, table, and column discovery
- ✅ Query Usage: Track data access patterns and query performance via account usage
- ✅ Data Profiler: Statistical analysis and data profiling across all table types
- ✅ Data Quality: Custom quality tests and continuous monitoring
- ✅ Data Lineage: Complete table and view relationship tracking
- ✅ Column-level Lineage: Detailed column dependency mapping through query logs
- ✅ dbt Integration: Native dbt model and lineage support
- ✅ Stored Procedures: Function and procedure discovery with metadata
- ✅ Tags: Snowflake native tag discovery and management
- ✅ Sample Data: Data sampling across all table types
- ✅ Auto-Classification: Automated data classification and PII detection
- ✅ Incremental Extraction: Efficient incremental metadata updates
- ✅ Streams: Snowflake streams discovery and metadata
- ✅ Dynamic Tables: Support for Snowflake’s dynamic tables
Enterprise Features
- No Limitations: All Snowflake features are fully supported
- Account Usage Integration: Deep integration with Snowflake’s account usage views
- Role-based Access: Comprehensive RBAC support
- Reverse Metadata: Push metadata changes back to Snowflake (Collate Only)
Requirements
To ingest basic metadata, the Snowflake user must have the following privileges:
- USAGE privilege on the Warehouse
- USAGE privilege on the Database
- USAGE privilege on the Schema
- SELECT privilege on Tables
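A minimal sketch of these grants, assuming an illustrative role COLLATE_ROLE assigned to the ingestion user and placeholder warehouse, database, and user names:

```sql
-- Minimal grants for basic metadata ingestion.
-- COLLATE_ROLE, COMPUTE_WH, MY_DB, and COLLATE_USER are illustrative names; substitute your own.
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE COLLATE_ROLE;
GRANT USAGE ON DATABASE MY_DB TO ROLE COLLATE_ROLE;
GRANT USAGE ON ALL SCHEMAS IN DATABASE MY_DB TO ROLE COLLATE_ROLE;
GRANT SELECT ON ALL TABLES IN DATABASE MY_DB TO ROLE COLLATE_ROLE;
GRANT ROLE COLLATE_ROLE TO USER COLLATE_USER;
```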
Additional Grant Requirements
If you are running any of the following workflows, the user must additionally be able to read the corresponding account usage views (see the grant sketch after this list):
- Incremental Extraction: Collate fetches the information by querying snowflake.account_usage.tables.
- Ingesting Tags: Collate fetches the information by querying snowflake.account_usage.tag_references.
- Lineage & Usage Workflow: Collate fetches the query logs by querying the snowflake.account_usage.query_history table. For this, the Snowflake user should be granted the ACCOUNTADMIN role or a role granted IMPORTED PRIVILEGES on the SNOWFLAKE database.
- Ingesting Stored Procedures: Collate fetches the information by querying snowflake.account_usage.procedures and snowflake.account_usage.functions.
You can find more information about the account_usage schema in Snowflake’s documentation.
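A minimal sketch of the narrower option, granting IMPORTED PRIVILEGES rather than assigning ACCOUNTADMIN, again using the illustrative COLLATE_ROLE:

```sql
-- Grant read access to the SNOWFLAKE.ACCOUNT_USAGE views.
-- COLLATE_ROLE is an illustrative name; substitute the role used for ingestion.
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE COLLATE_ROLE;
```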
Metadata Ingestion
All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Snowflake.
1. Visit the Services Page
The first step is to ingest the metadata from your sources. Under Settings, you will find a Services link through which you can connect an external source system to Collate. Once a service is created, it can be used to configure metadata, usage, and profiler workflows. To visit the Services page, select Services from the Settings menu.
2. Create a New Service
Click on the Add New Service button to start the Service creation.
3. Select the Service Type
Select Snowflake as the service type and click Next.
4. Name and Describe your Service
Provide a name and description for your service.
Connection Details
- Username: Specify the User to connect to Snowflake. It should have enough privileges to read all the metadata.
- Password: Password to connect to Snowflake.
- Account: The Snowflake account identifier uniquely identifies a Snowflake account within your organization, as well as throughout the global network of Snowflake-supported cloud platforms and cloud regions. If the Snowflake URL is https://xyz1234.us-east-1.gcp.snowflakecomputing.com, then the account is xyz1234.us-east-1.gcp.
- Role (Optional): You can specify the role you would like to ingest with. If no role is specified, the default roles assigned to the user will be used.
- Warehouse: A Snowflake warehouse is required for executing the queries that fetch the metadata. Enter the name of the warehouse against which you would like to execute these queries.
- Database (Optional): The database of the data source is an optional parameter; set it if you would like to restrict metadata reading to a single database. If left blank, Collate will attempt to scan all databases.
- Private Key (Optional): If you have configured key-pair authentication for the given user, you will have to pass the private key associated with the user in this field (see the key-pair sketch after this list). You can check out this doc for more details about key-pair authentication.
  - The multi-line key needs to be converted to a single line with \n for line endings, i.e. -----BEGIN ENCRYPTED PRIVATE KEY-----\nMII...\n...\n-----END ENCRYPTED PRIVATE KEY-----
- Snowflake Passphrase Key (Optional): If you have configured encrypted key-pair authentication for the given user, you will have to pass the passphrase associated with the private key in this field. You can check out this doc for more details about key-pair authentication.
- Include Temporary and Transient Tables: Optional configuration for the ingestion of TRANSIENT and TEMPORARY tables. By default, TRANSIENT and TEMPORARY tables are skipped.
- Include Streams: Optional configuration for the ingestion of streams. By default, streams are skipped.
- Client Session Keep Alive: Optional configuration to keep the session active when the ingestion job runs for a long duration.
- Account Usage Schema Name: Full name of the account usage schema, used in case your user does not have direct access to the SNOWFLAKE.ACCOUNT_USAGE schema. In such cases you can replicate the tables QUERY_HISTORY, TAG_REFERENCES, PROCEDURES, and FUNCTIONS to a custom schema, say CUSTOM_DB.CUSTOM_SCHEMA, and provide that name in this field (a replication sketch follows this list).
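If you are using key-pair authentication, the matching public key must be registered on the Snowflake user. A minimal sketch, assuming the illustrative user COLLATE_USER and a placeholder key value:

```sql
-- Register the RSA public key on the ingestion user (key-pair authentication).
-- COLLATE_USER is illustrative and 'MIIBIjANBgkqh...' stands in for the real key body.
ALTER USER COLLATE_USER SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';
```

For the Account Usage Schema Name option, replicating the required account usage tables could look like the following sketch, reusing the illustrative CUSTOM_DB.CUSTOM_SCHEMA names from the field description above:

```sql
-- Copy the account usage tables the connector reads into a custom schema.
-- CUSTOM_DB.CUSTOM_SCHEMA is illustrative; keep these copies refreshed on a schedule.
CREATE TABLE CUSTOM_DB.CUSTOM_SCHEMA.QUERY_HISTORY  AS SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY;
CREATE TABLE CUSTOM_DB.CUSTOM_SCHEMA.TAG_REFERENCES AS SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.TAG_REFERENCES;
CREATE TABLE CUSTOM_DB.CUSTOM_SCHEMA.PROCEDURES     AS SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.PROCEDURES;
CREATE TABLE CUSTOM_DB.CUSTOM_SCHEMA.FUNCTIONS      AS SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.FUNCTIONS;
```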