Amazon Redshift Connector
The Amazon Redshift connector provides comprehensive integration with AWS’s cloud data warehouse service, offering deep metadata discovery, query analysis, and seamless authentication with AWS services.Supported Features
- ✅ Metadata Discovery: Complete schema, table, and column discovery
- ✅ Query Usage: Query logs analysis and usage patterns
- ✅ Data Profiler: Statistical analysis and data profiling
- ✅ Data Quality: Custom quality tests and continuous monitoring
- ✅ dbt Integration: Native dbt model and lineage support
- ✅ Data Lineage: Table and view relationship tracking
- ✅ Column-level Lineage: Detailed column dependency mapping
- ✅ Stored Procedures: Procedure discovery and metadata
- ✅ Sample Data: Data sampling for analysis and preview
- ✅ Auto-Classification: Automated data classification and PII detection
- ✅ Incremental Extraction: Efficient incremental metadata updates
- ✅ SSL Support: Secure connections with SSL/TLS
Limitations
- ❌ Owners: Owner information extraction not supported
- ❌ Tags: Tag metadata extraction not supported
- Requirements
- Metadata Ingestion
- Query Usage
- Data Profiler
- Data Quality
- Lineage
- dbt Integration
- Enable Security
- Troubleshooting
Requirements
Metadata
Redshift user must grantSELECT privilege on table SVV_TABLE_INFO to fetch the metadata of tables and views. For more information visit here.
Schema Usage Privileges
Schema Usage Privileges
Ensure that the ingestion user has USAGE privileges on the schema containing the views. If additional access is needed, run the following command:
Profiler & Data Quality
Executing the profiler workflow or data quality tests, will require the user to haveSELECT permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found here and data quality tests here.
Usage & Lineage
For the usage and lineage workflow, the user will needSELECT privilege on STL_QUERY table. You can find more information on the usage workflow here and the lineage workflow here.
Metadata Ingestion
All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Redshift.1. Visit the Services Page
The first step is to ingest the metadata from your sources. Under Settings, you will find a Services link an external source system to Collate. Once a service is created, it can be used to configure metadata, usage, and profiler workflows. To visit the Services page, select Services from the Settings menu.2. Create a New Service
Click on the Add New Service button to start the Service creation.3. Select the Service Type
Select Redshift as the service type and click Next.4. Name and Describe your Service
Provide a name and description for your service as illustrated below. It is recommended to exclude the schema “information_schema” from the metadata ingestion as it contains system tables and views.Connection Details
- Username: Specify the User to connect to Redshift. It should have enough privileges to read all the metadata.
- Password: Password to connect to Redshift.
- Database: The database of the data source is an optional parameter, if you would like to restrict the metadata reading to a single database. If left blank, Collate ingestion attempts to scan all the databases.
Distribution Style Note
Distribution Style Note
During the metadata ingestion for redshift, the tables in which the distribution style i.e,
DISTSTYLE is not AUTO will be marked as partitioned tables- disable: SSL is disabled and the connection is not encrypted.
- allow: SSL is used if the server requires it.
- prefer: SSL is used if the server supports it. Amazon Redshift supports SSL, so SSL is used when you set sslmode to prefer.
- require: SSL is required.
- verify-ca: SSL must be used and the server certificate must be verified.
- verify-full: SSL must be used. The server certificate must be verified and the server hostname must match the hostname attribute on the certificate.
Securing Redshift Connection with SSL in Collate
To establish secure connections between Collate and a Redshift database, you can configure SSL using different SSL modes provided by Redshift, each offering varying levels of security. UnderAdvanced Config, specify the SSL mode appropriate for your connection, such as prefer, verify-ca, allow, and others. After selecting the SSL mode, provide the CA certificate used for SSL validation (caCertificate). Note that Redshift requires only the CA certificate for SSL validation.