Skip to main content

Amazon Redshift Connector

The Amazon Redshift connector provides comprehensive integration with AWS’s cloud data warehouse service, offering deep metadata discovery, query analysis, and seamless authentication with AWS services.

Supported Features

  • Metadata Discovery: Complete schema, table, and column discovery
  • Query Usage: Query logs analysis and usage patterns
  • Data Profiler: Statistical analysis and data profiling
  • Data Quality: Custom quality tests and continuous monitoring
  • dbt Integration: Native dbt model and lineage support
  • Data Lineage: Table and view relationship tracking
  • Column-level Lineage: Detailed column dependency mapping
  • Stored Procedures: Procedure discovery and metadata
  • Sample Data: Data sampling for analysis and preview
  • Auto-Classification: Automated data classification and PII detection
  • Incremental Extraction: Efficient incremental metadata updates
  • SSL Support: Secure connections with SSL/TLS

Limitations

  • Owners: Owner information extraction not supported
  • Tags: Tag metadata extraction not supported
In this section, we provide guides and references to use the Redshift connector. Configure and schedule Redshift metadata and profiler workflows from the Collate UI: Collate Only Features:

Requirements

Metadata

Redshift user must grant SELECT privilege on table SVV_TABLE_INFO to fetch the metadata of tables and views. For more information visit here.
-- Create a new user
-- More details https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_USER.html
CREATE USER test_user with PASSWORD 'password';

-- Grant SELECT on table
GRANT SELECT ON TABLE svv_table_info to test_user;
Ensure that the ingestion user has USAGE privileges on the schema containing the views. If additional access is needed, run the following command:
GRANT USAGE ON SCHEMA "<schema_name>" TO <ingestion_user>;

Profiler & Data Quality

Executing the profiler workflow or data quality tests, will require the user to have SELECT permission on the tables/schemas where the profiler/tests will be executed. More information on the profiler workflow setup can be found here and data quality tests here.

Usage & Lineage

For the usage and lineage workflow, the user will need SELECT privilege on STL_QUERY table. You can find more information on the usage workflow here and the lineage workflow here.

Metadata Ingestion

All connectors are defined as JSON Schemas. Here you can find the structure to create a connection to Redshift.

1. Visit the Services Page

The first step is to ingest the metadata from your sources. Under Settings, you will find a Services link an external source system to Collate. Once a service is created, it can be used to configure metadata, usage, and profiler workflows. To visit the Services page, select Services from the Settings menu.

2. Create a New Service

Click on the Add New Service button to start the Service creation.

3. Select the Service Type

Select Redshift as the service type and click Next.

4. Name and Describe your Service

Provide a name and description for your service as illustrated below. It is recommended to exclude the schema “information_schema” from the metadata ingestion as it contains system tables and views.

Connection Details

  • Username: Specify the User to connect to Redshift. It should have enough privileges to read all the metadata.
  • Password: Password to connect to Redshift.
  • Database: The database of the data source is an optional parameter, if you would like to restrict the metadata reading to a single database. If left blank, Collate ingestion attempts to scan all the databases.
During the metadata ingestion for redshift, the tables in which the distribution style i.e, DISTSTYLE is not AUTO will be marked as partitioned tables
SSL Configuration In order to integrate SSL in the Metadata Ingestion Config, the user will have to add the SSL config under connectionArguments which is placed in the source. SSL Modes There are a couple of types of SSL modes that Redshift supports which can be added to ConnectionArguments, they are as follows:
  • disable: SSL is disabled and the connection is not encrypted.
  • allow: SSL is used if the server requires it.
  • prefer: SSL is used if the server supports it. Amazon Redshift supports SSL, so SSL is used when you set sslmode to prefer.
  • require: SSL is required.
  • verify-ca: SSL must be used and the server certificate must be verified.
  • verify-full: SSL must be used. The server certificate must be verified and the server hostname must match the hostname attribute on the certificate.
For more information, you can visit Redshift SSL documentation

Securing Redshift Connection with SSL in Collate

To establish secure connections between Collate and a Redshift database, you can configure SSL using different SSL modes provided by Redshift, each offering varying levels of security. Under Advanced Config, specify the SSL mode appropriate for your connection, such as prefer, verify-ca, allow, and others. After selecting the SSL mode, provide the CA certificate used for SSL validation (caCertificate). Note that Redshift requires only the CA certificate for SSL validation.

Reverse Metadata

This feature is available in Collate only. The Reverse Metadata feature allows you to push metadata changes from Collate back to Redshift, ensuring your source system stays synchronized with any updates made in the data catalog.