Apache Iceberg support in Collate
Apache Iceberg is an open table format that makes it easier to store and query large amounts of data in a data lake. If you’re running analytics at any meaningful scale, Iceberg is likely somewhere in your stack, whether you set it up intentionally or inherited it through a platform like Snowflake or Databricks. Collate’s approach to Iceberg is worth explaining because the design decision isn’t intuitive at first glance: Collate doesn’t require a separate Iceberg ingestion workflow for the support described here. Instead, Iceberg tables are surfaced through the supported database connectors you’re already using.
Background: early Iceberg catalog support
In the early days of Iceberg adoption, teams needed special tools just to read Iceberg tables. Here’s how the story evolved:
- Early access via query engines: Tools like Trino were among the first to let teams query Iceberg data without working directly with the raw files.
- Cloud catalogs joined in: AWS Glue became a popular way to manage Iceberg tables for teams on AWS.
- A standard emerged: The Iceberg REST Catalog API became the common way for engines to connect to Iceberg. Platforms like Athena, Snowflake, and BigQuery all added their own support.
The design shift: use your connector, not the format
Most teams access Iceberg through a query engine or catalog service, such as Snowflake, Trino, Databricks, Athena, ClickHouse, Doris, StarRocks, or Glue. Those systems handle the complexity of reading Iceberg tables for you. They already know how to:
- Connect to your Iceberg catalog
- Read table and column details
- Filter and scan data efficiently
Support matrix
The connectors below can surface Iceberg-backed tables through their normal metadata workflows. Connectors marked ✅ under Assigned Iceberg table type automatically assign the Iceberg table type in the catalog. For connectors marked Not yet supported, Collate can ingest the table through the connector but doesn’t automatically label it as Iceberg.
What you see after ingestion
Once your connector is set up and ingestion has run, Iceberg-backed tables appear in the Collate Explore page alongside your other data assets. For connectors that support automatic table-type assignment, Collate labels those tables with the Iceberg table type. For connectors where table-type assignment is not yet supported, validate the table type in your source system.
From there, each connector’s supported workflows work the same way they do for any other table. Depending on the connector, those workflows can include:
- Data observability: Monitor freshness and detect anomalies
- Data profiling: Understand the shape and distribution of your data
- Data quality tests: Run checks directly against live Iceberg data
- Lineage tracking: See how data flows across schemas and downstream assets
- Usage statistics: Understand how tables are being queried
What this means in practice
If you already have Iceberg tables in your environment and you’re using one of the connectors in the support matrix, start with that existing connector:
- Connect your query engine or catalog service using the standard connector setup.
- Run metadata ingestion.
- Review the ingested tables in Explore.
- Check whether your connector supports automatic Iceberg table-type assignment.
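The last step can be checked with a small script once you have a list of the ingested tables. The following is a minimal sketch, not Collate's own tooling. It assumes each table record is a dictionary with `fullyQualifiedName` and `tableType` keys and that labeled tables carry the value `"Iceberg"`. Those field names and the label value are assumptions for illustration, and the helper name `split_by_iceberg_label` is hypothetical. How you export the records (API, UI export, or otherwise) is left out on purpose.

```python
from __future__ import annotations

from typing import Iterable

# Assumed label value for tables that were tagged automatically.
ICEBERG_LABEL = "Iceberg"


def split_by_iceberg_label(tables: Iterable[dict]) -> tuple[list[str], list[str]]:
    """Split table records into (labeled, unlabeled) lists of table names.

    Each record is assumed to look like:
        {"fullyQualifiedName": "service.db.schema.table", "tableType": "Iceberg"}
    These keys are illustrative assumptions, not a confirmed schema.
    """
    labeled: list[str] = []
    unlabeled: list[str] = []
    for table in tables:
        # Fall back to a placeholder if the record has no name field.
        name = table.get("fullyQualifiedName", "<unknown>")
        if table.get("tableType") == ICEBERG_LABEL:
            labeled.append(name)
        else:
            unlabeled.append(name)
    return labeled, unlabeled


if __name__ == "__main__":
    # Hypothetical sample records; replace with your own exported metadata.
    sample = [
        {"fullyQualifiedName": "svc.db.sales.orders", "tableType": "Iceberg"},
        {"fullyQualifiedName": "svc.db.sales.customers", "tableType": "Regular"},
    ]
    labeled, unlabeled = split_by_iceberg_label(sample)
    print("Labeled as Iceberg:", labeled)
    print("Verify in the source system:", unlabeled)
```

Tables in the second list are not necessarily non-Iceberg. For connectors where automatic assignment is not yet supported, they are the ones to validate in the source system.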
Next steps
- Choose a supported connector to start ingesting Iceberg tables
- Set up data quality tests on your Iceberg-backed tables
- Configure lineage workflows across your data lake