# Auto-Classification Workflow

> Learn how Collate automatically detects and tags sensitive data like PII using column name scanning and NLP-based entity recognition.

## Overview

Auto-Classification is a Collate workflow that automatically detects and tags sensitive data, such as PII, across your database columns. It reduces manual tagging by scanning both column names and sample data during ingestion, then applying or suggesting tags such as `PII.Sensitive` and `PII.NonSensitive`.

## How It Works

Auto-Classification uses two complementary detection approaches:

* **Column Name Scanner**: Matches column names against a set of regex rules that identify common sensitive patterns, such as email addresses, names, SSNs, bank account numbers, and similar fields.

  For example, columns `email` and `full_name` are auto-tagged as `PII.Sensitive` based on their column names.

  <img src="https://mintcdn.com/collatedocs/jJd-gnrs9XdFvoiw/public/images/how-to-guides/governance/auto-pii1.png?fit=max&auto=format&n=jJd-gnrs9XdFvoiw&q=85&s=b36fd7e2099657f99b7a8bacacf6307c" alt="Columns with recognizable sensitive names auto-tagged as PII Sensitive" width="3012" height="1288" data-path="public/images/how-to-guides/governance/auto-pii1.png" />

* **Entity Recognition**: If sample data ingestion is enabled, scans the actual row values using an NLP-based entity recognition engine. This catches sensitive data even when the column name is generic or ambiguous. The `confidence` parameter (0–100, default `80`) controls the minimum score required to tag a column as `PII.Sensitive`.

  If a column already has a `PII` tag, it is skipped during execution.

  For example, the column `I_FORMULATION` is also tagged as `PII.Sensitive`, even though its name gives no indication of sensitive content.

  <img src="https://mintcdn.com/collatedocs/jJd-gnrs9XdFvoiw/public/images/how-to-guides/governance/auto-pii2.png?fit=max&auto=format&n=jJd-gnrs9XdFvoiw&q=85&s=c5085dceac17954d2808453250032c4d" alt="Column with an ambiguous name tagged as PII Sensitive" width="2848" height="1102" data-path="public/images/how-to-guides/governance/auto-pii2.png" />

  Inspecting the **Sample Data** tab reveals that the actual row values contain sensitive information, which the entity recognition engine detected. This shows that auto-classification looks beyond column names and inspects the data itself when sample data ingestion is enabled.

  <img src="https://mintcdn.com/collatedocs/jJd-gnrs9XdFvoiw/public/images/how-to-guides/governance/auto-pii3.png?fit=max&auto=format&n=jJd-gnrs9XdFvoiw&q=85&s=8fca486f0d9904257167f2e2bc7d4d99" alt="Sample data showing sensitive values that triggered auto-classification" width="2822" height="1446" data-path="public/images/how-to-guides/governance/auto-pii3.png" />
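
The two detection stages can be sketched in Python as follows. This is a minimal, hypothetical illustration, not Collate's implementation: the regex patterns, the `Column` type, and the `classify` function are assumptions made for this example, and the NLP entity recognizer is stubbed out as a precomputed per-column score. The sketch also assumes that the name scanner runs first, that the "already has a `PII` tag" skip rule covers both stages, and that a score equal to `confidence` passes the threshold.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical name rules for illustration only; Collate's actual regex set differs.
SENSITIVE_NAME_PATTERNS = [
    re.compile(r"e[-_ ]?mail", re.IGNORECASE),
    re.compile(r"(first|last|full)[-_ ]?name", re.IGNORECASE),
    re.compile(r"(^|_)ssn(_|$)|social[-_ ]?security", re.IGNORECASE),
    re.compile(r"account[-_ ]?(number|no)", re.IGNORECASE),
]


@dataclass
class Column:
    name: str
    tags: Set[str] = field(default_factory=set)
    # Stand-in for the NLP entity recognizer: a 0-100 score computed from sample
    # rows, or None when sample data ingestion is disabled (hypothetical field).
    entity_score: Optional[float] = None


def classify(column: Column, confidence: float = 80) -> Optional[str]:
    """Return the tag this sketch would apply, or None."""
    # Skip columns that already carry a PII tag (assumed here to cover both stages).
    if any(tag.startswith("PII.") for tag in column.tags):
        return None
    # Stage 1: column name scanner.
    if any(pattern.search(column.name) for pattern in SENSITIVE_NAME_PATTERNS):
        return "PII.Sensitive"
    # Stage 2: entity recognition on sample data, gated by the confidence threshold.
    if column.entity_score is not None and column.entity_score >= confidence:
        return "PII.Sensitive"
    return None


for col in [Column("email"), Column("full_name"), Column("I_FORMULATION", entity_score=92)]:
    print(col.name, classify(col))
```

In this sketch, `email` and `full_name` are tagged from their names, while `I_FORMULATION` is tagged only because its hypothetical entity score clears the threshold; with no sample data (`entity_score=None`) it would be left untagged. `PII.NonSensitive` is not modeled.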

## Glossary Term Associated Tags

Separate from the auto-classification workflow, Collate can derive classification tags from glossary terms. If a glossary term has associated classification tags, applying that glossary term to an asset also applies the associated tags as derived tags.

For example, if the glossary term `Account` has `PII.Sensitive` associated with it, adding the `Account` glossary term to a table or column also adds `PII.Sensitive`. This behavior is configured on the glossary term itself; Collate does not map one classification tag directly to another.
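
A minimal, hypothetical sketch of this propagation in Python follows. The `Asset` type, the `GLOSSARY` mapping (including the `Revenue` term), and `apply_glossary_term` are illustrative assumptions, not Collate's data model or API. The point it shows is that derived tags come only from the glossary term's associations.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical glossary: each term maps to its associated classification tags.
GLOSSARY: Dict[str, Set[str]] = {
    "Account": {"PII.Sensitive"},
    "Revenue": set(),  # a hypothetical term with no associated tags
}


@dataclass
class Asset:
    name: str
    glossary_terms: Set[str] = field(default_factory=set)
    derived_tags: Set[str] = field(default_factory=set)  # tags inherited from glossary terms


def apply_glossary_term(asset: Asset, term: str) -> None:
    """Attach a glossary term and derive its associated classification tags."""
    asset.glossary_terms.add(term)
    # Only the glossary term's associations are propagated; there is no tag-to-tag mapping.
    asset.derived_tags |= GLOSSARY.get(term, set())


column = Asset("account_id")
apply_glossary_term(column, "Account")
print(column.derived_tags)
```

Applying a term with no associated tags, such as the hypothetical `Revenue`, attaches the term but derives nothing.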

## Set Up Auto-Classification

<CardGroup cols={2}>
  <Card title="Workflow" href="/how-to-guides/data-governance/classification/auto-classification/workflow">
    Add an Auto Classification Agent to a database service directly from the Collate UI.
  </Card>

  <Card title="External Workflow" href="/how-to-guides/data-governance/classification/auto-classification/external-workflow">
    Run the Auto Classification Workflow externally using a YAML pipeline configuration.
  </Card>

  <Card title="Auto PII Tagging" href="/how-to-guides/data-governance/classification/auto-classification/auto-pii-tagging">
    Understand the tagging logic and troubleshoot common issues like SSL certificate errors.
  </Card>

  <Card title="Custom Recognizers" href="/how-to-guides/data-governance/classification/auto-classification/recognizers">
    Define custom rules to detect and tag sensitive data using regex patterns, exact terms, or pre-built detectors.
  </Card>

  <Card title="Tag Feedback and Approvals" href="/how-to-guides/data-governance/classification/auto-classification/feedback">
    Report false positives on auto-applied tags and manage approval workflows to continuously improve classification accuracy.
  </Card>

  <Card title="Sample Data" href="/how-to-guides/data-governance/classification/auto-classification/external-sample-data">
    Store sample data collected during auto-classification to an S3 bucket in Parquet format.
  </Card>
</CardGroup>
