Skip to main content
POST https://sandbox.getcollate.io/api/mcp
POST /mcp
from ai_sdk import AISdk, AISdkConfig

# Build the SDK client from environment-variable configuration.
sdk_config = AISdkConfig.from_env()
sdk = AISdk.from_config(sdk_config)

# Enumerate every tool the MCP server exposes.
for mcp_tool in sdk.mcp.list_tools():
    print(f"{mcp_tool.name}: {mcp_tool.description}")

# Keyword search for customer tables, capped at five hits.
search_args = {"query": "customers", "entity_type": "table", "limit": 5}
print(sdk.mcp.call_tool("search_metadata", search_args))

# Fetch full details for one table by its fully qualified name.
details_args = {
    "fqn": "warehouse.production.public.customers",
    "entity_type": "table",
}
print(sdk.mcp.call_tool("get_entity_details", details_args))

# Walk lineage three hops upstream and two hops downstream.
lineage_args = {
    "entity_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "upstream_depth": 3,
    "downstream_depth": 2,
}
print(sdk.mcp.call_tool("get_entity_lineage", lineage_args))
{
  "jsonrpc": "2.0",
  "result": {
    "content": [
      {
        "type": "text",
        "text": "[{\"name\": \"customers\", \"fullyQualifiedName\": \"warehouse.production.public.customers\", \"entityType\": \"table\", \"description\": \"Core customer records\", \"owners\": [\"data-engineering\"], \"tags\": [\"PII\", \"Tier.Tier1\"]}]"
      }
    ]
  },
  "id": 1
}

MCP Tools

Collate exposes an MCP server at /mcp that turns your metadata into a set of tools any LLM can use. Unlike generic MCP connectors that only read raw database schemas, Collate’s MCP tools give your AI access to the full context of your data platform — descriptions, owners, lineage, glossary terms, tags, and data quality results.

Available Tools

| Tool | Description |
| --- | --- |
| `search_metadata` | Search across all metadata in Collate (tables, dashboards, pipelines, topics, etc.) |
| `semantic_search` | AI-powered semantic search that understands meaning and context beyond keyword matching |
| `get_entity_details` | Get detailed information about a specific entity by ID or fully qualified name |
| `get_entity_lineage` | Get upstream and downstream lineage for an entity |
| `create_glossary` | Create a new glossary in Collate |
| `create_glossary_term` | Create a new term within an existing glossary |
| `create_lineage` | Create a lineage edge between two entities |
| `patch_entity` | Update an entity's metadata (description, tags, owners, etc.) |
| `get_test_definitions` | List available data quality test definitions |
| `create_test_case` | Create a data quality test case for an entity |
| `root_cause_analysis` | Analyze root causes of data quality failures |

Tool Call Parameters

tool_name
string
required
Name of the MCP tool to invoke. Must be one of the available tools listed above.
arguments
object
required
Tool-specific arguments. Each tool accepts different parameters.
POST /mcp
from ai_sdk import AISdk, AISdkConfig

# Configure the client from the environment.
config = AISdkConfig.from_env()
client = AISdk.from_config(config)

# Show every tool the server advertises.
available = client.mcp.list_tools()
for t in available:
    print(f"{t.name}: {t.description}")

# Keyword search scoped to tables, limited to five results.
print(client.mcp.call_tool(
    "search_metadata",
    {"query": "customers", "entity_type": "table", "limit": 5},
))

# Look up a single entity by fully qualified name.
print(client.mcp.call_tool(
    "get_entity_details",
    {"fqn": "warehouse.production.public.customers", "entity_type": "table"},
))

# Traverse lineage in both directions from one entity.
print(client.mcp.call_tool(
    "get_entity_lineage",
    {
        "entity_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
        "upstream_depth": 3,
        "downstream_depth": 2,
    },
))
{
  "jsonrpc": "2.0",
  "result": {
    "content": [
      {
        "type": "text",
        "text": "[{\"name\": \"customers\", \"fullyQualifiedName\": \"warehouse.production.public.customers\", \"entityType\": \"table\", \"description\": \"Core customer records\", \"owners\": [\"data-engineering\"], \"tags\": [\"PII\", \"Tier.Tier1\"]}]"
      }
    ]
  },
  "id": 1
}

Response

jsonrpc
string
JSON-RPC version (always "2.0").
result
object
Tool execution result.
id
integer
Request ID matching the original call.

LangChain Integration

Convert Collate’s MCP tools to LangChain format with a single method call. This lets you use your metadata as tools in any LangChain agent.
pip install data-ai-sdk[langchain]
from ai_sdk import AISdk, AISdkConfig
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

config = AISdkConfig.from_env()
client = AISdk.from_config(config)

# Expose Collate's MCP tools in LangChain's tool format.
lc_tools = client.mcp.as_langchain_tools()

chat_model = ChatOpenAI(model="gpt-4")
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a metadata assistant powered by Collate."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

# Assemble a tool-calling agent over the metadata tools and run it.
metadata_agent = create_tool_calling_agent(chat_model, lc_tools, chat_prompt)
runner = AgentExecutor(agent=metadata_agent, tools=lc_tools, verbose=True)

answer = runner.invoke({
    "input": "Find tables related to customers and show their lineage"
})
print(answer["output"])

OpenAI Integration

Convert MCP tools to OpenAI function calling format for use with the OpenAI SDK:
import json

from openai import OpenAI
from ai_sdk import AISdk, AISdkConfig

config = AISdkConfig.from_env()
om_client = AISdk.from_config(config)
openai_client = OpenAI()

# OpenAI-format tool schemas plus an executor that dispatches to MCP.
tool_schemas = om_client.mcp.as_openai_tools()
run_tool = om_client.mcp.create_tool_executor()

completion = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Find customer tables"}],
    tools=tool_schemas,
)

# Execute any tool calls the model requested and show each result.
reply = completion.choices[0].message
if reply.tool_calls:
    for call in reply.tool_calls:
        outcome = run_tool(
            call.function.name,
            json.loads(call.function.arguments),
        )
        print(f"Tool: {call.function.name}")
        print(f"Result: {outcome}")

Tool Filtering

Control which tools are exposed to your LLM by including or excluding specific tools. This is useful for restricting agents to read-only operations or limiting scope.
from ai_sdk.mcp.models import MCPTool

# Allow-list: restrict the agent to read-only tools.
read_only = [
    MCPTool.SEARCH_METADATA,
    MCPTool.SEMANTIC_SEARCH,
    MCPTool.GET_ENTITY_DETAILS,
    MCPTool.GET_ENTITY_LINEAGE,
    MCPTool.GET_TEST_DEFINITIONS,
]
tools = client.mcp.as_langchain_tools(include=read_only)

# Deny-list alternative: drop the tools that mutate metadata.
mutating = [MCPTool.PATCH_ENTITY, MCPTool.CREATE_GLOSSARY, MCPTool.CREATE_GLOSSARY_TERM]
tools = client.mcp.as_langchain_tools(exclude=mutating)
The same filtering works with OpenAI tools:
# The include filter applies identically to the OpenAI tool format.
read_only_subset = [MCPTool.SEARCH_METADATA, MCPTool.GET_ENTITY_DETAILS]
tools = client.mcp.as_openai_tools(include=read_only_subset)

Using Agents as LangChain Tools

You can also wrap AI Studio Agents as LangChain tools, letting you compose them with other tools in a LangChain pipeline:
from ai_sdk.integrations.langchain import AISdkAgentTool, create_ai_sdk_tools

# Wrap one AI Studio agent as a single LangChain tool.
tool = AISdkAgentTool.from_client(client, "DataQualityPlannerAgent")

# Wrap a chosen set of agents by name.
agent_names = [
    "DataQualityPlannerAgent",
    "SqlQueryAgent",
    "LineageExplorerAgent",
]
tools = create_ai_sdk_tools(client, agent_names)

# Or wrap every agent that has API access enabled.
tools = create_ai_sdk_tools(client)

Error Handling

| Code | Error Type | Description |
| --- | --- | --- |
| 400 | `BAD_REQUEST` | Invalid tool name or missing required arguments |
| 401 | `UNAUTHORIZED` | Invalid or missing JWT token |
| 403 | `FORBIDDEN` | User lacks permission to execute the requested tool |
| 404 | `NOT_FOUND` | Referenced entity does not exist |
| 500 | `INTERNAL_SERVER_ERROR` | Internal error during tool execution |