AI SDK
The AI SDK gives you programmatic access to Collate’s AI Studio — create personas and agents,
invoke them via the API, and stream responses in real time. It is available for Python,
TypeScript, and Java, plus a standalone CLI.
You can find the source code for the AI SDK in the GitHub repository.
Contributions are always welcome!
Available SDKs
| SDK | Package | Install |
|---|---|---|
| Python | data-ai-sdk | pip install data-ai-sdk |
| TypeScript | @openmetadata/ai-sdk | npm install @openmetadata/ai-sdk |
| Java | org.open-metadata:ai-sdk | Maven / Gradle |
| CLI | ai-sdk | Install script |
Prerequisites
You need:
- A Collate instance with AI Studio Agents enabled
- A Bot JWT token for API authentication
To get a JWT token, go to Settings > Bots in your Collate instance, select your bot, and copy the token.
See How to get the JWT Token for detailed instructions.
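Authentication failures are often just an expired token. As a quick local check (a convenience sketch, not part of the SDK — it decodes the JWT payload without verifying the signature), you can read the token's exp claim:

```python
import base64
import json

def jwt_expiry(token: str):
    """Return the token's exp claim (a Unix timestamp), or None if absent.

    Decodes the payload only -- it does NOT verify the signature.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("exp")

# Demo with a locally built, unsigned token:
header = base64.urlsafe_b64encode(b'{"alg":"none"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(json.dumps({"exp": 2000000000}).encode()).rstrip(b"=").decode()
token = f"{header}.{payload}."
print(jwt_expiry(token))  # 2000000000
```

Compare the result against the current Unix time; if it is in the past, fetch a fresh token from Settings > Bots.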
Configuration
Set the following environment variables:
```shell
export AI_SDK_HOST="https://your-org.getcollate.io"
export AI_SDK_TOKEN="your-bot-jwt-token"
```
All environment variables:
| Variable | Required | Default | Description |
|---|---|---|---|
| AI_SDK_HOST | Yes | - | Your Collate server URL |
| AI_SDK_TOKEN | Yes | - | Bot JWT token |
| AI_SDK_TIMEOUT | No | 120 | Request timeout in seconds |
| AI_SDK_VERIFY_SSL | No | true | Verify SSL certificates |
| AI_SDK_MAX_RETRIES | No | 3 | Number of retry attempts |
| AI_SDK_RETRY_DELAY | No | 1.0 | Base delay between retries (seconds) |
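AI_SDK_RETRY_DELAY is described as a base delay, which suggests the wait grows between attempts. As a hypothetical illustration (the SDK's actual retry schedule is an assumption here — check the SDK source), exponential backoff from the defaults would wait:

```python
# Hypothetical illustration only: exponential backoff is assumed, not confirmed.
base_delay = 1.0   # AI_SDK_RETRY_DELAY default
max_retries = 3    # AI_SDK_MAX_RETRIES default

# Wait before retry attempt 0, 1, 2: base_delay doubled each time
delays = [base_delay * 2 ** attempt for attempt in range(max_retries)]
print(delays)  # [1.0, 2.0, 4.0]
```

Raising AI_SDK_RETRY_DELAY therefore scales every wait in the schedule, while AI_SDK_MAX_RETRIES caps how many attempts are made at all.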
Client Initialization
```python
from ai_sdk import AISdk, AISdkConfig

# From environment variables
config = AISdkConfig.from_env()
client = AISdk.from_config(config)

# Or directly
client = AISdk(
    host="https://your-org.getcollate.io",
    token="your-bot-jwt-token",
)
```
Manage Personas
A persona defines the behavioral instructions and personality of an AI Studio Agent. Each persona
contains a system prompt that shapes how the agent responds. Multiple agents can share the same persona.
Create a Persona
```python
from ai_sdk.models import CreatePersonaRequest

persona = client.create_persona(CreatePersonaRequest(
    name="DataAnalyst",
    description="A meticulous data analyst focused on data quality",
    prompt=(
        "You are an expert data analyst working with Collate. "
        "You specialize in analyzing table schemas, identifying data quality issues, "
        "and recommending appropriate tests. Always reference specific columns and "
        "provide actionable recommendations."
    ),
    display_name="Data Analyst",
))
print(f"Created persona: {persona.name}")
```
Persona fields:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier (alphanumeric, no spaces) |
| description | string | Yes | Role and behavior description |
| prompt | string | Yes | System prompt prepended to every agent conversation |
| display_name | string | No | Human-readable name (defaults to name) |
| provider | string | No | Default LLM provider: openai, anthropic, azure_openai (default: openai) |
List Personas
```python
personas = client.list_personas()
for persona in personas:
    print(f"{persona.name}: {persona.description}")
```
Get a Persona by Name
```python
persona = client.get_persona("DataAnalyst")
print(f"{persona.name}: {persona.prompt[:80]}...")
```
Manage Agents
An agent combines a persona’s behavioral instructions with Collate’s MCP tools to form a
purpose-built AI assistant. Agents must be API-enabled to be invoked via the SDK.
Create an Agent
```python
from ai_sdk.models import CreateAgentRequest

agent = client.create_agent(CreateAgentRequest(
    name="DataQualityPlannerAgent",
    description="Analyzes tables and recommends data quality tests",
    persona="DataAnalyst",
    display_name="Data Quality Planner",
    api_enabled=True,
    abilities=[
        "search_metadata",
        "get_entity_details",
        "get_entity_lineage",
    ],
))
print(f"Created agent: {agent.name}")
```
Agent fields:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier (alphanumeric, PascalCase/camelCase) |
| description | string | Yes | Purpose shown in AI Studio |
| persona | string | Yes | Name of an existing persona |
| display_name | string | No | Human-readable name (defaults to name) |
| api_enabled | boolean | No | Must be true for SDK invocation (default: false) |
| abilities | array | No | Allowed MCP tool names (all tools if omitted) |
| prompt | string | No | Additional system prompt appended to persona’s base prompt |
| provider | string | No | LLM provider: openai, anthropic, azure_openai (default: openai) |
| bot_name | string | No | Collate bot for metadata operations |
Available abilities: search_metadata, get_entity_details, get_entity_lineage,
create_glossary, create_glossary_term, create_lineage, patch_entity
List Agents
```python
agents = client.list_agents()
for agent in agents:
    print(f"{agent.name}: {agent.description}")
    print(f"  Abilities: {', '.join(agent.abilities)}")
```
Invoke an Agent
Send a message to an API-enabled agent and receive a response.
Single Invocation
```python
response = client.agent("DataQualityPlannerAgent").call(
    "What data quality tests should I add for the customers table?"
)
print(response.response)
print(f"Tools used: {response.tools_used}")
```
The response includes:
| Field | Type | Description |
|---|---|---|
| conversation_id | string | Use for multi-turn follow-ups |
| response | string | The agent’s text response |
| tools_used | array | MCP tools the agent invoked |
| usage | object | Token usage (prompt_tokens, completion_tokens, total_tokens) |
Streaming
Use streaming to receive real-time output as the agent generates its response.
```python
for event in client.agent("DataQualityPlannerAgent").stream(
    "Analyze the orders table"
):
    match event.type:
        case "start":
            print(f"Started conversation: {event.conversation_id}")
        case "content":
            print(event.content, end="", flush=True)
        case "tool_use":
            print(f"\n[Using tool: {event.tool_name}]")
        case "end":
            print("\nDone!")
        case "error":
            print(f"\nError: {event.error}")
```
Stream event types:
| Type | Fields | Description |
|---|---|---|
| start | conversation_id | Agent started processing |
| content | content | Text chunk from the response |
| tool_use | tool_name | Agent is invoking an MCP tool |
| end | - | Response complete |
| error | error | An error occurred |
Multi-Turn Conversations
The Conversation class automatically manages context across messages.
```python
from ai_sdk import Conversation

conv = Conversation(client.agent("DataQualityPlannerAgent"))

# Each call automatically carries the conversation context
print(conv.send("Analyze the customers table"))
print(conv.send("Create tests for the issues you found"))
print(conv.send("Show me the SQL for those tests"))

# Access conversation details
print(f"Turns: {len(conv)}")
print(f"Conversation ID: {conv.id}")
```
Async Support (Python)
All sync methods have async counterparts prefixed with a:
| Sync | Async |
|---|---|
| agent.call() | await agent.acall() |
| agent.stream() | async for event in agent.astream() |
| conv.send() | await conv.asend() |
```python
import asyncio
from ai_sdk import AISdk, AISdkConfig

async def main():
    config = AISdkConfig.from_env(enable_async=True)
    client = AISdk.from_config(config)
    response = await client.agent("DataQualityPlannerAgent").acall(
        "Analyze the customers table"
    )
    print(response.response)

asyncio.run(main())
```
Error Handling
| Code | Exception | Description |
|---|---|---|
| 401 | AuthenticationError | Invalid or expired JWT token |
| 403 | AgentNotEnabledError | Agent exists but is not API-enabled |
| 404 | AgentNotFoundError | No agent with the given name exists |
| 409 | CONFLICT | Agent or persona with the same name already exists |
| 429 | RateLimitError | Too many requests — retry after the indicated delay |
| 500 | AgentExecutionError | Internal error during agent execution |
```python
from ai_sdk.exceptions import (
    AuthenticationError,
    AgentNotFoundError,
    AgentNotEnabledError,
    RateLimitError,
    AgentExecutionError,
)

try:
    response = client.agent("MyAgent").call("Hello")
except AuthenticationError:
    print("Invalid or expired token. Check your AI_SDK_TOKEN.")
except AgentNotFoundError as e:
    print(f"Agent not found: {e.agent_name}")
except AgentNotEnabledError as e:
    print(f"Agent '{e.agent_name}' is not API-enabled. Enable it in AI Studio.")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.retry_after} seconds")
except AgentExecutionError as e:
    print(f"Agent execution failed: {e.message}")
```
CLI
```shell
# Install
curl -sSL https://raw.githubusercontent.com/open-metadata/ai-sdk/main/cli/install.sh | sh

# Configure
ai-sdk configure

# Invoke an agent
ai-sdk invoke DataQualityPlannerAgent "Analyze the customers table"
```
The CLI provides an interactive TUI with markdown rendering and syntax highlighting.
MCP Server
Collate exposes an MCP server that turns your metadata
into a set of tools any LLM can use. Unlike generic MCP connectors that only read raw database schemas,
Collate’s MCP tools give your AI access to the full context of your data platform — descriptions,
owners, lineage, glossary terms, tags, and data quality results.
The MCP endpoint is available at POST /mcp using the JSON-RPC 2.0 protocol.
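As a protocol-level sketch (the bearer-token header is an assumption based on the SDK's JWT authentication; the SDK and the integrations below handle this for you), a JSON-RPC 2.0 request listing the available tools would look like:

```http
POST /mcp HTTP/1.1
Host: your-org.getcollate.io
Authorization: Bearer <your-bot-jwt-token>
Content-Type: application/json

{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```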
| Tool | Description |
|---|---|
| search_metadata | Search across all metadata in Collate (tables, dashboards, pipelines, topics, etc.) |
| semantic_search | AI-powered semantic search that understands meaning and context beyond keyword matching |
| get_entity_details | Get detailed information about a specific entity by ID or fully qualified name |
| get_entity_lineage | Get upstream and downstream lineage for an entity |
| create_glossary | Create a new glossary in Collate |
| create_glossary_term | Create a new term within an existing glossary |
| create_lineage | Create a lineage edge between two entities |
| patch_entity | Update an entity’s metadata (description, tags, owners, etc.) |
| get_test_definitions | List available data quality test definitions |
| create_test_case | Create a data quality test case for an entity |
| root_cause_analysis | Analyze root causes of data quality failures |
You can call MCP tools directly through the SDK client:
```python
from ai_sdk import AISdk, AISdkConfig

config = AISdkConfig.from_env()
client = AISdk.from_config(config)

# List available tools
tools = client.mcp.list_tools()
for tool in tools:
    print(f"{tool.name}: {tool.description}")

# Search for tables
result = client.mcp.call_tool("search_metadata", {
    "query": "customers",
    "entity_type": "table",
    "limit": 5,
})
print(result.data)

# Get entity details
result = client.mcp.call_tool("get_entity_details", {
    "fqn": "warehouse.production.public.customers",
    "entity_type": "table",
})
print(result.data)

# Get lineage
result = client.mcp.call_tool("get_entity_lineage", {
    "entity_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "upstream_depth": 3,
    "downstream_depth": 2,
})
print(result.data)
```
LangChain Integration
Convert Collate’s MCP tools to LangChain format with a single method call. This lets you use your
metadata as tools in any LangChain agent.
```shell
pip install data-ai-sdk[langchain]
```
```python
from ai_sdk import AISdk, AISdkConfig
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

config = AISdkConfig.from_env()
client = AISdk.from_config(config)

# Convert MCP tools to LangChain format
tools = client.mcp.as_langchain_tools()

llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a metadata assistant powered by Collate."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({
    "input": "Find tables related to customers and show their lineage"
})
print(result["output"])
```
Control which tools are exposed to your LLM by including or excluding specific tools. This is useful
for restricting agents to read-only operations or limiting scope.
```python
from ai_sdk.mcp.models import MCPTool

# Only include read-only tools
tools = client.mcp.as_langchain_tools(
    include=[
        MCPTool.SEARCH_METADATA,
        MCPTool.SEMANTIC_SEARCH,
        MCPTool.GET_ENTITY_DETAILS,
        MCPTool.GET_ENTITY_LINEAGE,
        MCPTool.GET_TEST_DEFINITIONS,
    ]
)

# Or exclude mutation tools
tools = client.mcp.as_langchain_tools(
    exclude=[MCPTool.PATCH_ENTITY, MCPTool.CREATE_GLOSSARY, MCPTool.CREATE_GLOSSARY_TERM]
)
```
You can wrap AI Studio Agents as LangChain tools, letting you compose them with other tools in a
LangChain pipeline:
```python
from ai_sdk.integrations.langchain import AISdkAgentTool, create_ai_sdk_tools

# Create a tool from a single agent
tool = AISdkAgentTool.from_client(client, "DataQualityPlannerAgent")

# Create tools for multiple agents
tools = create_ai_sdk_tools(client, [
    "DataQualityPlannerAgent",
    "SqlQueryAgent",
    "LineageExplorerAgent",
])

# Or create tools for all API-enabled agents
tools = create_ai_sdk_tools(client)
```
Multi-Agent Orchestrator
Build a multi-agent system where specialist agents each get focused MCP tools:
```python
from ai_sdk.mcp.models import MCPTool
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

# Discovery specialist — search and read operations
discovery_tools = client.mcp.as_langchain_tools(include=[
    MCPTool.SEMANTIC_SEARCH,
    MCPTool.SEARCH_METADATA,
    MCPTool.GET_ENTITY_DETAILS,
])

# Lineage specialist — lineage exploration
lineage_tools = client.mcp.as_langchain_tools(include=[
    MCPTool.GET_ENTITY_LINEAGE,
    MCPTool.GET_ENTITY_DETAILS,
])

# Curator specialist — write operations
curator_tools = client.mcp.as_langchain_tools(include=[
    MCPTool.GET_ENTITY_DETAILS,
    MCPTool.PATCH_ENTITY,
    MCPTool.CREATE_GLOSSARY_TERM,
])

llm = ChatOpenAI(model="gpt-4o")

def create_specialist(tools, system_prompt):
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ])
    agent = create_tool_calling_agent(llm, tools, prompt)
    return AgentExecutor(agent=agent, tools=tools, verbose=True)

discovery = create_specialist(discovery_tools, "You are a data discovery specialist.")
lineage = create_specialist(lineage_tools, "You are a lineage exploration specialist.")
curator = create_specialist(curator_tools, "You are a metadata curation specialist.")

# Each specialist is invoked like any AgentExecutor, e.g.:
# discovery.invoke({"input": "Find tables related to customers"})
```
OpenAI Integration
Convert MCP tools to OpenAI function calling format:
```python
import json
from openai import OpenAI
from ai_sdk import AISdk, AISdkConfig

config = AISdkConfig.from_env()
om_client = AISdk.from_config(config)
openai_client = OpenAI()

tools = om_client.mcp.as_openai_tools()
executor = om_client.mcp.create_tool_executor()

messages = [{"role": "user", "content": "Find customer tables"}]
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    messages.append(message)
    for tool_call in message.tool_calls:
        result = executor(
            tool_call.function.name,
            json.loads(tool_call.function.arguments)
        )
        print(f"Tool: {tool_call.function.name}")
        print(f"Result: {result}")
        # Feed each result back so the model can produce a final answer
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": str(result),
        })
    final = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    print(final.choices[0].message.content)
```