AI SDK

The AI SDK gives you programmatic access to Collate’s AI Studio — create personas and agents, invoke them via the API, and stream responses in real time. Available across Python, TypeScript, Java, and a standalone CLI.
You can find the source code for the AI SDK in the GitHub repository. Contributions are always welcome!

Available SDKs

| SDK | Package | Install |
| --- | --- | --- |
| Python | `data-ai-sdk` | `pip install data-ai-sdk` |
| TypeScript | `@openmetadata/ai-sdk` | `npm install @openmetadata/ai-sdk` |
| Java | `org.open-metadata:ai-sdk` | Maven / Gradle |
| CLI | `ai-sdk` | Install script |

Prerequisites

You need:
  1. A Collate instance with AI Studio Agents enabled
  2. A Bot JWT token for API authentication
To get a JWT token, go to Settings > Bots in your Collate instance, select your bot, and copy the token. See How to get the JWT Token for detailed instructions.

Configuration

Set the following environment variables:
```shell
export AI_SDK_HOST="https://your-org.getcollate.io"
export AI_SDK_TOKEN="your-bot-jwt-token"
```
All environment variables:
| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `AI_SDK_HOST` | Yes | - | Your Collate server URL |
| `AI_SDK_TOKEN` | Yes | - | Bot JWT token |
| `AI_SDK_TIMEOUT` | No | 120 | Request timeout in seconds |
| `AI_SDK_VERIFY_SSL` | No | true | Verify SSL certificates |
| `AI_SDK_MAX_RETRIES` | No | 3 | Number of retry attempts |
| `AI_SDK_RETRY_DELAY` | No | 1.0 | Base delay between retries (seconds) |
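In Python, `AISdkConfig.from_env()` reads these variables for you. As a rough sketch of the equivalent logic and defaults (the `load_config` helper below is illustrative, not part of the SDK):

```python
import os

def load_config(env=os.environ):
    """Read the AI SDK environment variables, applying the documented defaults."""
    host = env.get("AI_SDK_HOST")
    token = env.get("AI_SDK_TOKEN")
    if not host or not token:
        raise RuntimeError("AI_SDK_HOST and AI_SDK_TOKEN are required")
    return {
        "host": host.rstrip("/"),
        "token": token,
        "timeout": float(env.get("AI_SDK_TIMEOUT", "120")),
        "verify_ssl": env.get("AI_SDK_VERIFY_SSL", "true").lower() == "true",
        "max_retries": int(env.get("AI_SDK_MAX_RETRIES", "3")),
        "retry_delay": float(env.get("AI_SDK_RETRY_DELAY", "1.0")),
    }
```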

Client Initialization

```python
from ai_sdk import AISdk, AISdkConfig

# From environment variables
config = AISdkConfig.from_env()
client = AISdk.from_config(config)

# Or directly
client = AISdk(
    host="https://your-org.getcollate.io",
    token="your-bot-jwt-token",
)
```

Manage Personas

A persona defines the behavioral instructions and personality of an AI Studio Agent. Each persona contains a system prompt that shapes how the agent responds. Multiple agents can share the same persona.

Create a Persona

```python
from ai_sdk.models import CreatePersonaRequest

persona = client.create_persona(CreatePersonaRequest(
    name="DataAnalyst",
    description="A meticulous data analyst focused on data quality",
    prompt=(
        "You are an expert data analyst working with Collate. "
        "You specialize in analyzing table schemas, identifying data quality issues, "
        "and recommending appropriate tests. Always reference specific columns and "
        "provide actionable recommendations."
    ),
    display_name="Data Analyst",
))
print(f"Created persona: {persona.name}")
```
Persona fields:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `name` | string | Yes | Unique identifier (alphanumeric, no spaces) |
| `description` | string | Yes | Role and behavior description |
| `prompt` | string | Yes | System prompt prepended to every agent conversation |
| `display_name` | string | No | Human-readable name (defaults to `name`) |
| `provider` | string | No | Default LLM provider: `openai`, `anthropic`, `azure_openai` (default: `openai`) |

List Personas

```python
personas = client.list_personas()
for persona in personas:
    print(f"{persona.name}: {persona.description}")
```

Get a Persona by Name

```python
persona = client.get_persona("DataAnalyst")
print(f"{persona.name}: {persona.prompt[:80]}...")
```

Manage Agents

An agent combines a persona’s behavioral instructions with Collate’s MCP tools to form a purpose-built AI assistant. Agents must be API-enabled to be invoked via the SDK.

Create an Agent

```python
from ai_sdk.models import CreateAgentRequest

agent = client.create_agent(CreateAgentRequest(
    name="DataQualityPlannerAgent",
    description="Analyzes tables and recommends data quality tests",
    persona="DataAnalyst",
    display_name="Data Quality Planner",
    api_enabled=True,
    abilities=[
        "search_metadata",
        "get_entity_details",
        "get_entity_lineage",
    ],
))
print(f"Created agent: {agent.name}")
```
Agent fields:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `name` | string | Yes | Unique identifier (alphanumeric, PascalCase/camelCase) |
| `description` | string | Yes | Purpose shown in AI Studio |
| `persona` | string | Yes | Name of an existing persona |
| `display_name` | string | No | Human-readable name (defaults to `name`) |
| `api_enabled` | boolean | No | Must be `true` for SDK invocation (default: `false`) |
| `abilities` | array | No | Allowed MCP tool names (all tools if omitted) |
| `prompt` | string | No | Additional system prompt appended to the persona's base prompt |
| `provider` | string | No | LLM provider: `openai`, `anthropic`, `azure_openai` (default: `openai`) |
| `bot_name` | string | No | Collate bot for metadata operations |

Available abilities: `search_metadata`, `get_entity_details`, `get_entity_lineage`, `create_glossary`, `create_glossary_term`, `create_lineage`, `patch_entity`
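Since an unknown ability name would only surface at invocation time, you may want to validate the list before creating an agent. A minimal sketch using the ability names documented above (`validate_abilities` is an illustrative helper, not part of the SDK):

```python
# The ability names documented above.
AVAILABLE_ABILITIES = {
    "search_metadata",
    "get_entity_details",
    "get_entity_lineage",
    "create_glossary",
    "create_glossary_term",
    "create_lineage",
    "patch_entity",
}

def validate_abilities(requested):
    """Return the unknown ability names so the caller can fail fast."""
    return sorted(set(requested) - AVAILABLE_ABILITIES)
```

Call it on your `abilities` list before building the `CreateAgentRequest` and raise if it returns anything.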

List Agents

```python
agents = client.list_agents()
for agent in agents:
    print(f"{agent.name}: {agent.description}")
    print(f"  Abilities: {', '.join(agent.abilities)}")
```

Invoke an Agent

Send a message to an API-enabled agent and receive a response.

Single Invocation

```python
response = client.agent("DataQualityPlannerAgent").call(
    "What data quality tests should I add for the customers table?"
)
print(response.response)
print(f"Tools used: {response.tools_used}")
```
The response includes:
| Field | Type | Description |
| --- | --- | --- |
| `conversation_id` | string | Use for multi-turn follow-ups |
| `response` | string | The agent's text response |
| `tools_used` | array | MCP tools the agent invoked |
| `usage` | object | Token usage (`prompt_tokens`, `completion_tokens`, `total_tokens`) |

Streaming

Use streaming to receive real-time output as the agent generates its response.
```python
for event in client.agent("DataQualityPlannerAgent").stream(
    "Analyze the orders table"
):
    match event.type:
        case "start":
            print(f"Started conversation: {event.conversation_id}")
        case "content":
            print(event.content, end="", flush=True)
        case "tool_use":
            print(f"\n[Using tool: {event.tool_name}]")
        case "end":
            print("\nDone!")
```
Stream event types:
| Type | Fields | Description |
| --- | --- | --- |
| `start` | `conversation_id` | Agent started processing |
| `content` | `content` | Text chunk from the response |
| `tool_use` | `tool_name` | Agent is invoking an MCP tool |
| `end` | - | Response complete |
| `error` | `error` | An error occurred |
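If you also need the full text after streaming it, you can fold the events into a single response. A minimal sketch against the event shape documented above (`collect_stream` is an illustrative helper, not part of the SDK; the `SimpleNamespace` events only simulate the stream):

```python
from types import SimpleNamespace

def collect_stream(events):
    """Accumulate stream events into (conversation_id, full_text)."""
    conversation_id, chunks = None, []
    for event in events:
        if event.type == "start":
            conversation_id = event.conversation_id
        elif event.type == "content":
            chunks.append(event.content)
        elif event.type == "error":
            raise RuntimeError(event.error)
    return conversation_id, "".join(chunks)

# Simulated events with the documented fields:
events = [
    SimpleNamespace(type="start", conversation_id="c-1"),
    SimpleNamespace(type="content", content="Hello, "),
    SimpleNamespace(type="content", content="world"),
    SimpleNamespace(type="end"),
]
conv_id, text = collect_stream(events)
```

In real code you would pass `client.agent("...").stream("...")` instead of the simulated list.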

Multi-Turn Conversations

The Conversation class automatically manages context across messages.
```python
from ai_sdk import Conversation

conv = Conversation(client.agent("DataQualityPlannerAgent"))

# Each call automatically carries the conversation context
print(conv.send("Analyze the customers table"))
print(conv.send("Create tests for the issues you found"))
print(conv.send("Show me the SQL for those tests"))

# Access conversation details
print(f"Turns: {len(conv)}")
print(f"Conversation ID: {conv.id}")
```

Async Support (Python)

All sync methods have async counterparts prefixed with `a`:

| Sync | Async |
| --- | --- |
| `agent.call()` | `await agent.acall()` |
| `agent.stream()` | `async for event in agent.astream()` |
| `conv.send()` | `await conv.asend()` |
```python
import asyncio
from ai_sdk import AISdk, AISdkConfig

async def main():
    config = AISdkConfig.from_env(enable_async=True)
    client = AISdk.from_config(config)

    response = await client.agent("DataQualityPlannerAgent").acall(
        "Analyze the customers table"
    )
    print(response.response)

asyncio.run(main())
```

Error Handling

| Code | Exception | Description |
| --- | --- | --- |
| 401 | `AuthenticationError` | Invalid or expired JWT token |
| 403 | `AgentNotEnabledError` | Agent exists but is not API-enabled |
| 404 | `AgentNotFoundError` | No agent with the given name exists |
| 409 | `CONFLICT` | Agent or persona with the same name already exists |
| 429 | `RateLimitError` | Too many requests; retry after the indicated delay |
| 500 | `AgentExecutionError` | Internal error during agent execution |
```python
from ai_sdk.exceptions import (
    AuthenticationError,
    AgentNotFoundError,
    AgentNotEnabledError,
    RateLimitError,
    AgentExecutionError,
)

try:
    response = client.agent("MyAgent").call("Hello")

except AuthenticationError:
    print("Invalid or expired token. Check your AI_SDK_TOKEN.")

except AgentNotFoundError as e:
    print(f"Agent not found: {e.agent_name}")

except AgentNotEnabledError as e:
    print(f"Agent '{e.agent_name}' is not API-enabled. Enable it in AI Studio.")

except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.retry_after} seconds")

except AgentExecutionError as e:
    print(f"Agent execution failed: {e.message}")
```
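The client already retries according to `AI_SDK_MAX_RETRIES` and `AI_SDK_RETRY_DELAY`, but if you want application-level control over 429s, here is a sketch of exponential backoff that honors a server-provided delay. It assumes only that the raised exception may carry a `retry_after` attribute, as `RateLimitError` does above; `call_with_backoff` is illustrative, not an SDK function:

```python
import time

def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on failure with exponential backoff.

    Uses a server-provided exc.retry_after when present; otherwise
    waits base_delay * 2**attempt seconds between attempts.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries:
                raise
            delay = getattr(exc, "retry_after", None) or base_delay * (2 ** attempt)
            sleep(delay)
```

Usage: `call_with_backoff(lambda: client.agent("MyAgent").call("Hello"))`.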

CLI

```shell
# Install
curl -sSL https://raw.githubusercontent.com/open-metadata/ai-sdk/main/cli/install.sh | sh

# Configure
ai-sdk configure

# Invoke an agent
ai-sdk invoke DataQualityPlannerAgent "Analyze the customers table"
```
The CLI provides an interactive TUI with markdown rendering and syntax highlighting.

MCP Tools

Collate exposes an MCP server that turns your metadata into a set of tools any LLM can use. Unlike generic MCP connectors that only read raw database schemas, Collate's MCP tools give your AI access to the full context of your data platform: descriptions, owners, lineage, glossary terms, tags, and data quality results. The MCP endpoint is available at `POST /mcp` using the JSON-RPC 2.0 protocol.
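You rarely need to speak JSON-RPC by hand, since the SDK wraps it, but for reference a request body can be built like this. The `tools/call` method name and `params` shape follow the MCP specification; treat them as an assumption and verify against your server:

```python
import json

def jsonrpc_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 envelope, e.g. for the /mcp endpoint."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        payload["params"] = params
    return json.dumps(payload)

# A tools/call request invoking search_metadata:
body = jsonrpc_request("tools/call", {
    "name": "search_metadata",
    "arguments": {"query": "customers", "limit": 5},
})
```

You would POST `body` to `/mcp` with your bot JWT in the `Authorization` header.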

Available Tools

| Tool | Description |
| --- | --- |
| `search_metadata` | Search across all metadata in Collate (tables, dashboards, pipelines, topics, etc.) |
| `semantic_search` | AI-powered semantic search that understands meaning and context beyond keyword matching |
| `get_entity_details` | Get detailed information about a specific entity by ID or fully qualified name |
| `get_entity_lineage` | Get upstream and downstream lineage for an entity |
| `create_glossary` | Create a new glossary in Collate |
| `create_glossary_term` | Create a new term within an existing glossary |
| `create_lineage` | Create a lineage edge between two entities |
| `patch_entity` | Update an entity's metadata (description, tags, owners, etc.) |
| `get_test_definitions` | List available data quality test definitions |
| `create_test_case` | Create a data quality test case for an entity |
| `root_cause_analysis` | Analyze root causes of data quality failures |

Using MCP Tools Directly

You can call MCP tools directly through the SDK client:
```python
from ai_sdk import AISdk, AISdkConfig

config = AISdkConfig.from_env()
client = AISdk.from_config(config)

# List available tools
tools = client.mcp.list_tools()
for tool in tools:
    print(f"{tool.name}: {tool.description}")

# Search for tables
result = client.mcp.call_tool("search_metadata", {
    "query": "customers",
    "entity_type": "table",
    "limit": 5,
})
print(result.data)

# Get entity details
result = client.mcp.call_tool("get_entity_details", {
    "fqn": "warehouse.production.public.customers",
    "entity_type": "table",
})
print(result.data)

# Get lineage
result = client.mcp.call_tool("get_entity_lineage", {
    "entity_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "upstream_depth": 3,
    "downstream_depth": 2,
})
print(result.data)
```

LangChain Integration

Convert Collate’s MCP tools to LangChain format with a single method call. This lets you use your metadata as tools in any LangChain agent.
```shell
pip install data-ai-sdk[langchain]
```

```python
from ai_sdk import AISdk, AISdkConfig
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

config = AISdkConfig.from_env()
client = AISdk.from_config(config)

# Convert MCP tools to LangChain format
tools = client.mcp.as_langchain_tools()

llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a metadata assistant powered by Collate."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({
    "input": "Find tables related to customers and show their lineage"
})
print(result["output"])
```

Tool Filtering

Control which tools are exposed to your LLM by including or excluding specific tools. This is useful for restricting agents to read-only operations or limiting scope.
```python
from ai_sdk.mcp.models import MCPTool

# Only include read-only tools
tools = client.mcp.as_langchain_tools(
    include=[
        MCPTool.SEARCH_METADATA,
        MCPTool.SEMANTIC_SEARCH,
        MCPTool.GET_ENTITY_DETAILS,
        MCPTool.GET_ENTITY_LINEAGE,
        MCPTool.GET_TEST_DEFINITIONS,
    ]
)

# Or exclude mutation tools
tools = client.mcp.as_langchain_tools(
    exclude=[MCPTool.PATCH_ENTITY, MCPTool.CREATE_GLOSSARY, MCPTool.CREATE_GLOSSARY_TERM]
)
```

Using AI Studio Agents as LangChain Tools

You can wrap AI Studio Agents as LangChain tools, letting you compose them with other tools in a LangChain pipeline:
```python
from ai_sdk.integrations.langchain import AISdkAgentTool, create_ai_sdk_tools

# Create a tool from a single agent
tool = AISdkAgentTool.from_client(client, "DataQualityPlannerAgent")

# Create tools for multiple agents
tools = create_ai_sdk_tools(client, [
    "DataQualityPlannerAgent",
    "SqlQueryAgent",
    "LineageExplorerAgent",
])

# Or create tools for all API-enabled agents
tools = create_ai_sdk_tools(client)
```

Multi-Agent Orchestrator

Build a multi-agent system where specialist agents each get focused MCP tools:
```python
from ai_sdk.mcp.models import MCPTool
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

# Discovery specialist — search and read operations
discovery_tools = client.mcp.as_langchain_tools(include=[
    MCPTool.SEMANTIC_SEARCH,
    MCPTool.SEARCH_METADATA,
    MCPTool.GET_ENTITY_DETAILS,
])

# Lineage specialist — lineage exploration
lineage_tools = client.mcp.as_langchain_tools(include=[
    MCPTool.GET_ENTITY_LINEAGE,
    MCPTool.GET_ENTITY_DETAILS,
])

# Curator specialist — write operations
curator_tools = client.mcp.as_langchain_tools(include=[
    MCPTool.GET_ENTITY_DETAILS,
    MCPTool.PATCH_ENTITY,
    MCPTool.CREATE_GLOSSARY_TERM,
])

llm = ChatOpenAI(model="gpt-4o")

def create_specialist(tools, system_prompt):
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ])
    agent = create_tool_calling_agent(llm, tools, prompt)
    return AgentExecutor(agent=agent, tools=tools, verbose=True)

discovery = create_specialist(discovery_tools, "You are a data discovery specialist.")
lineage = create_specialist(lineage_tools, "You are a lineage exploration specialist.")
curator = create_specialist(curator_tools, "You are a metadata curation specialist.")
```
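These specialists still need something to dispatch requests between them. A deliberately simple keyword router is sketched below; a real orchestrator might instead ask an LLM to classify the request. The `route` function and its keyword lists are illustrative only:

```python
def route(query, specialists):
    """Pick a specialist for a query by keyword; returns (name, executor)."""
    q = query.lower()
    if any(w in q for w in ("lineage", "upstream", "downstream")):
        name = "lineage"
    elif any(w in q for w in ("describe", "tag", "owner", "glossary")):
        name = "curator"
    else:
        name = "discovery"
    return name, specialists[name]
```

Usage with the executors built above: `name, executor = route(query, {"discovery": discovery, "lineage": lineage, "curator": curator})`, then `executor.invoke({"input": query})`.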

OpenAI Integration

Convert MCP tools to OpenAI function calling format:
```python
import json
from openai import OpenAI
from ai_sdk import AISdk, AISdkConfig

config = AISdkConfig.from_env()
om_client = AISdk.from_config(config)
openai_client = OpenAI()

tools = om_client.mcp.as_openai_tools()
executor = om_client.mcp.create_tool_executor()

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Find customer tables"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        result = executor(
            tool_call.function.name,
            json.loads(tool_call.function.arguments)
        )
        print(f"Tool: {tool_call.function.name}")
        print(f"Result: {result}")
```
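To let the model see the tool results and produce a final answer, send a second completion whose messages include one `role="tool"` entry per call, keyed by `tool_call_id`. This is the standard OpenAI function-calling round trip; the `tool_messages` helper below is illustrative:

```python
def tool_messages(tool_call_ids, results):
    """Pair each tool call id with its result as a role='tool' message."""
    return [
        {"role": "tool", "tool_call_id": call_id, "content": str(result)}
        for call_id, result in zip(tool_call_ids, results)
    ]

# Continuing the example above, you would collect the executor results and send:
#   followup = openai_client.chat.completions.create(
#       model="gpt-4o",
#       messages=[
#           {"role": "user", "content": "Find customer tables"},
#           message,  # the assistant message containing tool_calls
#           *tool_messages([tc.id for tc in message.tool_calls], results),
#       ],
#       tools=tools,
#   )
#   print(followup.choices[0].message.content)
```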