Custom agent endpoints
When your AppKit app needs more than a foundation model response or a Genie-style data query, you call a custom agent: an LLM shaped by instructions, tools, document grounding, or multi-agent orchestration. On Databricks, custom agents deploy as Model Serving endpoints, so the Model Serving plugin calls them like any foundation model.
Prerequisites
- Databricks CLI v0.296+ with an authenticated profile.
- A running AppKit app. See Apps quickstart.
- A deployed agent endpoint.
Three ways to get an endpoint
Three Databricks products produce agent endpoints. The table summarizes when to use each; subsections below link to the setup docs.
| Builder | Use when | Setup |
|---|---|---|
| Knowledge Assistant | You need Q&A over documents (PDFs, Markdown, Office files) with citations | Click-through UI in the workspace |
| Agent Bricks Multi-Agent Supervisor | You need to coordinate existing Genie spaces, other agents, Unity Catalog functions, or MCP servers | Click-through UI in the workspace |
| Custom Python agent | No builder fits; you need arbitrary orchestration, custom tools, or a proprietary framework | Write Python with ResponsesAgent, deploy via agents.deploy() |
Knowledge Assistant
Turns a folder of documents (plain text, PDFs, Markdown, Office files in a Unity Catalog volume) or a vector search index into a Q&A chatbot with source citations. Good for product docs, HR policies, support knowledge bases. Databricks builds and deploys the agent endpoint for you.
See Knowledge Assistant.
Agent Bricks Multi-Agent Supervisor
Coordinates subagents (Genie spaces, other agent endpoints, Unity Catalog functions, MCP servers) to complete a task, handling delegation and result synthesis. Good for workflows that span domains, for example searching research reports and querying usage data in the same conversation. Like Knowledge Assistant, the builder produces a single agent endpoint.
See Agent Bricks Multi-Agent Supervisor and the Supervisor API for response shapes and query parameters.
Custom Python agent
Author an agent in Python when neither builder covers your use case. The Databricks path is the ResponsesAgent interface plus a framework of your choice (OpenAI Agents SDK, LangGraph, LlamaIndex), with MLflow handling tracing. Agents deploy as Model Serving endpoints (via agents.deploy()) or as full Databricks Apps. The App-based path produces a standalone app, not an endpoint you'd call from another AppKit app.
See Create an AI agent on docs.databricks.com. Authoring is out of scope for this page.
Wire it up
The Model Serving plugin calls agent endpoints the same way it calls foundation model endpoints. Point the plugin at your agent's env var:
serving({
  endpoints: {
    assistant: { env: "DATABRICKS_AGENT_ENDPOINT" },
  },
}),
Bind the env var to a serving-endpoint resource in app.yaml:
env:
  - name: DATABRICKS_AGENT_ENDPOINT
    valueFrom: serving-endpoint
When you add the agent endpoint as an app resource (Databricks Apps UI or CLI), Databricks grants your app's service principal CAN QUERY on the endpoint.
For the full wiring pattern, including createApp, useServingStream, and custom route handlers, see Call a governed endpoint from AppKit.
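If the resource binding is missing, the env var is empty at runtime and the first call to the agent fails. A minimal sketch of a startup check follows. The helper name `resolveEndpointName` is hypothetical and not part of AppKit; it only reads whatever value the binding placed in the variable.

```typescript
// Hypothetical helper (not an AppKit API): fail fast when the env var that
// app.yaml binds to the serving-endpoint resource is missing or blank.
function resolveEndpointName(
  env: Record<string, string | undefined>,
  key: string = "DATABRICKS_AGENT_ENDPOINT",
): string {
  const value = env[key]?.trim();
  if (!value) {
    throw new Error(
      `${key} is not set. Check the env entry in app.yaml and the app's serving-endpoint resource.`,
    );
  }
  return value;
}
```

In a Node-based app you would call `resolveEndpointName(process.env)` once during startup, before registering routes.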
What the response looks like
With streaming, responses arrive as chunks through useServingStream; without it, useServingInvoke returns the complete object. The request shape is typically OpenAI Chat Completions-compatible (messages, max_tokens, optional stream). Endpoints built on ResponsesAgent use the OpenAI Responses API instead (input instead of messages). The response shape depends on the builder. Rather than guess, look it up in Playground:
- Open your agent endpoint in the workspace and click Open in Playground.
- Click Get code and pick Curl API or Python API.
- Run the example and inspect the response to see the exact fields.
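The two request shapes mentioned above can be sketched as plain TypeScript types. This is an illustration under assumptions, not AppKit's own typing: the type names and `buildAgentRequest` are hypothetical, and only the field names (`messages`, `max_tokens`, `stream`, `input`) come from the text above. Confirm the exact body your endpoint accepts with the Playground example.

```typescript
// Hypothetical types for illustration; only the field names come from the
// Chat Completions and Responses request shapes described above.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatCompletionsRequest {
  messages: ChatMessage[];
  max_tokens?: number;
  stream?: boolean;
}

interface ResponsesRequest {
  input: ChatMessage[];
  stream?: boolean;
}

type AgentApi = "chat-completions" | "responses";

// Build a request body for whichever API style the endpoint speaks.
function buildAgentRequest(
  api: AgentApi,
  history: ChatMessage[],
  options: { stream?: boolean; maxTokens?: number } = {},
): ChatCompletionsRequest | ResponsesRequest {
  const stream = options.stream ?? false;
  if (api === "responses") {
    // ResponsesAgent-style endpoints take `input` instead of `messages`.
    return { input: history, stream };
  }
  const body: ChatCompletionsRequest = { messages: history, stream };
  if (options.maxTokens !== undefined) {
    body.max_tokens = options.maxTokens;
  }
  return body;
}
```

Keeping the choice of API style in one function means the rest of the app can pass a message history around without caring which builder produced the endpoint.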
Broad patterns to expect:
- Knowledge Assistant: text answers with source citations. The endpoint returns document references alongside the answer, ready to render as citations for verifiability. See Knowledge Assistant.
- Agent Bricks Multi-Agent Supervisor: a synthesized answer drawn from whatever subagents the supervisor routed to (Genie spaces, Knowledge Assistants, Unity Catalog functions, MCP servers). The MLflow trace captures the full sequence of model calls and tool executions.
- Custom Python agent: whatever the author designed. Agents built on the ResponsesAgent interface use the OpenAI Responses API (input instead of messages).
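Because the response shape varies by builder, UI code is easier to maintain when text extraction is isolated in one place. The sketch below is a hypothetical helper (`extractAnswerText` is not an AppKit function). It assumes the standard OpenAI field paths: `choices[0].message.content` for Chat Completions and `output[].content[].text` with `type: "output_text"` for the Responses API. Your endpoint may return extra or different fields, so check them against the Playground output first.

```typescript
// Hypothetical helper: pull the answer text out of either response style.
// Field paths are assumed from the OpenAI Chat Completions and Responses
// formats; verify them against your endpoint's real output.
function extractAnswerText(response: unknown): string | undefined {
  if (typeof response !== "object" || response === null) return undefined;
  const r = response as Record<string, unknown>;

  // Chat Completions style: choices[0].message.content
  const choices = r["choices"];
  if (Array.isArray(choices)) {
    const content = (choices[0] as any)?.message?.content;
    if (typeof content === "string") return content;
  }

  // Responses style: message items whose content parts are output_text
  const output = r["output"];
  if (Array.isArray(output)) {
    const parts: string[] = [];
    for (const item of output as any[]) {
      if (item?.type !== "message" || !Array.isArray(item.content)) continue;
      for (const part of item.content as any[]) {
        if (part?.type === "output_text" && typeof part.text === "string") {
          parts.push(part.text);
        }
      }
    }
    if (parts.length > 0) return parts.join("");
  }

  // Unknown shape: let the caller decide how to handle it.
  return undefined;
}
```

Returning `undefined` for an unrecognized shape, rather than throwing, lets the caller fall back to showing the raw payload while you work out the fields.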
Per-user permissions
Serving routes in AppKit run on behalf of the authenticated user by default. If the agent hits user-scoped data (for example, a Multi-Agent Supervisor that routes to a Genie space the user can query), the user only sees the data they're allowed to see. No extra auth code.
For server logic outside the built-in plugin routes (for example, custom Express routes), call AppKit.serving("assistant").asUser(req).invoke(...) to keep per-user behavior. For background work without a request (scheduled tasks, workers), omit asUser and the call runs as the app's service principal.
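The choice between the two identities can be sketched as one small function. The interfaces below are hypothetical stand-ins for the handle returned by AppKit.serving("assistant"), reduced to the two calls named above; the argument that invoke accepts is an assumption, so treat this as an outline of the decision rather than AppKit's real types.

```typescript
// Hypothetical minimal types standing in for the AppKit serving handle.
// Only asUser(...) and invoke(...) are taken from the text above; the
// shape of invoke's argument is an assumption.
interface ServingCaller {
  invoke(body: unknown): Promise<unknown>;
}

interface ServingHandle extends ServingCaller {
  asUser(req: unknown): ServingCaller;
}

// With a request, call on behalf of that user. Without one (scheduled
// tasks, workers), use the handle directly, which runs as the app's
// service principal.
function callerFor(handle: ServingHandle, req?: unknown): ServingCaller {
  return req !== undefined ? handle.asUser(req) : handle;
}
```

In a custom Express route you would pass the incoming request, for example `callerFor(handle, req).invoke(body)`, while a background worker would call `callerFor(handle).invoke(body)`.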
Where to next
Try the AI Chat App for a complete AppKit and agent setup, or browse the templates catalog for more patterns.