Build a browser agent with the Claude Agent SDK
Use Steel with the Claude Agent SDK (TypeScript) to build a tool-using browser agent on Anthropic's first-party agent loop.
@anthropic-ai/claude-agent-sdk is the engine behind the Claude Code CLI, exposed as a Node library. The package bundles a native Claude Code binary as an optional dependency, so npm install is the entire setup. You get the CLI's agent loop, hooks, subagents, MCP support, and built-in tool catalog (Read, Edit, Bash, Grep, ...) without spawning the CLI yourself.
This recipe disables those built-ins and attaches a Steel cloud browser instead. Four MCP tools (openSession, navigate, snapshot, extract) sit in front of Playwright; the agent calls them by name and streams back typed messages.
Tools with Zod-typed inputs
Each tool() call pairs a Zod schema with an async handler. The args parameter is inferred straight from the schema, so the handler is typed end to end:
```typescript
const navigate = tool(
  "navigate",
  "Navigate the open session to a URL and wait for it to load.",
  { url: z.string().describe("Absolute URL to navigate to") },
  async ({ url }) => {
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 45_000 });
    return {
      content: [
        { type: "text", text: JSON.stringify({ url: page.url(), title: await page.title() }) },
      ],
    };
  },
);
```
.describe() writes a per-field hint that Claude reads when deciding which tool to pick. For optional parameters, add .default(): Zod records the default in the schema, and the SDK forwards it through to JSON Schema.
Returning { content: [...] } matches MCP's CallToolResult type. Setting isError: true on the return keeps the loop alive after a handler-level failure: Claude sees the failure as data and adapts, instead of the whole query() call throwing.
Tools combine into a single in-process server:
```typescript
const steelServer = createSdkMcpServer({
  name: "steel",
  version: "1.0.0",
  tools: [openSession, navigate, snapshot, extract],
});
```
"In-process" is literal: no stdio bridge, no child process. The MCP server lives inside your Node process and dispatches calls in microseconds.
Driving the agent loop
query() returns a Query, which is an async generator over SDKMessage plus a few extra controls (interrupt(), setPermissionMode(), setModel()). The options block wires Steel in and locks the agent down:
```typescript
for await (const message of query({
  prompt: PROMPT,
  options: {
    model: "claude-sonnet-4-6",
    systemPrompt: SYSTEM_PROMPT,
    mcpServers: { steel: steelServer },
    allowedTools: ["mcp__steel__*"],
    tools: [],
    settingSources: [],
    maxTurns: 20,
    permissionMode: "bypassPermissions",
  },
})) {
  ...
}
```
Tool names follow mcp__{server}__{tool}, where the server segment matches the mcpServers key. The wildcard mcp__steel__* pre-approves every Steel tool without per-call prompts. tools: [] drops the entire Claude Code built-in catalog: no filesystem reads, no Bash, no WebFetch. The agent only sees what you wrote. settingSources: [] skips loading .claude/ from your working directory or home, so the recipe behaves the same on every machine.
permissionMode: "bypassPermissions" is the unattended-script setting. Combined with tools: [] and the explicit allow list, there is nothing risky to bypass.
Reading typed messages
The generator yields a discriminated union. Narrow with message.type:
```typescript
for await (const message of query({...})) {
  if (message.type === "assistant") {
    for (const block of message.message.content) {
      if (block.type === "tool_use") {
        const name = block.name.replace(/^mcp__steel__/, "");
        console.log(` -> ${name}(${JSON.stringify(block.input).slice(0, 120)})`);
      }
    }
  } else if (message.type === "result") {
    if (message.subtype === "success") finalText = message.result ?? "";
  }
}
```
assistant messages carry the model's content blocks (text, tool_use, thinking). result arrives once at the end with the final answer plus total_cost_usd, usage, and duration_ms.
The Agent SDK does not return a typed final object the way @openai/agents does with outputType. If you need structured output, request JSON in the prompt and parse message.result, or run a short follow-up query() to reformat the answer.
Run it
```shell
cd examples/claude-agent-sdk-ts
cp .env.example .env # set STEEL_API_KEY and ANTHROPIC_API_KEY
npm install
npx playwright install chromium
npm start
```
Get keys at app.steel.dev/settings/api-keys and console.anthropic.com. A Steel session viewer URL prints when openSession runs; open it in another tab to watch the browser live.
Your output varies. Structure looks like this:
```
Steel + Claude Agent SDK (TypeScript) Starter
============================================================
Sure, let me open a browser session and pull that page.
-> open_session({})
open_session: 1747ms
-> navigate({"url":"https://github.com/trending/python?since=daily"})
navigate: 2007ms
-> snapshot({})
snapshot: 272ms (4000 chars, 49 links)
I have everything I need. Top three trending repos ...
--- Final answer ---
Top 3 AI/ML-related repos:
1. owner/repo - description (X stars)
...
Releasing Steel session...
Session released. Replay: https://app.steel.dev/sessions/ab12cd34...
```
A run takes ~25 to 45 seconds and 3 to 6 turns. Cost is Steel session-minutes plus Anthropic tokens; the snapshot's text dominates each turn's prompt.
The finally block calls steel.sessions.release(). Without it the cloud browser idles until the default timeout while you keep paying.
Make it yours
- Swap the task. Change PROMPT and (optionally) SYSTEM_PROMPT. The four tools are task-agnostic; any page with visible text and repeating rows fits.
- Reach for Opus 4.7. Set model: "claude-opus-4-7" for harder reasoning. The bundled CLI auto-uses ANTHROPIC_API_KEY.
- Add a tool. Define another tool() and append it to the tools array in createSdkMcpServer. A click(selector) tool that calls page.click is the most common fifth one.
- Hook the lifecycle. Pass a hooks option with callbacks for PreToolUse, PostToolUse, Stop, and SessionStart to audit, log, or block individual tool calls.
- Resume sessions. Capture session_id from the first system/init message and pass resume: sessionId on the next query() call to keep agent memory across runs.
- Persist a login. Pair with credentials or auth-context so Steel sessions start already authenticated.
Related
Anthropic Agent SDK docs · Python version · Claude Computer Use (TypeScript) for the raw screenshot loop
The Claude Agent SDK is the agent loop that powers Claude Code, packaged as a library. You hand query() a prompt and a set of options; it streams typed messages back. Tools are async functions decorated with @tool, bundled into an in-process MCP server with create_sdk_mcp_server, and registered through the mcp_servers option. The tools need no subprocess of their own and no separate MCP host.
This recipe wires four browser tools (open_session, navigate, snapshot, extract) into one Steel session and points the agent at GitHub Trending. The agent picks tools, the SDK runs them, and the final ResultMessage carries the model's natural-language answer.
Tools as an in-process MCP server
Each tool is a thin wrapper around Playwright. @tool takes a name, a description, and an input schema:
```python
@tool(
    "navigate",
    "Navigate the open session to a URL and wait for the page to load.",
    {"url": str},
)
async def navigate(args: dict[str, Any]) -> dict[str, Any]:
    await _page.goto(args["url"], wait_until="domcontentloaded", timeout=45_000)
    return {
        "content": [
            {"type": "text", "text": json.dumps({"url": _page.url, "title": await _page.title()})}
        ]
    }
```
The {"url": str} shape is sugar: the SDK converts it into a JSON Schema with one required url parameter. For tools whose inputs need lists or nested objects (like extract), pass full JSON Schema instead. The Python SDK accepts both formats from the same decorator.
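To make the shorthand concrete, here is a rough sketch of how a {"url": str} mapping could be expanded into JSON Schema. shorthand_to_json_schema is a hypothetical helper written for illustration, not the SDK's own code, and the field names in the extract-style schema are invented.

```python
from typing import Any

# Assumed mapping from Python types to JSON Schema type names (illustrative only).
_TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def shorthand_to_json_schema(shorthand: dict[str, type]) -> dict[str, Any]:
    """Expand {"name": type} into an object schema where every field is required."""
    properties = {
        name: {"type": _TYPE_NAMES.get(py_type, "string")}
        for name, py_type in shorthand.items()
    }
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }


navigate_schema = shorthand_to_json_schema({"url": str})

# A hand-written schema for a tool with nested input; field names are hypothetical.
extract_schema = {
    "type": "object",
    "properties": {
        "fields": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "selector": {"type": "string"},
                },
                "required": ["name", "selector"],
            },
        },
    },
    "required": ["fields"],
}
```

The shorthand covers flat, all-required inputs; once a tool needs arrays or nested objects, writing the schema out by hand as in extract_schema is the clearer option.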
Returns must follow the MCP CallToolResult shape: {"content": [...]} with one or more text/image/resource blocks. JSON-encoding the tool output keeps every result self-describing for the model on the next turn.
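Since every handler returns the same envelope, a small helper can build it. text_result below is a hypothetical name, not part of the SDK, and the example payload is illustrative; the sketch only shows the {"content": [...]} shape with one text block.

```python
import json
from typing import Any


def text_result(payload: Any) -> dict[str, Any]:
    """Wrap a JSON-serializable payload in a single MCP text block."""
    return {"content": [{"type": "text", "text": json.dumps(payload)}]}


# Example payload; the URL and title are placeholders.
result = text_result({"url": "https://example.com/", "title": "Example Domain"})
```

A handler could then end with return text_result({...}) instead of spelling out the nested dict each time.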
Once defined, every tool goes into a single MCP server:
```python
steel_server = create_sdk_mcp_server(
    name="steel",
    version="1.0.0",
    tools=[open_session, navigate, snapshot, extract],
)
```
"In-process" is literal: no stdio bridge, no spawn, no separate MCP server binary. The server lives inside your Python process and dispatches calls in microseconds.
Wiring the tools into query
ClaudeAgentOptions glues everything together:
```python
options = ClaudeAgentOptions(
    model="claude-sonnet-4-6",
    system_prompt=SYSTEM_PROMPT,
    mcp_servers={"steel": steel_server},
    allowed_tools=["mcp__steel__*"],
    tools=[],
    setting_sources=[],
    max_turns=20,
    permission_mode="bypassPermissions",
)
```
Three options matter for keeping the agent on-task and the recipe reproducible:
- mcp_servers={"steel": steel_server}. The dict key becomes the server segment in fully qualified tool names. Each tool surfaces to Claude as mcp__steel__open_session, mcp__steel__navigate, and so on. The wildcard mcp__steel__* in allowed_tools pre-approves all four without per-call prompts.
- tools=[]. Drops the SDK's built-ins. By default the agent inherits Read, Write, Edit, Bash, Grep, WebFetch, and friends. None of those make sense for a focused browser agent, and Bash would let it shell out on your machine. The empty list removes them from Claude's context entirely.
- setting_sources=[]. The SDK normally loads .claude/ from your working directory and ~/.claude/. The empty list disables that, so the recipe runs identically in CI, in a colleague's checkout, and on your laptop.
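The naming rule and the wildcard can be illustrated in a few lines. qualified_name and is_allowed are hypothetical helpers, and fnmatchcase from the standard library is only a stand-in for the SDK's own matching, which may differ in detail.

```python
from fnmatch import fnmatchcase


def qualified_name(server: str, tool: str) -> str:
    """Build the fully qualified name: mcp__{server}__{tool}."""
    return f"mcp__{server}__{tool}"


def is_allowed(name: str, allowed_tools: list[str]) -> bool:
    # fnmatchcase is a stand-in for the SDK's matching, which may differ in detail.
    return any(fnmatchcase(name, pattern) for pattern in allowed_tools)


allowed = ["mcp__steel__*"]
names = [
    qualified_name("steel", tool_name)
    for tool_name in ("open_session", "navigate", "snapshot", "extract")
]
```

Under this sketch all four Steel tools match the pattern, while a built-in such as Bash or a tool from a differently named server does not.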
permission_mode="bypassPermissions" is safe here because the only callable tools are the four you wrote. With tools=[], there is nothing else to bypass.
Reading the message stream
query() returns an async iterator over typed messages. Narrow with isinstance:
```python
async for message in query(prompt=PROMPT, options=options):
    if isinstance(message, AssistantMessage):
        for block in message.content:
            if isinstance(block, ToolUseBlock):
                name = block.name.removeprefix("mcp__steel__")
                print(f" -> {name}({json.dumps(block.input)[:120]})")
            elif isinstance(block, TextBlock):
                ...
    elif isinstance(message, ResultMessage):
        if message.subtype == "success":
            final_text = message.result or ""
```
AssistantMessage carries the model's content blocks: TextBlock for prose, ToolUseBlock for the tool name and arguments. ResultMessage arrives once at the end and holds the final answer along with token usage and cost in its other fields.
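The per-call log line can be factored into a pure function, which makes the prefix stripping and the 120-character cap easy to test in isolation. format_tool_call is a hypothetical helper; it takes the name and input that a ToolUseBlock would carry rather than the block itself.

```python
import json
from typing import Any

PREFIX = "mcp__steel__"


def format_tool_call(name: str, tool_input: dict[str, Any], limit: int = 120) -> str:
    """Render one tool call as a short log line."""
    short = name.removeprefix(PREFIX)  # drop the server segment for readability
    args = json.dumps(tool_input)[:limit]  # cap long arguments
    return f" -> {short}({args})"


line = format_tool_call(
    "mcp__steel__navigate",
    {"url": "https://github.com/trending/python?since=daily"},
)
```

Inside the loop, the print call would then become print(format_tool_call(block.name, block.input)).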
There is no Runner.run returning a Pydantic object the way OpenAI's Agents SDK does with output_type. If you want structured output, ask for it in the prompt (model returns JSON) and parse message.result yourself, or run a second short query() to reformat the answer.
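If you take the prompt-and-parse route, the parsing step might look like the sketch below. parse_json_answer is a hypothetical helper; it assumes the model returns either bare JSON or JSON wrapped in a Markdown code fence, and it raises json.JSONDecodeError on anything else.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_answer(text: str) -> Any:
    """Parse JSON from a model answer, tolerating a surrounding code fence."""
    match = _FENCED.search(text)
    candidate = match.group(1) if match else text
    return json.loads(candidate.strip())
```

You would call it as parse_json_answer(final_text) once the ResultMessage has arrived, and fall back to a reformatting query() if it raises.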
Run it
```shell
cd examples/claude-agent-sdk-py
cp .env.example .env # set STEEL_API_KEY and ANTHROPIC_API_KEY
uv run playwright install chromium
uv run main.py
```
Get keys from app.steel.dev and console.anthropic.com. The Python SDK ships with the Claude Code CLI bundled, so a single uv sync (run by uv run) is enough.
Your output varies. Structure looks like this:
```
Steel + Claude Agent SDK (Python) Starter
============================================================
Sure, let me open a browser session and pull that page.
-> open_session({})
open_session: 1840ms
-> navigate({"url": "https://github.com/trending/python?since=daily"})
navigate: 2484ms
-> snapshot({})
snapshot: 487ms (4000 chars, 49 links)
I have everything I need. Here are the top 3 ...
--- Final answer ---
Top 3 AI/ML-related Python repos on today's trending list:
1. owner/repo - <description> (X stars)
...
Releasing Steel session...
Session released. Replay: https://app.steel.dev/sessions/ab12cd34...
```
A run takes ~30 to 50 seconds and 3 to 6 turns. Cost is Steel session-minutes plus Anthropic tokens for claude-sonnet-4-6; the snapshot's text dominates the prompt size on each turn.
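Because snapshot text is sent back to the model on every turn, bounding its length is one way to keep prompt size in check. The sketch below is an assumption about how such a cap could work, not the recipe's actual snapshot code; cap_text is a hypothetical helper and the 4000-character default simply mirrors the figure in the sample output.

```python
def cap_text(text: str, max_chars: int = 4000) -> str:
    """Collapse whitespace runs, then cut the text to at most max_chars characters."""
    collapsed = " ".join(text.split())
    return collapsed[:max_chars]
```

Collapsing whitespace first means the character budget is spent on visible text rather than on layout padding.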
The finally block closes Playwright and calls steel.sessions.release(). Skipping it leaves the browser running until the default timeout while you keep paying.
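The cleanup pattern can be sketched without a real browser. FakeSessions is a hypothetical stand-in for the Steel client's sessions object, and run_agent stands in for the agent loop; only the try/finally shape is the point.

```python
import asyncio


class FakeSessions:
    """Stand-in for a session client; records which session IDs were released."""

    def __init__(self) -> None:
        self.released: list[str] = []

    def release(self, session_id: str) -> None:
        self.released.append(session_id)


async def run_agent(sessions: FakeSessions, session_id: str, fail: bool) -> str:
    try:
        if fail:
            raise RuntimeError("agent loop failed")
        return "final answer"
    finally:
        # Runs on success and on failure, so the session is always released.
        sessions.release(session_id)


sessions = FakeSessions()
ok = asyncio.run(run_agent(sessions, "session-1", fail=False))
try:
    asyncio.run(run_agent(sessions, "session-2", fail=True))
except RuntimeError:
    pass  # the error still propagates; cleanup has already happened
```

Both sessions end up in sessions.released, including the one whose run raised.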
Make it yours
- Swap the task. Change PROMPT and, if useful, SYSTEM_PROMPT. The four tools are task-agnostic; any page that yields visible text plus repeating rows fits the same shape.
- Use Opus 4.7 for harder pages. Set model="claude-opus-4-7" in ClaudeAgentOptions. Sonnet 4.6 is the cost/speed default.
- Add a tool. Decorate a new async function with @tool and append it to the tools list passed to create_sdk_mcp_server. A click(selector) tool that calls page.click is a useful fifth one for forms and pagination.
- Hook the lifecycle. Pass hooks={"PostToolUse": [...]} on ClaudeAgentOptions to log every tool call, validate arguments, or veto destructive actions. The hook events match Claude Code's: PreToolUse, PostToolUse, Stop, SessionStart.
- Resume sessions. Capture SystemMessage.data["session_id"] from the first run and pass resume=session_id on the next ClaudeAgentOptions to continue with full context. Agent memory is its own thing; the Steel browser session is a separate object with a separate ID.
- Hand off auth. Pair with credentials or auth-context so the Steel session starts already logged in.
Related
Anthropic Agent SDK docs · TypeScript version · Claude Computer Use (Python) for the raw screenshot loop
Related recipes
Build a typed browser agent with Pydantic AI
Use Steel with Pydantic AI to build typed, provider-agnostic browser agents with dependency injection.
Build a typed browser agent with LangGraph
Use Steel with LangGraph to build a typed browser agent with an explicit state-machine loop and a structured-output formatter node.
Build a typed browser agent with Mastra
Use Steel with Mastra to build a typed browser agent with the Mastra Model Router and Studio playground.