Build a typed browser agent with Pydantic AI
Use Steel with Pydantic AI to build typed, provider-agnostic browser agents with dependency injection.
Pydantic AI is the Pydantic team's agent framework. It's provider-agnostic and reuses the Pydantic models you'd already validate API I/O with for tool arguments and final outputs, so an agent slots into a typed Python codebase without a parallel schema layer.
This starter wires a Steel browser into Pydantic AI's dependency-injection pattern and points the agent at GitHub Trending.
    from pydantic_ai import Agent, RunContext

    agent = Agent(
        "openai:gpt-5-mini",
        deps_type=BrowserDeps,
        output_type=FinalReport,
        tools=[navigate, snapshot, extract],
        instructions="You operate a Steel cloud browser via tools. ...",
    )

    result = await agent.run(
        "Go to https://github.com/trending/python ...",
        deps=BrowserDeps(page=page),
    )
    final: FinalReport = result.output
agent.run runs the model loop until the agent returns a FinalReport (or an exception unwinds it). result.output is typed because output_type=FinalReport ties the final turn to the schema. Validation failures are fed back to the model so it corrects itself.
Deps over globals
The interesting bit is deps_type=BrowserDeps. Pydantic AI takes a single dependencies object per run and passes it to every tool through RunContext.deps. That replaces the module-level globals you'd otherwise reach for when several tools share a runtime resource:
    @dataclass
    class BrowserDeps:
        page: Page

    async def navigate(ctx: RunContext[BrowserDeps], url: str) -> dict:
        """Navigate the open session to a URL and wait for the page to load."""
        await ctx.deps.page.goto(url, wait_until="domcontentloaded", timeout=45_000)
        return {"url": ctx.deps.page.url, "title": await ctx.deps.page.title()}
Tools are plain async functions that take RunContext[BrowserDeps] first. The example registers them via tools=[...] on the Agent, but @agent.tool works the same way once the agent exists.

Setup and teardown live in main, not in tools. Open the Steel session, connect Playwright, grab a page, then hand it to agent.run(deps=BrowserDeps(page=page)). When the run finishes (or raises), you close the browser and call steel.sessions.release() in a finally. Two consequences worth knowing:
1. Concurrent runs work without changes. Each agent.run call gets its own deps, so you can fan out N agents over N Steel sessions and they don't trample each other.
2. Tools stay testable. Pass a fake BrowserDeps with a stub page and call the tool function directly.
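Both consequences can be seen in a small stdlib-only sketch. FakePage and FakeContext are hypothetical stand-ins (not part of the starter or of Pydantic AI) for a Playwright Page and for RunContext[BrowserDeps]; the navigate body is copied from the tool above, with the type annotation dropped so the sketch runs without Pydantic AI installed.

```python
import asyncio
from dataclasses import dataclass


class FakePage:
    """Hypothetical stub with only the Page members that navigate touches."""

    def __init__(self, title: str) -> None:
        self.url = "about:blank"
        self._title = title

    async def goto(self, url: str, **kwargs) -> None:
        self.url = url  # pretend the navigation succeeded instantly

    async def title(self) -> str:
        return self._title


@dataclass
class BrowserDeps:
    page: FakePage  # the starter holds a real Playwright Page here


@dataclass
class FakeContext:
    """Stand-in for RunContext[BrowserDeps]; navigate only reads .deps."""

    deps: BrowserDeps


async def navigate(ctx, url: str) -> dict:
    """Same body as the starter's navigate tool."""
    await ctx.deps.page.goto(url, wait_until="domcontentloaded", timeout=45_000)
    return {"url": ctx.deps.page.url, "title": await ctx.deps.page.title()}


async def main() -> list:
    # One deps object per "run", mirroring agent.run(deps=...).
    contexts = [
        FakeContext(BrowserDeps(page=FakePage("Trending Python"))),
        FakeContext(BrowserDeps(page=FakePage("Trending Rust"))),
    ]
    urls = [
        "https://github.com/trending/python",
        "https://github.com/trending/rust",
    ]
    # Call the tool function directly and concurrently: no agent, no model.
    return await asyncio.gather(*(navigate(c, u) for c, u in zip(contexts, urls)))


results = asyncio.run(main())
print(results)
```

Because each context carries its own page, the two calls never see each other's URL or title, which is the same isolation each agent.run gets from its own deps.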
The three tools
navigate is a thin wrapper around page.goto. snapshot returns the page's title, URL, visible text (capped at 4k chars), and the first 50 links. The docstring instructs the agent to call it before extract:

    """Return a readable snapshot of the current page: title, URL, visible
    text (capped), and a list of links. Call BEFORE extract so the agent
    never has to guess CSS selectors."""
This matters. With only navigate plus extract, the model invents selectors like .trending-repo that don't exist on real pages, calls extract, gets zero rows, retries. snapshot hands it the real DOM signals (visible text, href list) so it picks a selector that actually matches.
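One way a snapshot tool along these lines might look is sketched below. The starter's actual implementation is not shown here, so the body is an assumption: it uses Playwright's page.inner_text and page.eval_on_selector_all, and the constants, LINKS_JS, and FakePage are hypothetical names introduced for illustration. The fake page lets the caps be checked without a browser.

```python
import asyncio
from types import SimpleNamespace

MAX_TEXT_CHARS = 4_000  # "capped at 4k chars"
MAX_LINKS = 50          # "the first 50 links"

# Evaluated in the page: collect link text and resolved hrefs.
LINKS_JS = "els => els.map(e => ({text: e.innerText.trim(), href: e.href}))"


async def snapshot(ctx) -> dict:
    """Return title, URL, capped visible text, and a capped list of links."""
    page = ctx.deps.page  # ctx is RunContext[BrowserDeps] in the real tool
    text = await page.inner_text("body")
    links = await page.eval_on_selector_all("a[href]", LINKS_JS)
    return {
        "title": await page.title(),
        "url": page.url,
        "text": text[:MAX_TEXT_CHARS],
        "links": links[:MAX_LINKS],
    }


class FakePage:
    """Hypothetical stub that returns more text and links than the caps allow."""

    url = "https://github.com/trending/python"

    async def title(self) -> str:
        return "Trending Python repositories"

    async def inner_text(self, selector: str) -> str:
        return "x" * 10_000

    async def eval_on_selector_all(self, selector: str, expression: str) -> list:
        return [{"text": f"repo {i}", "href": f"https://github.com/o/r{i}"} for i in range(80)]


ctx = SimpleNamespace(deps=SimpleNamespace(page=FakePage()))
snap = asyncio.run(snapshot(ctx))
print(len(snap["text"]), len(snap["links"]))
```

Capping both fields keeps the tool result small enough to feed back to the model on every turn, while still exposing the real text and hrefs it needs to choose selectors.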
extract runs one page.evaluate to pull N rows with M fields each. The inline comment explains why:
    # Serial CDP round-trips to Steel's cloud browser are ~200-300ms each,
    # so N*M round-trips burns seconds. One evaluate call is <500ms total.
Each field is a FieldSpec Pydantic model (name, selector, optional attr). Pydantic AI reads the type hint, generates the JSON schema, and validates the model's arguments before calling your function. Non-conforming arguments never reach Python.
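To make the round-trip argument concrete: at the quoted ~200-300ms per call, 5 rows of 4 fields fetched one at a time would be 20 round-trips, or roughly 4 to 6 seconds. The sketch below shows how a single page.evaluate call could collect everything instead. It is an assumption about the shape of extract, not the starter's code: the row_selector and limit parameters, the EXTRACT_JS string, and FakePage are hypothetical, and FieldSpec is a dataclass here (the starter uses a Pydantic model) so the sketch stays stdlib-only.

```python
import asyncio
from dataclasses import asdict, dataclass
from types import SimpleNamespace
from typing import Optional

# Runs once inside the page and returns one object per matched row.
EXTRACT_JS = """
({rowSelector, fields, limit}) =>
  Array.from(document.querySelectorAll(rowSelector)).slice(0, limit).map(row => {
    const out = {};
    for (const f of fields) {
      const el = row.querySelector(f.selector);
      out[f.name] = !el ? null
        : f.attr ? el.getAttribute(f.attr)
        : el.innerText.trim();
    }
    return out;
  })
"""


@dataclass
class FieldSpec:
    name: str
    selector: str
    attr: Optional[str] = None  # read this attribute instead of the text


async def extract(ctx, row_selector: str, fields: list, limit: int = 10) -> list:
    """Pull up to `limit` rows with one evaluate call, whatever N and M are."""
    arg = {
        "rowSelector": row_selector,
        "fields": [asdict(f) for f in fields],
        "limit": limit,
    }
    return await ctx.deps.page.evaluate(EXTRACT_JS, arg)


class FakePage:
    """Hypothetical stub that records evaluate calls and returns canned rows."""

    def __init__(self) -> None:
        self.calls = []

    async def evaluate(self, expression: str, arg=None):
        self.calls.append(arg)
        return [{"name": "owner/repo", "url": "/owner/repo"}]


page = FakePage()
ctx = SimpleNamespace(deps=SimpleNamespace(page=page))
specs = [FieldSpec("name", "h2 a"), FieldSpec("url", "h2 a", attr="href")]
rows = asyncio.run(extract(ctx, "article", specs, limit=5))
print(len(page.calls), rows)
```

The selectors passed in the usage lines are placeholders; in a real run the model chooses them after reading the snapshot.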
The typed output
    class Repo(BaseModel):
        name: str
        url: str
        stars: Optional[str] = None
        description: Optional[str] = None

    class FinalReport(BaseModel):
        summary: str
        repos: list[Repo] = Field(min_length=1, max_length=5)
output_type=FinalReport pins the last turn to the schema and validates with Pydantic. On a ValidationError, Pydantic AI feeds the error back to the model and retries (up to a configurable cap). result.output in main is a FinalReport, not a string you have to parse, and your IDE knows it.
The same FinalReport model is the one you'd validate an HTTP payload with. Pydantic AI doesn't introduce a parallel schema system; if you're already using Pydantic for I/O boundaries, your agent outputs share the type stack.
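As an illustration of that shared type stack, the sketch below validates plain dict payloads against the same two models using Pydantic's own API (Pydantic v2 is assumed), with no agent involved. The payload values are made up for the example. The second call shows the kind of error that would be reported when repos is empty.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Repo(BaseModel):
    name: str
    url: str
    stars: Optional[str] = None
    description: Optional[str] = None


class FinalReport(BaseModel):
    summary: str
    repos: list[Repo] = Field(min_length=1, max_length=5)


# A payload as it might arrive over HTTP (illustrative values only).
payload = {
    "summary": "One trending repo.",
    "repos": [{"name": "owner/repo", "url": "https://github.com/owner/repo"}],
}
report = FinalReport.model_validate(payload)
print(report.repos[0].name, report.repos[0].stars)

# An empty repos list violates min_length=1 and raises ValidationError.
errors = []
try:
    FinalReport.model_validate({"summary": "Nothing found.", "repos": []})
except ValidationError as exc:
    errors = exc.errors()
print(errors[0]["loc"], errors[0]["type"])
```

The structured error list (location plus error type) is what makes a validation failure useful feedback for a retry, rather than an opaque parse error.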
Run it
    cd examples/pydantic-ai
    cp .env.example .env          # set STEEL_API_KEY and OPENAI_API_KEY
    uv run playwright install chromium
    uv run main.py
Get keys from app.steel.dev and platform.openai.com. Each tool call prints its latency so you can see where time is going.
Your output varies. Structure looks like this:
    Steel + Pydantic AI Starter
    ============================================================
    Session: https://app.steel.dev/sessions/ab12cd34...
    navigate: 1612ms
    snapshot: 487ms (3821 chars, 48 links)
    extract: 394ms (3 rows)
    Agent finished.
    {
      "summary": "Three trending Python repos focused on agentic workflows...",
      "repos": [
        {
          "name": "owner/repo",
          "url": "https://github.com/owner/repo",
          "stars": "1,240",
          "description": "..."
        },
        ...
      ]
    }
    Releasing Steel session...
    Session released. Replay: https://app.steel.dev/sessions/ab12cd34...
A run takes ~20 to 40 seconds and 5 to 10 agent turns on GitHub Trending. Cost is a few cents of Steel session time plus OpenAI tokens. The finally block in main closes Playwright and calls steel.sessions.release() so Steel stops billing per-minute.
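The cleanup guarantee can be sketched with stand-ins. Everything named here is hypothetical: run_with_cleanup is not a function in the starter, the Fake classes replace the Playwright browser and the Steel client, and release is assumed to be a synchronous call that takes a session id. The point is only the shape: release sits in a nested finally, so it runs even if the agent raises or closing the browser fails.

```python
import asyncio


class FakeBrowser:
    """Hypothetical stand-in for the Playwright browser handle."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class FakeSessions:
    def __init__(self) -> None:
        self.released = []

    def release(self, session_id: str) -> None:
        self.released.append(session_id)


class FakeSteel:
    """Hypothetical stand-in for the Steel client."""

    def __init__(self) -> None:
        self.sessions = FakeSessions()


async def run_with_cleanup(steel, browser, session_id: str, work):
    """Await work(), then always close the browser and release the session."""
    try:
        return await work()
    finally:
        try:
            await browser.close()
        finally:
            # Reached even if close() raised, so per-minute billing stops.
            steel.sessions.release(session_id)


async def failing_run():
    # Stands in for agent.run(...) raising partway through.
    raise RuntimeError("model call failed")


steel, browser = FakeSteel(), FakeBrowser()
caught = None
try:
    asyncio.run(run_with_cleanup(steel, browser, "sess-1", failing_run))
except RuntimeError as exc:
    caught = str(exc)

print(caught, browser.closed, steel.sessions.released)
```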
Make it yours
- Swap the model. Change the first arg to Agent(...). 'anthropic:claude-sonnet-4-6' and 'google-gla:gemini-2.5-flash' work without code changes; tool-arg JSON schemas are provider-agnostic. Set the matching API key in .env.
- Swap the task. Change the prompt in agent.run and the FinalReport schema. Tools stay the same; the agent re-plans against the new shape.
- Add a tool. Write an async function that takes RunContext[BrowserDeps], add it to tools=[...] (or use @agent.tool after the agent exists). A useful fourth tool is click(selector: str) that calls page.click and waits for navigation.
- Stream the answer. Use async with agent.run_stream(prompt, deps=...) to stream the final answer token-by-token while tool calls happen behind the scenes. Helpful for long summaries.
- Run agents in parallel. Construct a session+page per task and asyncio.gather(agent.run(...)) over them. Each run sees its own deps; nothing is shared by accident.
- Watch with Logfire. Pydantic AI integrates with Logfire for traces of every turn, tool call, and token count. Drop in a few lines of setup and every agent.run is observable.
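The click tool suggested above might look like the following sketch. It is an assumption, not code from the starter: it uses Playwright's page.click and page.wait_for_load_state, the 10-second timeout is an arbitrary choice, and FakePage is a hypothetical stub used to exercise the function without a browser. In the real tool, ctx would be annotated as RunContext[BrowserDeps].

```python
import asyncio
from types import SimpleNamespace


async def click(ctx, selector: str) -> dict:
    """Click the first element matching selector, then wait for the page to load."""
    page = ctx.deps.page
    await page.click(selector, timeout=10_000)
    # Simplification: if the click triggers no navigation, this returns at once.
    await page.wait_for_load_state("domcontentloaded")
    return {"url": page.url, "title": await page.title()}


class FakePage:
    """Hypothetical stub: a click 'navigates' to a fixed repository page."""

    def __init__(self) -> None:
        self.url = "https://github.com/trending/python"
        self.events = []

    async def click(self, selector: str, timeout: float = 30_000) -> None:
        self.events.append(("click", selector))
        self.url = "https://github.com/owner/repo"

    async def wait_for_load_state(self, state: str = "load") -> None:
        self.events.append(("wait", state))

    async def title(self) -> str:
        return "owner/repo"


page = FakePage()
ctx = SimpleNamespace(deps=SimpleNamespace(page=page))
clicked = asyncio.run(click(ctx, "article h2 a"))
print(clicked, page.events)
```

Returning the new URL and title gives the model immediate confirmation of where the click landed, so it can decide whether to call snapshot next.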
Related
- Steel + OpenAI Agents SDK (Python) for a sibling typed-agent recipe with handoffs and tracing.
- Pydantic AI documentation for tools, output validators, retries, and Logfire integration.