Build a typed browser agent with the OpenAI Agents SDK
Use Steel with the OpenAI Agents SDK for TypeScript to build typed, tool-using browser agents.
Scaffolds a starter project locally. Requires the Steel CLI.
The OpenAI Agents SDK (@openai/agents) is a small runtime for agentic loops. You define an Agent with instructions, a model, tools, and an outputType. You call run(agent, prompt). The SDK handles "pick a tool, call it, feed the result back, repeat" and validates the final message against your schema.
This recipe turns the tool layer into a Steel cloud browser. Four tool() wrappers in index.ts (openSession, navigate, snapshot, extract) shuttle CDP calls between the agent and Playwright. Demo task: scan github.com/trending/python and return the top 3 AI/ML repos as a validated FinalReport.
const FinalReport = z.object({summary: z.string(),repos: z.array(z.object({name: z.string(),url: z.string(),stars: z.string().nullable(),description: z.string().nullable(),})).min(1).max(5),});const agent = new Agent({name: "SteelResearch",instructions: "You operate a Steel cloud browser via tools. Workflow: ...",model: "gpt-5-mini",tools: [openSession, navigate, snapshot, extract],outputType: FinalReport,});const result = await run(agent, "Go to https://github.com/trending/python ...", { maxTurns: 15 });console.log(result.finalOutput); // typed as z.infer<typeof FinalReport>
The SDK compiles Zod to OpenAI's strict JSON Schema at registration, which tightens a couple of rules: no .url() format (pass a plain z.string()), and .optional() is rejected. Use .nullable() instead.
Run it
cd examples/openai-agents-tscp .env.example .env # set STEEL_API_KEY and OPENAI_API_KEYnpm installnpx playwright install chromiumnpm start
Get keys at app.steel.dev/settings/api-keys and platform.openai.com/api-keys. A viewer URL prints as openSession runs.
Your output varies. Structure looks like this:
Steel + OpenAI Agents SDK (TypeScript) Starter============================================================open_session: 1432msnavigate: 2180mssnapshot: 412ms (3921 chars, 48 links)extract: 380ms (3 rows)Agent finished.{"summary": "All three repos focus on LLM tooling written in Python...","repos": [{ "name": "owner/repo", "url": "https://github.com/...", "stars": "1,204", "description": "..." },...]}Releasing Steel session...Session released. Replay: https://app.steel.dev/sessions/ab12cd34...
A full run is ~20-40 seconds. Cost is a few cents of Steel session time plus OpenAI tokens per turn. The finally block calls steel.sessions.release().
Make it yours
- Swap the task and schema. Change the prompt passed to
run()and rewriteFinalReport. The four tools are task-agnostic. - Add handoffs. Pass
handoffs: [writerAgent]on theAgent. The SDK routes between agents based on each one's description. - Add a guardrail. Wire
inputGuardrailsoroutputGuardrailson theAgentto vet the user's prompt or the final message. See the guardrails guide. - Use a stronger model.
model: "gpt-5"plans better on ambiguous pages at the cost of tokens and latency. - Turn on stealth. Pass
useProxy,solveCaptcha, or a longersessionTimeouttosessions.create()for sites with anti-bot.
Related
Python version · OpenAI Agents SDK docs · Computer Use version
Scaffolds a starter project locally. Requires the Steel CLI.
The OpenAI Agents SDK runs the tool-call loop so you don't have to. You declare an Agent with tools, a model, and (optionally) a Pydantic output_type. You call Runner.run(agent, input=...) once. The SDK handles every model turn, every tool dispatch, and every schema check until the agent returns a typed final answer.
This starter wraps a Steel browser as four tools and points the agent at GitHub Trending.
from agents import Agent, Runner, function_toolagent = Agent(name="SteelResearch",instructions="You operate a Steel cloud browser via tools. ...",model="gpt-5-mini",tools=[open_session, navigate, snapshot, extract],output_type=FinalReport,)result = await Runner.run(agent, input="...", max_turns=15)final: FinalReport = result.final_output
Each tool is a plain async function wrapped with @function_tool. The SDK reads the signature and docstring to build the JSON schema the model sees. output_type=FinalReport forces the last turn to produce a Pydantic-validated object, so result.final_output is typed.
Run it
cd examples/openai-agents-pycp .env.example .env # set STEEL_API_KEY and OPENAI_API_KEYuv run playwright install chromiumuv run main.py
Get keys from app.steel.dev and platform.openai.com. Each tool call prints its latency so you can see where time is going.
Your output varies. Structure looks like this:
Steel + OpenAI Agents SDK (Python) Starter============================================================open_session: 2843msnavigate: 1612mssnapshot: 487ms (3821 chars, 48 links)extract: 394ms (3 rows)Agent finished.{"summary": "Three trending Python repos focused on agentic workflows...","repos": [{"name": "owner/repo","url": "https://github.com/owner/repo","stars": "1,240","description": "..."},...]}Releasing Steel session...Session released. Replay: https://app.steel.dev/sessions/ab12cd34...
A run takes ~20 to 40 seconds and 5 to 10 agent turns on GitHub Trending. Cost is a few cents of Steel session time plus OpenAI tokens. The finally block in main closes Playwright and calls steel.sessions.release().
The Agents SDK ships tracing on by default. Each Runner.run produces a trace viewable at platform.openai.com/traces.
Make it yours
- Swap the task. Change the
input=string inmain()and theFinalReportschema. Tools stay the same; the agent re-plans. - Add a tool. Write an async function, decorate with
@function_tool, add it totools=[...]. A useful fifth tool isclick(selector: str)that callspage.clickand waits for navigation. - Hand off to a specialist. The SDK supports handoffs: define a second
Agent(say, aSummarizerwith no tools) and list it inhandoffs=[...]on the research agent. - Add a guardrail. Attach an input or output guardrail to reject off-topic requests or validate the
FinalReportbefore it returns. - Swap the model.
model="gpt-5"for harder reasoning,"gpt-5-mini"(default) for speed and cost. - Raise
max_turns. 15 is plenty for single-page extraction. Multi-page flows want 25 to 40. - Use
context. Replace module globals with a dataclass passed toRunner.run(agent, input=..., context=my_ctx). Each tool reads it viaRunContextWrapper. Needed for concurrent runs.
Related
TypeScript version · OpenAI Computer Use (Python) · OpenAI Agents SDK docs
Related recipes
Build a typed browser agent with Pydantic AI
Use Steel with Pydantic AI to build typed, provider-agnostic browser agents with dependency injection.
Build a typed browser agent with LangGraph
Use Steel with LangGraph to build a typed browser agent with an explicit state-machine loop and a structured-output formatter node.
Build a typed browser agent with Mastra
Use Steel with Mastra to build a typed browser agent with the Mastra Model Router and Studio playground.