Build an AI browser agent with Magnitude
Use Steel with Magnitude for AI-powered browser automation.
Magnitude grew out of end-to-end testing and kept the bias: an agent loop that narrates each turn, a CDP-level browser hookup, and LLM-backed primitives designed to intermix navigation, action, and typed readback. startBrowserAgent() hands you a BrowserAgent with a small surface this recipe exercises:
- agent.extract(instruction, schema): describe what to pull off the page, pass a Zod schema, get a typed result.
- agent.act(instruction): describe an interaction in natural language. The agent plans, clicks, types, retries.
- agent.stop(): flush and tear down. Pair with client.sessions.release() in a finally.
Steel supplies the browser over CDP, so Magnitude never launches its own Chromium:
    const agent = await startBrowserAgent({
      url: "https://github.com/steel-dev/leaderboard",
      narrate: true,
      telemetry: false,
      llm: {
        provider: "anthropic",
        options: {
          model: "claude-sonnet-4-6",
          apiKey: ANTHROPIC_API_KEY,
        },
      },
      browser: {
        cdp: `${session.websocketUrl}&apiKey=${STEEL_API_KEY}`,
      },
    });
browser.cdp is the whole wiring. Magnitude attaches to the context Steel hands back instead of spawning a local browser. narrate: true streams a log of what the agent is doing between screenshot turns, which is the setting worth leaving on while you tune prompts. The url option does the first navigation for you, so there is no separate goto call in main().
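The cdp value above joins apiKey with &, which assumes session.websocketUrl already carries a query string. A minimal sketch of a helper that picks the separator either way; buildCdpUrl is a hypothetical name, not part of Steel or Magnitude, and the apiKey parameter name is taken from the snippet above:

```typescript
// Hypothetical helper (not part of Steel or Magnitude): append the Steel API
// key to a session WebSocket URL as an `apiKey` query parameter.
function buildCdpUrl(websocketUrl: string, apiKey: string): string {
  // Use "&" when the URL already has a query string, "?" otherwise.
  const separator = websocketUrl.includes("?") ? "&" : "?";
  // Encode the key so reserved characters cannot break the query string.
  return `${websocketUrl}${separator}apiKey=${encodeURIComponent(apiKey)}`;
}
```

If you adopt it, browser.cdp becomes buildCdpUrl(session.websocketUrl, STEEL_API_KEY) and nothing else in the call changes.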
What the demo does
main() in index.ts walks a three-step flow against Steel's public leaderboard repo:
1. Extract the user behind the most recent commit, against a small Zod schema:

    const mostRecentCommitter = await agent.extract(
      "Find the user with the most recent commit",
      z.object({
        user: z.string(),
        commit: z.string(),
      }),
    );
The schema is the contract. Magnitude constrains the model's output against it, and the returned value is typed at the call site. Swap the shape for any read problem: forms, invoices, tables, search results.
2. Act to open the pull request that produced that commit. No selector, no URL; the agent reads the page, finds the link, clicks it:

    await agent.act(
      "Find the pull request behind the most recent commit if there is one",
    );
3. Extract a prose summary of what the PR changed. Same extract shape, different schema.
The act call sits in try / catch because the leaderboard head commit is not always tied to a merged PR. When it is not, Magnitude throws and the script logs "No pull request found or accessible" and moves on. Worth copying that pattern for any step that depends on page state you cannot guarantee.
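That pattern generalizes. A minimal sketch that turns a throwing step into a value the caller can branch on; attempt and StepResult are hypothetical names introduced here, not Magnitude APIs:

```typescript
// Outcome of a step that is allowed to fail.
type StepResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: string };

// Run one async step and convert a thrown error into a returned value,
// so the caller decides whether the flow continues.
async function attempt<T>(step: () => Promise<T>): Promise<StepResult<T>> {
  try {
    return { ok: true, value: await step() };
  } catch (error) {
    // Thrown values are not guaranteed to be Error instances.
    const reason = error instanceof Error ? error.message : String(error);
    return { ok: false, reason };
  }
}
```

Wrapped this way, the PR lookup would read const pr = await attempt(() => agent.act("...")), and a false pr.ok takes the place of the catch branch.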
Run it
    cd examples/magnitude
    cp .env.example .env # set STEEL_API_KEY and ANTHROPIC_API_KEY
    npm install
    npm start
Steel keys live at app.steel.dev/settings/api-keys; Anthropic keys at console.anthropic.com. The script prints a session viewer URL on boot. Open it in another tab to watch Magnitude drive the page.
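Both keys have to be set before the session is created. A small sketch of a fail-fast check, assuming the script reads them from the environment; missingKeys is a hypothetical helper, not something the example repo is known to ship:

```typescript
// Hypothetical helper: return the names of required variables that are
// unset or blank in the given environment map.
function missingKeys(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

Calling missingKeys(process.env, ["STEEL_API_KEY", "ANTHROPIC_API_KEY"]) at the top of main() and exiting on a non-empty result surfaces a forgotten key before any session time is spent.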
Your output will vary, but the structure looks like this:
    Steel + Magnitude Node Starter
    ============================================================
    Creating Steel session...
    Steel Session created!
    View session at https://app.steel.dev/sessions/ab12cd34...
    Connected to browser via Magnitude
    Looking for commits
    [narrate] taking screenshot of github.com/steel-dev/leaderboard
    [narrate] extracting: Find the user with the most recent commit
    Most recent committer:
    alice-dev has the most recent commit
    Looking for pull request behind the most recent commit
    [narrate] clicking commit SHA link
    [narrate] navigating to pull/482
    Found pull request!
    Adds a tie-breaker rule when two contributors have identical scores.
    Automation completed successfully!
    Stopping Magnitude agent...
    Releasing Steel session...
    Steel session released successfully
A full run takes ~45 seconds; most of that is Claude picking actions. You pay for a minute or two of Steel session time plus a handful of Anthropic tokens per act / extract call. Each agent turn consumes a screenshot, so the bill grows with the number of steps, not the length of the page.
The finally block stops the agent first, then releases the session. Reverse that order and Magnitude can try to screenshot a browser Steel already tore down.
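A minimal sketch of that ordering, written so a failed stop still lets the release run. shutdown and Stoppable are hypothetical names; the only assumption about the agent is that it exposes the stop() method described earlier, and the release step is passed in as a callback so the sketch does not depend on the Steel client:

```typescript
// Structural stand-in for the agent: anything with an async stop().
interface Stoppable {
  stop(): Promise<void>;
}

// Stop the agent first, then release the session. Errors are collected
// rather than thrown so the second step always gets its turn.
async function shutdown(
  agent: Stoppable | null,
  releaseSession: (() => Promise<unknown>) | null,
): Promise<Error[]> {
  const errors: Error[] = [];
  const toError = (e: unknown) => (e instanceof Error ? e : new Error(String(e)));

  if (agent) {
    try {
      await agent.stop();
    } catch (e) {
      errors.push(toError(e));
    }
  }
  if (releaseSession) {
    try {
      await releaseSession();
    } catch (e) {
      errors.push(toError(e));
    }
  }
  return errors;
}
```

In a finally block this would be called with the agent and a callback that wraps client.sessions.release(); passing null for either covers the case where setup failed before that object existed.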
Make it yours
- Swap the schema and prompt. extract() is schema-driven: forms, tables, invoices, search results. Change the Zod shape and the instruction; everything else holds.
- Chain act calls for multi-step flows. Login, filter, paginate, export. Each step is one natural-language instruction; errors are catchable per step, like the PR lookup here.
- Switch models. llm.provider accepts "anthropic" (used here) among others. Point model and apiKey at a different provider in startBrowserAgent() and the rest of the script stays put.
- Turn on stealth. Uncomment useProxy, solveCaptcha, or sessionTimeout in client.sessions.create() for sites with anti-bot protection.
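For the chained-act case, one way to keep per-step errors visible is a small runner that executes instructions in order and reports where it stopped. This is a sketch under assumptions: runFlow, Actor, and FlowReport are hypothetical names, and the only thing required of the agent is the act(instruction) method used above:

```typescript
// Structural stand-in for the agent: anything with an async act().
interface Actor {
  act(instruction: string): Promise<unknown>;
}

// Which instructions finished, and which one (if any) stopped the flow.
interface FlowReport {
  completed: string[];
  failed?: { instruction: string; reason: string };
}

// Run instructions sequentially and stop at the first one that throws.
async function runFlow(agent: Actor, instructions: string[]): Promise<FlowReport> {
  const completed: string[] = [];
  for (const instruction of instructions) {
    try {
      await agent.act(instruction);
      completed.push(instruction);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      return { completed, failed: { instruction, reason } };
    }
  }
  return { completed };
}
```

Stopping at the first failure suits flows where later steps depend on earlier ones, such as login before export. For independent steps, replacing the early return with a continue would be the variation to try.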
Related recipes
Build a browser agent with the Claude Agent SDK
Use Steel with the Claude Agent SDK (TypeScript) to build a tool-using browser agent on Anthropic's first-party agent loop.
Build a typed browser agent with Pydantic AI
Use Steel with Pydantic AI to build typed, provider-agnostic browser agents with dependency injection.
Build a typed browser agent with LangGraph
Use Steel with LangGraph to build a typed browser agent with an explicit state-machine loop and a structured-output formatter node.