Quickstart
Build a typed, tool-using browser agent with Steel and the Vercel AI SDK v6 ToolLoopAgent. The agent opens a Steel session, navigates and extracts, and ends with a typed final tool whose input is the structured result.
Scroll to the bottom to see a full example!
Requirements
- Steel API key
- Anthropic API key
- Node.js 20+
Step 1: Project Setup
Create a new TypeScript project and basic script:
mkdir steel-ai-sdk && \
cd steel-ai-sdk && \
npm init -y && \
npm install -D typescript @types/node ts-node && \
npx tsc --init && \
npm pkg set scripts.start="ts-node index.ts" && \
touch index.ts .env
Step 2: Install Dependencies
npm install ai @ai-sdk/anthropic steel-sdk playwright zod dotenv
Step 3: Environment Variables
Create a .env file with your API keys:
STEEL_API_KEY=your-steel-api-key-here
ANTHROPIC_API_KEY=your-anthropic-api-key-here
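The code in Step 4 falls back to a placeholder string when a key is missing, which postpones the failure until the first API call. A minimal sketch of a fail-fast check is shown here instead. `requireEnv` is a hypothetical helper, not part of steel-sdk or the AI SDK, and it takes the environment as a plain object so it can be tried without touching process.env.

```typescript
// Hypothetical helper (not part of steel-sdk or the AI SDK): returns the
// requested keys, trimmed, or throws one error listing every missing key.
type Env = Record<string, string | undefined>;

function requireEnv<K extends string>(env: Env, keys: readonly K[]): Record<K, string> {
  const missing: string[] = [];
  const values = {} as Record<K, string>;
  for (const key of keys) {
    const value = env[key]?.trim();
    if (!value) {
      missing.push(key);
    } else {
      values[key] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return values;
}

// Example with an inline object standing in for process.env.
const loaded = requireEnv(
  { STEEL_API_KEY: "steel-demo", ANTHROPIC_API_KEY: "anthropic-demo" },
  ["STEEL_API_KEY", "ANTHROPIC_API_KEY"] as const
);
console.log(loaded.STEEL_API_KEY.length > 0 && loaded.ANTHROPIC_API_KEY.length > 0);
```

In a real script the first argument would be process.env, called once after dotenv.config().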
Step 4: Define Steel tools
Each tool is a typed tool() with a Zod input schema. Browser state (the Steel session + Playwright page) lives in a closure so every tool call sees the same page.
import * as dotenv from "dotenv";
import Steel from "steel-sdk";
import { anthropic } from "@ai-sdk/anthropic";
import { ToolLoopAgent, tool, stepCountIs, hasToolCall } from "ai";
import { chromium, type Browser, type Page } from "playwright";
import { z } from "zod";

dotenv.config();

const STEEL_API_KEY = process.env.STEEL_API_KEY || "your-steel-api-key-here";
const ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY || "your-anthropic-api-key-here";

const steel = new Steel({ steelAPIKey: STEEL_API_KEY });

let session: Awaited<ReturnType<typeof steel.sessions.create>> | null = null;
let browser: Browser | null = null;
let page: Page | null = null;

const openSession = tool({
  description:
    "Open a Steel cloud browser session. Call this exactly once, before anything else.",
  inputSchema: z.object({}),
  execute: async () => {
    session = await steel.sessions.create({});
    browser = await chromium.connectOverCDP(
      `${session.websocketUrl}&apiKey=${STEEL_API_KEY}`
    );
    const ctx = browser.contexts()[0];
    page = ctx.pages()[0] ?? (await ctx.newPage());
    return { sessionId: session.id, liveViewUrl: session.sessionViewerUrl };
  },
});

const navigate = tool({
  description:
    "Navigate the open session to a URL and wait for the page to load.",
  inputSchema: z.object({ url: z.string().url() }),
  execute: async ({ url }) => {
    if (!page) throw new Error("openSession must be called first.");
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 45_000 });
    return { url: page.url(), title: await page.title() };
  },
});

const snapshot = tool({
  description:
    "Return a readable snapshot of the current page: title, URL, visible text (capped), and a list of links with their text and href. Call this BEFORE extract so you never have to guess CSS selectors.",
  inputSchema: z.object({
    maxChars: z.number().int().positive().max(10_000).default(4_000),
    maxLinks: z.number().int().positive().max(200).default(50),
  }),
  execute: async ({ maxChars, maxLinks }) => {
    if (!page) throw new Error("openSession must be called first.");
    return (await page.evaluate(
      ({ maxChars, maxLinks }: { maxChars: number; maxLinks: number }) => {
        const text = (document.body.innerText || "").slice(0, maxChars);
        const links = Array.from(document.querySelectorAll("a[href]"))
          .slice(0, maxLinks)
          .map((a) => {
            const anchor = a as HTMLAnchorElement;
            const t = (anchor.innerText || anchor.textContent || "").trim().slice(0, 120);
            return { text: t, href: anchor.href };
          })
          .filter((l) => l.text && l.href);
        return { url: location.href, title: document.title, text, links };
      },
      { maxChars, maxLinks }
    )) as { url: string; title: string; text: string; links: { text: string; href: string }[] };
  },
});

const extract = tool({
  description:
    "Extract structured data from the current page using CSS selectors. Provide one row selector plus a list of per-row field selectors.",
  inputSchema: z.object({
    rowSelector: z
      .string()
      .describe("CSS selector matching each item. e.g. 'article.Box-row'"),
    fields: z.array(z.object({
      name: z.string(),
      selector: z
        .string()
        .describe(
          "CSS selector relative to the row. Use an empty string to read the row element itself."
        ),
      attr: z
        .string()
        .optional()
        .describe("Optional attribute to read instead of innerText, e.g. 'href'."),
    })).min(1).max(10),
    limit: z.number().int().positive().max(20).default(10),
  }),
  execute: async ({ rowSelector, fields, limit }) => {
    if (!page) throw new Error("openSession must be called first.");
    // Run the whole extraction inside one page.evaluate so we pay the
    // CDP round-trip once, not N*M times. Serial CDP calls (row.$,
    // el.getAttribute, el.innerText) are the single biggest source of
    // slowness on a cloud browser.
    const items = (await page.evaluate(
      ({ rowSelector, fields, limit }: {
        rowSelector: string;
        fields: { name: string; selector: string; attr?: string }[];
        limit: number;
      }) => {
        const rows = Array.from(
          document.querySelectorAll(rowSelector)
        ).slice(0, limit);
        return rows.map((row) => {
          const item: Record<string, string> = {};
          for (const f of fields) {
            const el = f.selector
              ? (row.querySelector(f.selector) as Element | null)
              : row;
            if (!el) { item[f.name] = ""; continue; }
            if (f.attr) {
              item[f.name] = (el.getAttribute(f.attr) ?? "").trim();
            } else {
              const text = (el as HTMLElement).innerText ?? el.textContent ?? "";
              item[f.name] = text.trim();
            }
          }
          return item;
        });
      },
      { rowSelector, fields, limit }
    )) as Record<string, string>[];
    return { count: items.length, items };
  },
});
The obvious implementation — page.$$(rowSelector) then await row.$(f.selector) and await el.innerText() per field — looks fine locally but each of those awaits is a separate CDP round-trip to Steel's cloud browser (~200-300ms each). A 10×4 extract becomes 40 round-trips (8-12 seconds). The page.evaluate version above is one round-trip: <500ms.
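The arithmetic behind those figures can be sketched as a small latency model. The function names are hypothetical, and the inputs are the estimates quoted above, not measurements.

```typescript
// Back-of-the-envelope latency model; inputs are the estimates quoted in the
// text (200-300 ms per CDP round-trip), not measured values.
function serialExtractMs(rows: number, fields: number, roundTripMs: number): number {
  // One awaited CDP call per (row, field) pair.
  return rows * fields * roundTripMs;
}

function batchedExtractMs(roundTripMs: number): number {
  // A single page.evaluate carries the whole extraction in one round-trip.
  return roundTripMs;
}

const serialLow = serialExtractMs(10, 4, 200);
const serialHigh = serialExtractMs(10, 4, 300);
console.log(
  `serial: ${serialLow}-${serialHigh} ms, batched: ${batchedExtractMs(200)}-${batchedExtractMs(300)} ms`
);
// prints "serial: 8000-12000 ms, batched: 200-300 ms"
```

If the element lookup and the text read are counted as separate awaits, the serial estimate doubles, which only widens the gap.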
Step 5: Build the ToolLoopAgent
The agent's last move is a reportFindings tool with a Zod-typed input and no execute. In v6, a tool without an execute stops the loop as soon as it's called — so this tool doubles as the structured-output carrier. The typed final result is the tool call's input.
Tip: Why not output: Output.object(...)? On Anthropic, forcing a JSON response format disables tool calling: the provider warns "JSON response format does not support tools. The provided tools are ignored." The "final tool" pattern is the v6-idiomatic way to combine tool loops with typed output.
const reportFindings = tool({
  description:
    "Call this LAST with your final findings. Calling this ends the research.",
  inputSchema: z.object({
    summary: z
      .string()
      .describe("One-paragraph summary of what these repos have in common."),
    repos: z.array(z.object({
      name: z.string(),
      url: z.string(),
      stars: z.string().optional(),
      description: z.string().optional(),
    })).min(1).max(5),
  }),
  // intentionally no execute: lacking execute makes v6 stop the loop
});

const researchAgent = new ToolLoopAgent({
  model: anthropic("claude-haiku-4-5"),
  instructions: [
    "You operate a Steel cloud browser via tools.",
    "Workflow: (1) call openSession, (2) navigate to the target URL,",
    "(3) call snapshot to see the page's text and links,",
    "(4) only call extract when you need structured rows beyond what snapshot gives you,",
    "(5) call reportFindings once with your final result.",
    "Do not invent data. Prefer snapshot's links list over guessing selectors.",
  ].join(" "),
  stopWhen: [stepCountIs(15), hasToolCall("reportFindings")],
  tools: { openSession, navigate, snapshot, extract, reportFindings },
  onStepFinish: async ({ stepNumber, toolCalls, usage }) => {
    const names = toolCalls?.map((t: any) => t.toolName).join(", ") || "(text only)";
    console.log(` step ${stepNumber}: ${names} | ${usage?.totalTokens ?? 0} tokens`);
  },
});
Without the snapshot tool, the agent has to guess CSS selectors. A wrong guess means an empty extract, a retry, and another model round-trip. snapshot returns the page's visible text and link list in one page.evaluate (<500ms), so the agent can decide whether extract is even necessary. For link-heavy sites (trending pages, news indexes, search results) the findings are already in the links list, and the agent can skip extract entirely, saving a step.
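To make the link list concrete, here is a sketch of the same trimming and capping logic applied to plain data, so it runs outside a browser. `RawLink` and `shapeLinks` are hypothetical names. Note one deliberate difference: the snapshot tool caps the list before dropping empty-text links, so it can return fewer than maxLinks entries. This sketch filters first, which is one possible alternative and not what the guide's code does.

```typescript
// Hypothetical pure-data version of snapshot's link handling. Unlike the
// tool in this guide, it drops empty links BEFORE applying the cap.
interface RawLink {
  text: string;
  href: string;
}

function shapeLinks(raw: RawLink[], maxLinks: number, maxTextLen = 120): RawLink[] {
  return raw
    .map((l) => ({ text: l.text.trim().slice(0, maxTextLen), href: l.href }))
    .filter((l) => l.text !== "" && l.href !== "")
    .slice(0, maxLinks);
}

const shaped = shapeLinks(
  [
    { text: "  ", href: "https://example.com/icon" }, // whitespace-only: dropped
    { text: " owner/repo ", href: "https://example.com/owner/repo" },
    { text: "Docs", href: "https://example.com/docs" },
  ],
  2
);
console.log(shaped.map((l) => l.text).join(" | "));
```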
Step 6: Run the agent and clean up
The agent opens the Steel session itself during its first step. The final typed result is the reportFindings tool call's input, found in result.steps.
async function main() {
  try {
    const result = await researchAgent.generate({
      prompt:
        "Go to https://github.com/trending/python?since=daily and return the top 3 AI/ML-related repositories. For each, give its full name (owner/repo), GitHub URL, star count as shown on the page, and the repo description.",
    });

    const steps = (result as any).steps ?? [];
    const reportCall = steps
      .flatMap((s: any) => s.toolCalls ?? [])
      .find((tc: any) => tc.toolName === "reportFindings");
    const structured = reportCall?.input ?? { text: result.text };

    console.log(JSON.stringify(structured, null, 2));
  } finally {
    if (browser) await browser.close().catch(() => {});
    if (session) await steel.sessions.release(session.id).catch(() => {});
  }
}

main().catch((e) => {
  console.error(e);
  process.exit(1);
});
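The lookup in main() leans on `any`. A sketch of the same search with local types and a runtime shape check follows. `StepLike`, `ToolCallLike`, `Findings`, `isFindings`, and `findFindings` are hypothetical names written for this illustration. They model only the fields that main() reads and are not the SDK's own result types.

```typescript
// Minimal local types for illustration; only the fields used in main() are
// modeled. The SDK's real result types are richer.
interface ToolCallLike {
  toolName: string;
  input: unknown;
}
interface StepLike {
  toolCalls?: ToolCallLike[];
}
interface Findings {
  summary: string;
  repos: { name: string; url: string; stars?: string; description?: string }[];
}

// Runtime check that an unknown value has the required Findings fields.
function isFindings(value: unknown): value is Findings {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { summary?: unknown; repos?: unknown };
  if (typeof v.summary !== "string" || !Array.isArray(v.repos)) return false;
  return v.repos.every((r: unknown) => {
    if (typeof r !== "object" || r === null) return false;
    const repo = r as { name?: unknown; url?: unknown };
    return typeof repo.name === "string" && typeof repo.url === "string";
  });
}

// Returns the input of the first reportFindings call, or null when the loop
// ended without one (for example, by hitting the step limit).
function findFindings(steps: StepLike[]): Findings | null {
  for (const step of steps) {
    for (const call of step.toolCalls ?? []) {
      if (call.toolName !== "reportFindings") continue;
      const input = call.input;
      return isFindings(input) ? input : null;
    }
  }
  return null;
}

// Mock steps standing in for result.steps.
const mockSteps: StepLike[] = [
  { toolCalls: [{ toolName: "openSession", input: {} }] },
  {
    toolCalls: [
      {
        toolName: "reportFindings",
        input: { summary: "demo", repos: [{ name: "owner/repo", url: "https://example.com" }] },
      },
    ],
  },
];
console.log(findFindings(mockSteps)?.repos.length ?? 0);
```

Since the tool's inputSchema is already a Zod object, parsing the input with that schema would be another way to get the same guarantee.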
Run It
npm start
You'll see a live session viewer URL in the console — open it to watch the agent drive the browser in real time.
Phase-gate tools with prepareStep (optional)
prepareStep runs before each step and can narrow the tool set per phase — preventing the agent from calling openSession twice, or from extracting before navigating.
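prepareStep: async ({ stepNumber }) => {
  // Step 0: the only legal move is opening the session.
  if (stepNumber === 0) return { activeTools: ["openSession"] };
  // Later steps: everything except openSession, so the session is opened once
  // and snapshot and reportFindings stay reachable.
  return { activeTools: ["navigate", "snapshot", "extract", "reportFindings"] };
},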
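A gate keyed on stepNumber alone cannot stop the agent from extracting before it has navigated. One possible refinement, sketched here as a pure function, derives the allowed tools from the names of the tools already called. `allowedTools` is a hypothetical helper, and the tool names are the ones defined in this guide.

```typescript
// Hypothetical phase gate: picks the allowed tools from the history of tool
// names called so far. Tool names match the ones defined in this guide.
type ToolName = "openSession" | "navigate" | "snapshot" | "extract" | "reportFindings";

function allowedTools(called: readonly string[]): ToolName[] {
  // Phase 1: nothing is possible until a session exists.
  if (called.indexOf("openSession") === -1) return ["openSession"];
  // Phase 2: there is no page to read until the first navigation.
  if (called.indexOf("navigate") === -1) return ["navigate"];
  // Phase 3: everything except openSession, so it cannot be called twice.
  return ["navigate", "snapshot", "extract", "reportFindings"];
}

console.log(allowedTools([]).join(","));
console.log(allowedTools(["openSession"]).join(","));
console.log(allowedTools(["openSession", "navigate"]).join(","));
```

Inside prepareStep, the `called` list could be built from the toolName of each tool call in the `steps` argument, assuming those step objects expose toolCalls the same way result.steps does in Step 6.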
Swap the model
The default is Claude Haiku 4.5 — fast and cheap, which matters because the agent round-trips through the model 3-5 times per run. Swap up when the task needs stronger reasoning:
import { openai } from "@ai-sdk/openai";
import { google } from "@ai-sdk/google";

// model: anthropic("claude-sonnet-4-6"), // smarter, slower
// model: openai("gpt-5"),
// model: google("gemini-2.5-pro"),
Or use the AI Gateway string form (e.g. "anthropic/claude-haiku-4-5") to route through Vercel.
Full Example
Complete index.ts you can paste and run:
/*
 * Build an AI browser agent with Vercel AI SDK v6 (ToolLoopAgent) and Steel.
 * https://github.com/steel-dev/steel-cookbook/tree/main/examples/steel-ai-sdk-starter
 */

import * as dotenv from "dotenv";
import Steel from "steel-sdk";
import { anthropic } from "@ai-sdk/anthropic";
import { ToolLoopAgent, tool, stepCountIs, hasToolCall } from "ai";
import { chromium, type Browser, type Page } from "playwright";
import { z } from "zod";

dotenv.config();

const STEEL_API_KEY = process.env.STEEL_API_KEY || "your-steel-api-key-here";
const ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY || "your-anthropic-api-key-here";

const steel = new Steel({ steelAPIKey: STEEL_API_KEY });

let session: Awaited<ReturnType<typeof steel.sessions.create>> | null = null;
let browser: Browser | null = null;
let page: Page | null = null;

const openSession = tool({
  description:
    "Open a Steel cloud browser session. Call this exactly once, before anything else.",
  inputSchema: z.object({}),
  execute: async () => {
    session = await steel.sessions.create({});
    browser = await chromium.connectOverCDP(
      `${session.websocketUrl}&apiKey=${STEEL_API_KEY}`
    );
    const ctx = browser.contexts()[0];
    page = ctx.pages()[0] ?? (await ctx.newPage());
    return { sessionId: session.id, liveViewUrl: session.sessionViewerUrl };
  },
});

const navigate = tool({
  description:
    "Navigate the open session to a URL and wait for the page to load.",
  inputSchema: z.object({ url: z.string().url() }),
  execute: async ({ url }) => {
    if (!page) throw new Error("openSession must be called first.");
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 45_000 });
    return { url: page.url(), title: await page.title() };
  },
});

const snapshot = tool({
  description:
    "Return a readable snapshot of the current page: title, URL, visible text (capped), and a list of links with their text and href. Call this BEFORE extract so you never have to guess CSS selectors.",
  inputSchema: z.object({
    maxChars: z.number().int().positive().max(10_000).default(4_000),
    maxLinks: z.number().int().positive().max(200).default(50),
  }),
  execute: async ({ maxChars, maxLinks }) => {
    if (!page) throw new Error("openSession must be called first.");
    return (await page.evaluate(
      ({ maxChars, maxLinks }: { maxChars: number; maxLinks: number }) => {
        const text = (document.body.innerText || "").slice(0, maxChars);
        const links = Array.from(document.querySelectorAll("a[href]"))
          .slice(0, maxLinks)
          .map((a) => {
            const anchor = a as HTMLAnchorElement;
            const t = (anchor.innerText || anchor.textContent || "").trim().slice(0, 120);
            return { text: t, href: anchor.href };
          })
          .filter((l) => l.text && l.href);
        return { url: location.href, title: document.title, text, links };
      },
      { maxChars, maxLinks }
    )) as { url: string; title: string; text: string; links: { text: string; href: string }[] };
  },
});

const extract = tool({
  description:
    "Extract structured data from the current page using CSS selectors. Provide one row selector plus a list of per-row field selectors.",
  inputSchema: z.object({
    rowSelector: z
      .string()
      .describe("CSS selector matching each item. e.g. 'article.Box-row'"),
    fields: z.array(z.object({
      name: z.string(),
      selector: z
        .string()
        .describe(
          "CSS selector relative to the row. Use an empty string to read the row element itself."
        ),
      attr: z
        .string()
        .optional()
        .describe("Optional attribute to read instead of innerText, e.g. 'href'."),
    })).min(1).max(10),
    limit: z.number().int().positive().max(20).default(10),
  }),
  execute: async ({ rowSelector, fields, limit }) => {
    if (!page) throw new Error("openSession must be called first.");
    const items = (await page.evaluate(
      ({ rowSelector, fields, limit }: {
        rowSelector: string;
        fields: { name: string; selector: string; attr?: string }[];
        limit: number;
      }) => {
        const rows = Array.from(
          document.querySelectorAll(rowSelector)
        ).slice(0, limit);
        return rows.map((row) => {
          const item: Record<string, string> = {};
          for (const f of fields) {
            const el = f.selector
              ? (row.querySelector(f.selector) as Element | null)
              : row;
            if (!el) { item[f.name] = ""; continue; }
            if (f.attr) {
              item[f.name] = (el.getAttribute(f.attr) ?? "").trim();
            } else {
              const text = (el as HTMLElement).innerText ?? el.textContent ?? "";
              item[f.name] = text.trim();
            }
          }
          return item;
        });
      },
      { rowSelector, fields, limit }
    )) as Record<string, string>[];
    return { count: items.length, items };
  },
});

const reportFindings = tool({
  description:
    "Call this LAST with your final findings. Calling this ends the research.",
  inputSchema: z.object({
    summary: z
      .string()
      .describe("One-paragraph summary of what these repos have in common."),
    repos: z.array(z.object({
      name: z.string(),
      url: z.string(),
      stars: z.string().optional(),
      description: z.string().optional(),
    })).min(1).max(5),
  }),
  // intentionally no execute: lacking execute makes v6 stop the loop
});

const researchAgent = new ToolLoopAgent({
  model: anthropic("claude-haiku-4-5"),
  instructions: [
    "You operate a Steel cloud browser via tools.",
    "Workflow: (1) call openSession, (2) navigate to the target URL,",
    "(3) call snapshot to see the page's text and links,",
    "(4) only call extract when you need structured rows beyond what snapshot gives you,",
    "(5) call reportFindings once with your final result.",
    "Do not invent data. Prefer snapshot's links list over guessing selectors.",
  ].join(" "),
  stopWhen: [stepCountIs(15), hasToolCall("reportFindings")],
  tools: { openSession, navigate, snapshot, extract, reportFindings },
  onStepFinish: async ({ stepNumber, toolCalls, usage }) => {
    const names = toolCalls?.map((t: any) => t.toolName).join(", ") || "(text only)";
    console.log(` step ${stepNumber}: ${names} | ${usage?.totalTokens ?? 0} tokens`);
  },
});

async function main() {
  try {
    const result = await researchAgent.generate({
      prompt:
        "Go to https://github.com/trending/python?since=daily and return the top 3 AI/ML-related repositories. For each, give its full name (owner/repo), GitHub URL, star count as shown on the page, and the repo description.",
    });

    const steps = (result as any).steps ?? [];
    const reportCall = steps
      .flatMap((s: any) => s.toolCalls ?? [])
      .find((tc: any) => tc.toolName === "reportFindings");
    const structured = reportCall?.input ?? { text: result.text };

    console.log(JSON.stringify(structured, null, 2));
  } finally {
    if (browser) await browser.close().catch(() => {});
    if (session) await steel.sessions.release(session.id).catch(() => {});
  }
}

main().catch((e) => {
  console.error(e);
  process.exit(1);
});
Next Steps
- Vercel AI SDK Agents: https://ai-sdk.dev/docs/agents/overview
- ToolLoopAgent reference: https://ai-sdk.dev/docs/agents/building-agents
- Loop control (stopWhen, prepareStep): https://ai-sdk.dev/docs/agents/loop-control
- Steel Sessions API: /overview/sessions-api/overview
- Steel Node SDK: https://github.com/steel-dev/steel-node
- This Example on GitHub: https://github.com/steel-dev/steel-cookbook/tree/main/examples/steel-ai-sdk-starter