Quickstart (TypeScript)

Build a browser agent with the OpenAI Agents SDK for TypeScript and Steel. The agent opens a Steel session, navigates and snapshots the page, optionally extracts structured rows, and returns a Zod-validated final report.

Scroll to the bottom for the full example.

Requirements

  • Steel API key

  • OpenAI API key

  • Node.js 20+

Step 1: Project Setup

Terminal
mkdir steel-openai-agents && \
cd steel-openai-agents && \
npm init -y && \
npm install -D typescript @types/node ts-node && \
npx tsc --init && \
npm pkg set scripts.start="ts-node index.ts" && \
touch index.ts .env

Step 2: Install Dependencies

Terminal
npm install @openai/agents steel-sdk playwright zod dotenv

Step 3: Environment Variables

ENV
.env
STEEL_API_KEY=your-steel-api-key-here
OPENAI_API_KEY=your-openai-api-key-here

Step 4: Define Steel tools

Each tool is a typed tool() with a Zod parameters schema. Browser state (the Steel session + Playwright page) lives in a closure so every tool call sees the same page.
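
The shared-state idea can be sketched without any SDK. In this dependency-free illustration, `FakePage`, `createBrowserTools`, and `requirePage` are hypothetical names and not part of Steel, Playwright, or the Agents SDK. The quickstart itself uses module-level variables rather than a factory, but the principle is the same: every function reads and writes the same captured variable.

```typescript
// Hypothetical stand-in for a Playwright Page; it only tracks the current URL.
interface FakePage {
  url: string;
}

// Factory that keeps the page in a closure shared by every returned function.
function createBrowserTools() {
  let page: FakePage | null = null;

  // Guard used by every tool that needs an open page.
  function requirePage(): FakePage {
    if (!page) throw new Error("open_session first.");
    return page;
  }

  return {
    // Creates the shared page; later calls see the same object.
    openSession(): void {
      page = { url: "about:blank" };
    },
    // Mutates the shared page and returns its new URL.
    navigate(url: string): string {
      const current = requirePage();
      current.url = url;
      return current.url;
    },
    // Reads the shared page without changing it.
    currentUrl(): string {
      return requirePage().url;
    },
  };
}
```

Calling `navigate` before `openSession` throws, which mirrors the `if (!page) throw` guard each real tool uses.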

Two Zod gotchas with OpenAI strict mode

The Agents SDK sends tool schemas in strict JSON Schema mode. Two things get rejected that otherwise look fine:

  • Use .nullable(), not .optional(). Every property must be in required. z.string().optional() marks the field not-required and is rejected; z.string().nullable() keeps it required but lets the model pass null.
  • Skip .url() on tool params. Zod emits "format": "uri" for .url(), and strict mode rejects uri (supported formats are date-time, date, time, duration, email, hostname, ipv4, ipv6, uuid). Use plain z.string() and validate inside execute if needed.
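
To make the first rule concrete, here is a rough sketch of the "every property must be in required" check, written against plain JSON Schema objects. `findNonRequiredProps` and both sample schemas are hypothetical and hand-written for illustration. The exact JSON Schema that Zod and the SDK emit may differ (for example, a nullable field may be expressed with `anyOf` rather than a type array).

```typescript
// Minimal shape of an object schema; just enough for this check.
interface ObjectSchema {
  type: "object";
  properties: Record<string, unknown>;
  required: string[];
  additionalProperties: boolean;
}

// Returns the property names that are missing from `required`.
function findNonRequiredProps(schema: ObjectSchema): string[] {
  return Object.keys(schema.properties).filter(
    (key) => !schema.required.includes(key)
  );
}

// Roughly what an .optional() field produces: `attr` is absent from `required`.
const withOptional: ObjectSchema = {
  type: "object",
  properties: { selector: { type: "string" }, attr: { type: "string" } },
  required: ["selector"],
  additionalProperties: false,
};

// Roughly what a .nullable() field produces: still required, but null is allowed.
const withNullable: ObjectSchema = {
  type: "object",
  properties: {
    selector: { type: "string" },
    attr: { type: ["string", "null"] },
  },
  required: ["selector", "attr"],
  additionalProperties: false,
};
```

Here `findNonRequiredProps(withOptional)` reports `attr`, while `findNonRequiredProps(withNullable)` reports nothing, which is the property strict mode is looking for.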
Typescript
index.ts
import * as dotenv from "dotenv";
import { Agent, run, tool } from "@openai/agents";
import { chromium, type Browser, type Page } from "playwright";
import Steel from "steel-sdk";
import { z } from "zod";

dotenv.config();

const STEEL_API_KEY = process.env.STEEL_API_KEY!;
const steel = new Steel({ steelAPIKey: STEEL_API_KEY });

let session: Awaited<ReturnType<typeof steel.sessions.create>> | null = null;
let browser: Browser | null = null;
let page: Page | null = null;

const openSession = tool({
  name: "open_session",
  description:
    "Open a Steel cloud browser session. Call exactly once, before anything else.",
  parameters: z.object({}),
  execute: async () => {
    session = await steel.sessions.create({});
    browser = await chromium.connectOverCDP(
      `${session.websocketUrl}&apiKey=${STEEL_API_KEY}`
    );
    const ctx = browser.contexts()[0];
    page = ctx.pages()[0] ?? (await ctx.newPage());
    return { sessionId: session.id, liveViewUrl: session.sessionViewerUrl };
  },
});

const navigate = tool({
  name: "navigate",
  description: "Navigate the open session to a URL and wait for it to load.",
  // OpenAI strict JSON Schema rejects "uri" format, so use plain z.string() here.
  parameters: z.object({ url: z.string() }),
  execute: async ({ url }) => {
    if (!page) throw new Error("open_session first.");
    await page.goto(url, { waitUntil: "domcontentloaded", timeout: 45_000 });
    return { url: page.url(), title: await page.title() };
  },
});

const snapshot = tool({
  name: "snapshot",
  description:
    "Return a readable snapshot of the current page: title, URL, visible text (capped), and a list of links. Call BEFORE extract so you never have to guess CSS selectors.",
  parameters: z.object({
    maxChars: z.number().int().positive().max(10_000).default(4_000),
    maxLinks: z.number().int().positive().max(200).default(50),
  }),
  execute: async ({ maxChars, maxLinks }) => {
    if (!page) throw new Error("open_session first.");
    return (await page.evaluate(
      ({ maxChars, maxLinks }: { maxChars: number; maxLinks: number }) => {
        const text = (document.body.innerText || "").slice(0, maxChars);
        const links = Array.from(document.querySelectorAll("a[href]"))
          .slice(0, maxLinks)
          .map((a) => {
            const anchor = a as HTMLAnchorElement;
            const t = (anchor.innerText || anchor.textContent || "").trim().slice(0, 120);
            return { text: t, href: anchor.href };
          })
          .filter((l) => l.text && l.href);
        return { url: location.href, title: document.title, text, links };
      },
      { maxChars, maxLinks }
    )) as { url: string; title: string; text: string; links: { text: string; href: string }[] };
  },
});

const extract = tool({
  name: "extract",
  description:
    "Extract structured rows from the current page using CSS selectors. Prefer calling snapshot() first.",
  parameters: z.object({
    rowSelector: z.string(),
    fields: z.array(z.object({
      name: z.string(),
      selector: z.string(),
      attr: z.string().nullable(),
    })).min(1).max(10),
    limit: z.number().int().positive().max(20).default(10),
  }),
  execute: async ({ rowSelector, fields, limit }) => {
    if (!page) throw new Error("open_session first.");
    const items = (await page.evaluate(
      ({ rowSelector, fields, limit }: {
        rowSelector: string;
        fields: { name: string; selector: string; attr: string | null }[];
        limit: number;
      }) => {
        const rows = Array.from(document.querySelectorAll(rowSelector)).slice(0, limit);
        return rows.map((row) => {
          const item: Record<string, string> = {};
          for (const f of fields) {
            const el = f.selector ? (row.querySelector(f.selector) as Element | null) : row;
            if (!el) { item[f.name] = ""; continue; }
            item[f.name] = f.attr
              ? (el.getAttribute(f.attr) ?? "").trim()
              : (((el as HTMLElement).innerText ?? el.textContent ?? "")).trim();
          }
          return item;
        });
      },
      { rowSelector, fields, limit }
    )) as Record<string, string>[];
    return { count: items.length, items };
  },
});
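
Because the `url` parameter is a plain string, any validation has to happen inside `execute`. One possible approach, assuming only http and https targets should be allowed, uses the built-in `URL` constructor. `assertHttpUrl` is a hypothetical helper for this sketch, not part of any SDK.

```typescript
// Parses `raw` and returns the normalized URL string.
// Throws if `raw` is not a syntactically valid http(s) URL.
function assertHttpUrl(raw: string): string {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    throw new Error(`Invalid URL: ${raw}`);
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return parsed.href;
}
```

Inside the `navigate` tool, the value passed to `page.goto` could be wrapped in `assertHttpUrl(url)` so that a malformed URL fails with a clear message before the browser is involved.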

Step 5: Build the Agent

Give the agent instructions, tools, a model, and an outputType (a Zod schema) for the final answer. Unlike some providers that force JSON-only mode when you ask for structured output, OpenAI supports outputType and tools together: the agent calls tools freely and still returns a validated final answer.

Typescript
index.ts
const FinalReport = z.object({
  summary: z.string().describe("One-paragraph summary of what these repos have in common."),
  repos: z.array(z.object({
    name: z.string(),
    url: z.string(),
    stars: z.string().nullable(),
    description: z.string().nullable(),
  })).min(1).max(5),
});

const agent = new Agent({
  name: "SteelResearch",
  instructions: [
    "You operate a Steel cloud browser via tools.",
    "Workflow: (1) open_session, (2) navigate to the target URL,",
    "(3) snapshot to see the page's text and links,",
    "(4) only call extract when you need structured rows beyond snapshot,",
    "(5) return the final FinalReport.",
    "Prefer snapshot's links list over guessing selectors. Do not invent data.",
  ].join(" "),
  model: "gpt-5-mini",
  tools: [openSession, navigate, snapshot, extract],
  outputType: FinalReport,
});
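
Once a run completes, the final output should match the `FinalReport` schema, including its nullable fields. The sketch below mirrors that schema as a hand-written interface and formats a report for the terminal. `Report`, `RepoEntry`, and `formatReport` are hypothetical names; in a real project the type could instead be derived with `z.infer<typeof FinalReport>`.

```typescript
// Hand-written mirror of the FinalReport schema (assumption: kept in sync manually).
interface RepoEntry {
  name: string;
  url: string;
  stars: string | null;
  description: string | null;
}

interface Report {
  summary: string;
  repos: RepoEntry[];
}

// Renders a report as plain text, substituting placeholders for null fields.
function formatReport(report: Report): string {
  const lines = report.repos.map((repo, i) => {
    const stars = repo.stars ?? "n/a";
    const description = repo.description ?? "(no description)";
    return `${i + 1}. ${repo.name} [${stars} stars] ${repo.url}\n   ${description}`;
  });
  return [report.summary, "", ...lines].join("\n");
}
```

Handling `null` explicitly matters here because the schema uses `.nullable()` for `stars` and `description`, so the model is allowed to return `null` for either.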

Step 6: Run and clean up

Typescript
index.ts
async function main() {
  try {
    const result = await run(
      agent,
      "Go to https://github.com/trending/python?since=daily and return the top 3 AI/ML-related repositories. For each, give name (owner/repo), GitHub URL, star count as shown, and the repo description.",
      { maxTurns: 15 }
    );
    console.log(JSON.stringify(result.finalOutput, null, 2));
  } finally {
    if (browser) await browser.close().catch(() => {});
    if (session) await steel.sessions.release(session.id).catch(() => {});
  }
}

main().catch((e) => { console.error(e); process.exit(1); });
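
The `finally` block swallows cleanup errors so that a failed `browser.close()` cannot prevent the session from being released. If you would rather see those errors, one option is a small helper that runs every step and collects the failures. `runCleanup` is a hypothetical helper shown as a sketch; neither SDK provides it.

```typescript
// Runs every cleanup step in order, even if earlier ones fail,
// and returns the errors that were caught along the way.
async function runCleanup(
  steps: Array<() => Promise<unknown>>
): Promise<unknown[]> {
  const errors: unknown[] = [];
  for (const step of steps) {
    try {
      await step();
    } catch (err) {
      errors.push(err);
    }
  }
  return errors;
}
```

In `main`, the `finally` block could pass two steps, one that closes the browser and one that releases the session, and then log whatever errors come back.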

Run It

Terminal
npm start

Swap the model

gpt-5-mini is the default here because it's fast enough for interactive iteration. Switch to gpt-5 when harder pages need higher-quality reasoning, and expect roughly 15-40s per turn because of its reasoning stage.

const agent = new Agent({ /* ...other options unchanged... */ model: "gpt-5" }); // slower, better reasoning
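
To switch models without editing code, one option is to read the model name from an environment variable. The variable name `OPENAI_AGENT_MODEL` and the helper `resolveModel` are assumptions made up for this sketch; the SDK does not read this variable itself.

```typescript
// Picks the model from an env-like object, falling back to the quickstart default.
// OPENAI_AGENT_MODEL is a hypothetical variable name chosen for this example.
function resolveModel(env: Record<string, string | undefined>): string {
  const value = env.OPENAI_AGENT_MODEL?.trim();
  return value ? value : "gpt-5-mini";
}
```

Passing `process.env` to `resolveModel` and using the result as the agent's `model` keeps gpt-5-mini as the default, while setting `OPENAI_AGENT_MODEL=gpt-5` before `npm start` opts into the larger model.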

Next Steps