Automate browsing with natural-language instructions using Stagehand
Use Steel with Stagehand for natural-language-driven AI browser automation.
Scaffolds a starter project locally. Requires the Steel CLI.
Stagehand replaces brittle selectors with two LLM-backed primitives:
- stagehand.extract(instruction, schema): describe what you want, pass a Zod schema, get typed data back.
- stagehand.act(instruction): describe an action in natural language, Stagehand figures out the click / type / scroll.
Both run against a Steel session over CDP, so Stagehand handles the reasoning and Steel handles the browser (stealth, proxies, live viewer).
    stagehand = new Stagehand({
      env: "LOCAL",
      localBrowserLaunchOptions: {
        cdpUrl: `${session.websocketUrl}&apiKey=${STEEL_API_KEY}`,
      },
      model: { modelName: "openai/gpt-5", apiKey: OPENAI_API_KEY },
    });
    await stagehand.init();
env: "LOCAL" tells Stagehand "I'll hand you the browser." That browser is Steel, reached via the CDP URL. model is the LLM that interprets every instruction. This starter targets Stagehand v3.
Typed extraction, an instruction paired with a Zod schema:
    const stories = await stagehand.extract(
      "extract the titles and ranks of the first 5 stories on the page",
      z.object({
        stories: z.array(z.object({ title: z.string(), rank: z.number() })),
      }),
    );
The schema isn't just documentation. Stagehand constrains the LLM's output against it and gives you a typed result at runtime. Swap the prompt and schema for any extraction problem: forms, tables, search results, prices.
Natural-language action, no selector required:
await stagehand.act("click the 'new' link in the top navigation");
Stagehand inspects the DOM, picks the matching element, and clicks it.
Run it
    cd examples/stagehand-ts
    cp .env.example .env          # set STEEL_API_KEY and OPENAI_API_KEY
    npm install
    npm start
Get keys from app.steel.dev and platform.openai.com. A session viewer URL prints as the script starts. Open it in another tab to watch Stagehand work.
Your output will vary, but the structure looks like this:
    Creating Steel session...
    Steel Session created!
    View session at https://app.steel.dev/sessions/ab12cd34…
    Initializing Stagehand...
    Connected to browser via Stagehand
    Navigating to Hacker News...
    Extracting top stories using AI...
    Top 5 Hacker News Stories:
    1. Claude 4.7 Opus released today
    2. Show HN: A browser extension for reading on slow connections
    3. …
    Navigating to HN's 'new' section via a natural-language click...
    Navigated to new stories!
    Automation completed successfully!
A full run takes ~30 seconds and costs a few cents of Steel session time plus OpenAI tokens for each extract / act call.
Make it yours
- Swap the schema and prompt. extract() works on any data shape: forms, invoices, product grids, tables. Change the stagehand.extract call in index.ts to whatever you need to read off a page.
- Chain acts and extracts. Break a task into natural-language steps: "sign in with these creds, then extract invoices from the past month." Each step is one act() or extract().
- Try another model. gpt-5 works well out of the box; Claude and Gemini also work. Swap modelName and apiKey in the Stagehand config.
Scaffolds a starter project locally. Requires the Steel CLI.
Stagehand v3 ships two LLM-backed primitives that replace CSS selectors with natural language:
- sessions.extract(instruction, schema): describe what you want, pass a JSON schema, get structured data back.
- sessions.act(instruction): describe an action, Stagehand decides whether to click, type, or scroll.
Both run inside an embedded local Stagehand server that drives a Steel-hosted Chrome over CDP.
    stagehand = AsyncStagehand(
        server="local",
        model_api_key=OPENAI_API_KEY,
        local_ready_timeout_s=30.0,
    )
    stagehand_session = await stagehand.sessions.start(
        model_name="openai/gpt-5",
        browser={
            "type": "local",
            "launchOptions": {
                "cdpUrl": f"{session.websocket_url}&apiKey={STEEL_API_KEY}",
            },
        },
    )
    session_id = stagehand_session.data.session_id
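The cdpUrl above appends the key with a literal &apiKey=, which assumes the websocket URL already carries a query string. A minimal sketch of a more defensive way to build that URL with the standard library follows. The helper name with_api_key and the example URLs are hypothetical; the real format of Steel's websocket URL is not shown here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_api_key(websocket_url, api_key):
    """Return websocket_url with an apiKey query parameter set exactly once.

    Hypothetical helper: works whether or not the URL already has a query string.
    """
    parts = urlsplit(websocket_url)
    # Keep existing parameters, dropping any stale apiKey so it is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "apiKey"
    ]
    query.append(("apiKey", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Made-up URLs for illustration only.
url_with_query = with_api_key("wss://connect.example.test/?sessionId=abc", "k1")
url_without_query = with_api_key("wss://connect.example.test/session", "k1")
```

Note that urlencode percent-encodes the key, which matters only if the key contains characters that are not URL-safe.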
The Python SDK is async-first. Every extract, act, and navigate call returns a coroutine, and this starter uses asyncio.run(main()) as the entry point.
Unlike the TypeScript SDK, the Python v3 SDK exposes extract and act as SSE streams. The starter wraps that pattern in _stream_to_result:
    async def _stream_to_result(stream, label):
        result_payload = None
        async for event in stream:
            if event.type == "log":
                print(f"[{label}][log] {event.data.message}")
                continue
            status = event.data.status
            if status == "finished":
                result_payload = event.data.result
            elif status == "error":
                raise RuntimeError(f"{label} stream: {event.data.error or 'unknown'}")
        return result_payload
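To see how the helper behaves without a live session, the sketch below replays hand-built events through a copy of it. The event objects are SimpleNamespace stand-ins that mimic only the attributes the helper reads (type, data.message, data.status, data.result, data.error). They are an assumption for illustration, not the SDK's real event classes, and the "system" event type is a made-up label.

```python
import asyncio
from types import SimpleNamespace


async def _stream_to_result(stream, label):
    # Same logic as the starter's helper, repeated so this sketch runs on its own.
    result_payload = None
    async for event in stream:
        if event.type == "log":
            print(f"[{label}][log] {event.data.message}")
            continue
        status = event.data.status
        if status == "finished":
            result_payload = event.data.result
        elif status == "error":
            raise RuntimeError(f"{label} stream: {event.data.error or 'unknown'}")
    return result_payload


def make_event(event_type, **data):
    # Hypothetical stand-in for a stream event.
    return SimpleNamespace(type=event_type, data=SimpleNamespace(**data))


async def fake_stream(events):
    # Async generator that plays the role of the SSE stream.
    for event in events:
        yield event


def replay(events, label="demo"):
    return asyncio.run(_stream_to_result(fake_stream(events), label))


ok_events = [
    make_event("log", message="looking at the page"),
    make_event("system", status="running", result=None, error=None),
    make_event("system", status="finished", result={"stories": []}, error=None),
]
demo_result = replay(ok_events)  # log events are printed, the finished payload is returned
```

A stream that ends without a "finished" event returns None, so callers may want to treat None as "no result" rather than as an empty extraction.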
sessions.extract takes a JSON schema dict and returns data that conforms to it. No Zod, no pydantic required:
    STORY_SCHEMA = {
        "type": "object",
        "properties": {
            "stories": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "rank": {"type": "integer"},
                    },
                    "required": ["title", "rank"],
                },
            }
        },
        "required": ["stories"],
    }

    extract_stream = stagehand.sessions.extract(
        id=session_id,
        instruction="Extract the titles and ranks of the first 5 stories on the page",
        schema=STORY_SCHEMA,
        stream_response=True,
        x_stream_response="true",
    )
    stories = await _stream_to_result(extract_stream, "extract")
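Since the extracted data arrives as plain dicts and lists, a small shape check before using it can catch a malformed payload early. The sketch below is a hand-written check for the story shape only. check_stories is a hypothetical helper, not part of any SDK, and it assumes the result is a dict with a stories list as the schema describes. For general JSON Schema validation, a dedicated library would be the usual choice.

```python
def check_stories(payload):
    """Return payload["stories"] if it matches the story shape, else raise ValueError."""
    if not isinstance(payload, dict) or not isinstance(payload.get("stories"), list):
        raise ValueError("expected an object with a 'stories' array")
    for index, story in enumerate(payload["stories"]):
        if not isinstance(story, dict):
            raise ValueError(f"stories[{index}] must be an object")
        if not isinstance(story.get("title"), str):
            raise ValueError(f"stories[{index}].title must be a string")
        rank = story.get("rank")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(rank, int) or isinstance(rank, bool):
            raise ValueError(f"stories[{index}].rank must be an integer")
    return payload["stories"]


# Made-up payload for illustration.
sample = {"stories": [{"title": "Example story", "rank": 1}]}
checked = check_stories(sample)
```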
sessions.act takes an instruction and no selector:
    act_stream = stagehand.sessions.act(
        id=session_id,
        instruction="click the 'new' link in the top navigation",
        stream_response=True,
        x_stream_response="true",
    )
    await _stream_to_result(act_stream, "act")
Run it
    cd examples/stagehand-py
    cp .env.example .env          # set STEEL_API_KEY and OPENAI_API_KEY
    uv run main.py
Get keys from app.steel.dev and platform.openai.com. The script prints a session viewer URL as it starts.
Your output will vary, but the structure looks like this:
    Creating Steel session...
    Steel Session created!
    View session at https://app.steel.dev/sessions/ab12cd34…
    Initializing Stagehand...
    Connected to browser via Stagehand
    Navigating to Hacker News...
    Extracting top stories using AI...
    Top 5 Hacker News Stories:
    1. Claude 4.7 Opus released today
    2. Show HN: A browser extension for reading on slow connections
    3. …
    Navigating to HN's 'new' section via a natural-language click...
    Navigated to new stories!
    Automation completed successfully!
A full run takes ~30 seconds. The finally block in main() calls stagehand.sessions.end(), stagehand.close(), and client.sessions.release(). Keep all three.
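If one of those cleanup calls raises, the ones after it should still run. The sketch below shows one way to arrange that, under stated assumptions: run_cleanup is a hypothetical helper, and the three functions are stand-ins for the real calls, not the SDK methods themselves. It accepts both sync and async steps because the source does not say which of the three are awaitable.

```python
import asyncio
import inspect


async def run_cleanup(steps):
    """Run every (name, zero-argument callable) pair and collect failures
    instead of stopping at the first one."""
    failures = []
    for name, step in steps:
        try:
            outcome = step()
            if inspect.isawaitable(outcome):
                await outcome
        except Exception as exc:
            failures.append((name, exc))
    return failures


calls = []


async def end_stagehand_session():  # stand-in for stagehand.sessions.end
    calls.append("sessions.end")
    raise RuntimeError("session already ended")


async def close_stagehand():  # stand-in for stagehand.close
    calls.append("close")


def release_steel_session():  # stand-in for client.sessions.release
    calls.append("release")


failures = asyncio.run(run_cleanup([
    ("sessions.end", end_stagehand_session),
    ("close", close_stagehand),
    ("release", release_steel_session),
]))
for name, exc in failures:
    print(f"cleanup step {name} failed: {exc}")
```

In a real main(), the same helper would be awaited inside the finally block rather than started with asyncio.run.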
Make it yours
- Swap the schema and prompt. STORY_SCHEMA and the sessions.extract instruction in main.py are the only parts tied to the Hacker News demo.
- Chain acts and extracts. Break a task into natural-language steps, one await _stream_to_result(...) per step.
- Try another model. openai/gpt-5 is a reasonable default; Claude and Gemini also work. Change model_name on sessions.start and point model_api_key at the matching provider.
- Turn on Steel stealth. Uncomment use_proxy, solve_captcha, or session_timeout in the client.sessions.create() call for sites with anti-bot protection.
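The "chain acts and extracts" idea can be sketched as an ordered list of steps run one at a time. In the sketch below, run_steps and fake_run_step are hypothetical names. In the starter, the run_step callable would open the matching act or extract stream and await _stream_to_result on it; here it only echoes its input so the example runs without a browser or an LLM.

```python
import asyncio


async def run_steps(steps, run_step):
    """Run (kind, instruction) steps in order and return the results of extract steps."""
    extracted = []
    for kind, instruction in steps:
        if kind not in ("act", "extract"):
            raise ValueError(f"unknown step kind: {kind}")
        # Each step finishes before the next starts, so later steps see the updated page.
        result = await run_step(kind, instruction)
        if kind == "extract":
            extracted.append(result)
    return extracted


async def fake_run_step(kind, instruction):
    # Stand-in for the real act/extract call.
    return {"kind": kind, "instruction": instruction}


steps = [
    ("act", "sign in with these creds"),
    ("extract", "extract invoices from the past month"),
]
chain_result = asyncio.run(run_steps(steps, fake_run_step))
```

Keeping each instruction small makes failures easier to attribute to a single step.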
Related recipes
Automate a cloud browser with Playwright
Use Steel with Playwright in TypeScript for cloud browser automation.
Automate a cloud browser with Puppeteer
Use Steel with Puppeteer in TypeScript for cloud browser automation.
Automate a cloud browser with Selenium
Use Steel with Selenium in Python for cloud browser automation.