Automate browsing with natural-language instructions using Stagehand
Use Steel with Stagehand for natural-language-driven AI browser automation.
Stagehand replaces brittle selectors with two LLM-backed primitives:
stagehand.extract(instruction, schema): describe what you want, pass a Zod schema, get typed data back.stagehand.act(instruction): describe an action in natural language, Stagehand figures out the click / type / scroll.
Both run against a Steel session over CDP, so Stagehand handles the reasoning and Steel handles the browser (stealth, proxies, live viewer).
stagehand = new Stagehand({env: "LOCAL",localBrowserLaunchOptions: {cdpUrl: `${session.websocketUrl}&apiKey=${STEEL_API_KEY}`,},model: { modelName: "openai/gpt-5", apiKey: OPENAI_API_KEY },});await stagehand.init();
env: "LOCAL" tells Stagehand "I'll hand you the browser." That browser is Steel, reached via the CDP URL. model is the LLM that interprets every instruction. This starter targets Stagehand v3.
Typed extraction, an instruction paired with a Zod schema:
const stories = await stagehand.extract("extract the titles and ranks of the first 5 stories on the page",z.object({stories: z.array(z.object({ title: z.string(), rank: z.number() })),}),);
The schema isn't just documentation. Stagehand constrains the LLM's output against it and gives you a typed result at runtime. Swap the prompt and schema for any extraction problem: forms, tables, search results, prices.
Natural-language action, no selector required:
await stagehand.act("click the 'new' link in the top navigation");
Stagehand inspects the DOM, picks the matching element, and clicks it.
Run it
cd examples/stagehand-tscp .env.example .env # set STEEL_API_KEY and OPENAI_API_KEYnpm installnpm start
Get keys from app.steel.dev and platform.openai.com. A session viewer URL prints as the script starts. Open it in another tab to watch Stagehand work.
Your output varies. Structure looks like this:
Creating Steel session...Steel Session created!View session at https://app.steel.dev/sessions/ab12cd34…Initializing Stagehand...Connected to browser via StagehandNavigating to Hacker News...Extracting top stories using AI...Top 5 Hacker News Stories:1. Claude 4.7 Opus released today2. Show HN: A browser extension for reading on slow connections3. …Navigating to HN's 'new' section via a natural-language click...Navigated to new stories!Automation completed successfully!
A full run takes ~30 seconds and costs a few cents of Steel session time plus OpenAI tokens for each extract / act call.
Make it yours
- Swap the schema and prompt.
extract()works on any data shape: forms, invoices, product grids, tables. Change thestagehand.extractcall inindex.tsto whatever you need to read off a page. - Chain acts and extracts. Break a task into natural-language steps: "sign in with these creds, then extract invoices from the past month." Each step is one
act()orextract(). - Try another model.
gpt-5works well out of the box; Claude and Gemini also work. SwapmodelNameandapiKeyin theStagehandconfig.
Related
Stagehand v3 ships two LLM-backed primitives that replace CSS selectors with natural language:
sessions.extract(instruction, schema): describe what you want, pass a JSON schema, get structured data back.sessions.act(instruction): describe an action, Stagehand decides whether to click, type, or scroll.
Both run inside an embedded local Stagehand server that drives a Steel-hosted Chrome over CDP. You get the reasoning (Stagehand), the browser (Steel, with stealth, proxies, live viewer), and the Python async story tying them together.
stagehand = AsyncStagehand(server="local",model_api_key=OPENAI_API_KEY,local_ready_timeout_s=30.0,)stagehand_session = await stagehand.sessions.start(model_name="openai/gpt-5",browser={"type": "local","launchOptions": {"cdpUrl": f"{session.websocket_url}&apiKey={STEEL_API_KEY}",},},)session_id = stagehand_session.data.session_id
server="local" boots an embedded Stagehand server in-process. browser.type="local" with a cdpUrl tells that server to attach to a browser you already have, which is the Steel session. model_name is the LLM that interprets every instruction.
The Python SDK is async-first. Every extract, act, and navigate call returns a coroutine, and this starter uses asyncio.run(main()) as the entry point. There is no sync wrapper; plan to await each step.
The v3 streaming shape
Unlike the TypeScript SDK, where stagehand.extract(...) resolves directly to data, the Python v3 SDK exposes extract and act as SSE streams. You iterate events, watch for finished or error, and pick up the payload from the terminating event. The starter wraps that pattern in _stream_to_result:
async def _stream_to_result(stream, label):result_payload = Noneasync for event in stream:if event.type == "log":print(f"[{label}][log] {event.data.message}")continuestatus = event.data.statusif status == "finished":result_payload = event.data.resultelif status == "error":raise RuntimeError(f"{label} stream: {event.data.error or 'unknown'}")return result_payload
The log events are Stagehand's internal reasoning trace (which element it picked, why, how confident). Print them during development, silence them in production.
Typed extraction
sessions.extract takes a JSON schema dict and returns data that conforms to it. No Zod, no pydantic required. A plain Python dict is enough:
STORY_SCHEMA = {"type": "object","properties": {"stories": {"type": "array","items": {"type": "object","properties": {"title": {"type": "string"},"rank": {"type": "integer"},},"required": ["title", "rank"],},}},"required": ["stories"],}extract_stream = stagehand.sessions.extract(id=session_id,instruction="Extract the titles and ranks of the first 5 stories on the page",schema=STORY_SCHEMA,stream_response=True,x_stream_response="true",)stories = await _stream_to_result(extract_stream, "extract")
Stagehand constrains the LLM's output against the schema, so stories["stories"] is a list of dicts with the keys you asked for. If you'd rather work with dataclasses or pydantic models, pass the JSON schema here and validate on your side once the stream resolves.
Natural-language action
sessions.act takes an instruction and no selector. Stagehand inspects the DOM, picks the matching element, and acts:
act_stream = stagehand.sessions.act(id=session_id,instruction="click the 'new' link in the top navigation",stream_response=True,x_stream_response="true",)await _stream_to_result(act_stream, "act")
Phrase instructions the way you'd describe the action to a person: "click the 'Add to Cart' button", "type user@example.com in the email field", "scroll to the pricing table". Stagehand handles the mapping.
Run it
cd examples/stagehand-pycp .env.example .env # set STEEL_API_KEY and OPENAI_API_KEYuv run main.py
Get keys from app.steel.dev and platform.openai.com. The script prints a session viewer URL as it starts. Open it in another tab to watch Stagehand work.
Your output varies. Structure looks like this:
Creating Steel session...Steel Session created!View session at https://app.steel.dev/sessions/ab12cd34…Initializing Stagehand...Connected to browser via StagehandNavigating to Hacker News...Extracting top stories using AI...Top 5 Hacker News Stories:1. Claude 4.7 Opus released today2. Show HN: A browser extension for reading on slow connections3. …Navigating to HN's 'new' section via a natural-language click...Navigated to new stories!Automation completed successfully!
A full run takes ~30 seconds and costs a few cents of Steel session time plus OpenAI tokens for each extract / act call. The finally block in main() calls stagehand.sessions.end, stagehand.close(), and client.sessions.release(). Keep all three. Forgetting the last one keeps the Steel browser running until the default 5-minute timeout.
Make it yours
- Swap the schema and prompt.
STORY_SCHEMAand thesessions.extractinstruction inmain.pyare the only parts tied to the Hacker News demo. Replace them with whatever shape you need to read: invoice lines, product grids, table rows. - Chain acts and extracts. Break a task into natural-language steps, one
await _stream_to_result(...)per step. Login, navigate, filter, extract. - Try another model.
openai/gpt-5is a reasonable default; Claude and Gemini also work. Changemodel_nameonsessions.startand pointmodel_api_keyat the matching provider. - Turn on Steel stealth. Uncomment
use_proxy,solve_captcha, orsession_timeoutin theclient.sessions.create()call for sites with anti-bot.
Related
Related recipes
Automate a cloud browser with Playwright
Use Steel with Playwright in TypeScript for cloud browser automation.
Automate a cloud browser with Puppeteer
Use Steel with Puppeteer in TypeScript for cloud browser automation.
Automate a cloud browser with Selenium
Use Steel with Selenium in Python for cloud browser automation.