Quickstart (Python)
Build a browser agent with the OpenAI Agents SDK for Python and Steel. The agent opens a Steel session, navigates and snapshots the page, optionally extracts structured rows, and returns a Pydantic-validated final report.
Scroll to the bottom for the full example.
Requirements
- Steel API key
- OpenAI API key
- Python 3.11+
Step 1: Project Setup
mkdir steel-openai-agents-py && \
cd steel-openai-agents-py && \
python -m venv .venv && \
source .venv/bin/activate && \
touch main.py .env
Step 2: Install Dependencies
uv venv
source .venv/bin/activate
uv add openai-agents steel-sdk playwright pydantic python-dotenv
playwright install chromium
Step 3: Environment Variables
STEEL_API_KEY=your-steel-api-key-here
OPENAI_API_KEY=your-openai-api-key-here
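The code in the following steps reads both keys from the environment. A minimal sketch of a fail-fast check, assuming you want the script to stop before any session is created. `missing_keys` is a hypothetical helper, not part of either SDK, and the placeholder test simply matches the template values shown above:

```python
import os
from typing import Mapping

REQUIRED_KEYS = ("STEEL_API_KEY", "OPENAI_API_KEY")
PLACEHOLDER_SUFFIX = "-here"  # the template values in .env end with this


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required keys that are unset, blank, or still the template value."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        if not value or value.endswith(PLACEHOLDER_SUFFIX):
            missing.append(key)
    return missing
```

One way to use it is to call `missing_keys()` right after `load_dotenv()` and exit with a clear message if the returned list is not empty.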
Step 4: Define Steel tools
Each tool is an async function decorated with @function_tool. The SDK reads the signature and docstring to build the JSON schema automatically. Pydantic models are used where an argument needs structure.
import asyncio
import os
from typing import Optional

from agents import Agent, Runner, function_tool
from dotenv import load_dotenv
from playwright.async_api import Browser, Page, async_playwright
from pydantic import BaseModel, Field
from steel import Steel

load_dotenv()

STEEL_API_KEY = os.getenv("STEEL_API_KEY") or "your-steel-api-key-here"
steel = Steel(steel_api_key=STEEL_API_KEY)

_session = None
_browser: Optional[Browser] = None
_page: Optional[Page] = None
_playwright = None


@function_tool
async def open_session() -> dict:
    """Open a Steel cloud browser session. Call exactly once, before anything else."""
    global _session, _browser, _page, _playwright
    _session = steel.sessions.create()
    _playwright = await async_playwright().start()
    _browser = await _playwright.chromium.connect_over_cdp(
        f"{_session.websocket_url}&apiKey={STEEL_API_KEY}"
    )
    ctx = _browser.contexts[0]
    _page = ctx.pages[0] if ctx.pages else await ctx.new_page()
    return {"session_id": _session.id, "live_view_url": _session.session_viewer_url}


@function_tool
async def navigate(url: str) -> dict:
    """Navigate the open session to a URL and wait for the page to load."""
    if _page is None:
        raise RuntimeError("open_session first.")
    await _page.goto(url, wait_until="domcontentloaded", timeout=45_000)
    return {"url": _page.url, "title": await _page.title()}


@function_tool
async def snapshot(max_chars: int = 4_000, max_links: int = 50) -> dict:
    """Return a readable snapshot of the current page: title, URL, visible
    text (capped), and a list of links. Call BEFORE extract so the agent
    never has to guess CSS selectors.
    """
    if _page is None:
        raise RuntimeError("open_session first.")
    return await _page.evaluate(
        """({maxChars, maxLinks}) => {
            const text = (document.body.innerText || '').slice(0, maxChars);
            const links = Array.from(document.querySelectorAll('a[href]'))
                .slice(0, maxLinks)
                .map((a) => ({
                    text: (a.innerText || a.textContent || '').trim().slice(0, 120),
                    href: a.href,
                }))
                .filter((l) => l.text && l.href);
            return { url: location.href, title: document.title, text, links };
        }""",
        {"maxChars": max_chars, "maxLinks": max_links},
    )


class FieldSpec(BaseModel):
    name: str
    selector: str = Field(
        description="CSS selector relative to the row. Empty string reads the row itself."
    )
    attr: Optional[str] = Field(
        default=None,
        description="Optional attribute to read instead of innerText (e.g. 'href').",
    )


@function_tool
async def extract(
    row_selector: str, fields: list[FieldSpec], limit: int = 10
) -> dict:
    """Extract structured rows from the current page using CSS selectors.
    Prefer calling snapshot() first to confirm the page structure.
    """
    if _page is None:
        raise RuntimeError("open_session first.")
    fields_json = [{"name": f.name, "selector": f.selector, "attr": f.attr} for f in fields]
    items = await _page.evaluate(
        """({rowSelector, fields, limit}) => {
            const rows = Array.from(
                document.querySelectorAll(rowSelector)
            ).slice(0, limit);
            return rows.map((row) => {
                const item = {};
                for (const f of fields) {
                    const el = f.selector ? row.querySelector(f.selector) : row;
                    if (!el) { item[f.name] = ''; continue; }
                    item[f.name] = f.attr
                        ? (el.getAttribute(f.attr) || '').trim()
                        : (el.innerText || el.textContent || '').trim();
                }
                return item;
            });
        }""",
        {"rowSelector": row_selector, "fields": fields_json, "limit": limit},
    )
    return {"count": len(items), "items": items}
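To make "reads the signature and docstring" concrete, here is a standard-library-only illustration of the general idea. It is not the SDK's implementation, which also handles Pydantic models and much more. `sketch_schema`, its type table, and `demo_navigate` are hypothetical names used only for this sketch, and it is not part of main.py:

```python
import inspect

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_schema(func) -> dict:
    """Build a rough JSON-schema-like description from a function's signature."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "object")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


async def demo_navigate(url: str, timeout_ms: int = 45_000) -> dict:
    """Navigate the open session to a URL."""
    return {"url": url}


schema = sketch_schema(demo_navigate)
```

The takeaway for writing tools is that type annotations, defaults, and docstrings are what the model sees, so it pays to keep them precise.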
Writing the extraction with page.query_selector_all, row.query_selector, and el.inner_text() looks fine locally, but each await is a separate CDP round-trip to Steel's cloud browser (roughly 200-300 ms each). A 10×4 extract becomes 40 round-trips, or 8-12 seconds. The page.evaluate version above runs the whole extraction inside the browser: one round-trip, under 500 ms.
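The arithmetic behind that note can be written as a small back-of-the-envelope model. It assumes round-trips run sequentially and counts one awaited call per cell, as the note does. `estimate_seconds` is a hypothetical helper, and the 200-300 ms figures are taken from the note, not measured here:

```python
def estimate_seconds(round_trips: int, rtt_ms: float) -> float:
    """Total wall time if each round-trip is sequential and costs rtt_ms."""
    return round_trips * rtt_ms / 1000


rows, fields = 10, 4
per_cell_trips = rows * fields  # one awaited call per cell, as in the note

low = estimate_seconds(per_cell_trips, 200)
high = estimate_seconds(per_cell_trips, 300)
print(f"per-cell calls: {per_cell_trips} round-trips, {low:.0f}-{high:.0f} s")

# A single page.evaluate call is one round-trip regardless of rows x fields.
single = estimate_seconds(1, 300)
print(f"single evaluate: 1 round-trip, about {single:.1f} s")
```

The per-cell cost grows with rows times fields, while the single-evaluate cost stays flat, which is why the gap widens on larger tables.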
Step 5: Build the Agent
Define a Pydantic output_type to get a typed final answer. OpenAI supports output_type + tools together, unlike some providers that force JSON-only mode when you ask for structured output.
class Repo(BaseModel):
    name: str
    url: str
    stars: Optional[str] = None
    description: Optional[str] = None


class FinalReport(BaseModel):
    summary: str = Field(
        description="One-paragraph summary of what these repos have in common."
    )
    repos: list[Repo] = Field(min_length=1, max_length=5)


agent = Agent(
    name="SteelResearch",
    instructions=(
        "You operate a Steel cloud browser via tools. "
        "Workflow: (1) open_session, (2) navigate to the target URL, "
        "(3) snapshot to see the page's text and links, "
        "(4) only call extract when you need structured rows beyond snapshot, "
        "(5) return the final FinalReport. "
        "Prefer snapshot's links list over guessing selectors. Do not invent data."
    ),
    model="gpt-5-mini",
    tools=[open_session, navigate, snapshot, extract],
    output_type=FinalReport,
)
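To see what those constraints mean for the model's final answer, here is a plain-Python illustration of roughly the same rules FinalReport declares: a string summary, one to five repos, and a name and url on each repo. Pydantic performs the real validation for you. `report_problems` is a hypothetical stand-in that only shows which payloads would pass:

```python
def report_problems(data: dict) -> list[str]:
    """List the ways `data` violates the FinalReport shape (empty list = OK)."""
    problems = []
    if not isinstance(data.get("summary"), str):
        problems.append("summary must be a string")
    repos = data.get("repos")
    if not isinstance(repos, list) or not 1 <= len(repos) <= 5:
        problems.append("repos must be a list of 1 to 5 items")
        return problems
    for i, repo in enumerate(repos):
        for key in ("name", "url"):  # the two required fields on Repo
            value = repo.get(key) if isinstance(repo, dict) else None
            if not isinstance(value, str):
                problems.append(f"repos[{i}].{key} must be a string")
    return problems
```

An empty repos list is rejected here for the same reason `min_length=1` rejects it: the agent cannot finish with a report that contains no repositories.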
Step 6: Run and clean up
async def main() -> None:
    try:
        result = await Runner.run(
            agent,
            input=(
                "Go to https://github.com/trending/python?since=daily and return the "
                "top 3 AI/ML-related repositories. For each, give name (owner/repo), "
                "GitHub URL, star count as shown, and the repo description."
            ),
            max_turns=15,
        )
        final: FinalReport = result.final_output
        print(final.model_dump_json(indent=2))
    finally:
        if _browser is not None:
            await _browser.close()
        if _playwright is not None:
            await _playwright.stop()
        if _session is not None:
            steel.sessions.release(_session.id)


if __name__ == "__main__":
    asyncio.run(main())
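The finally block above releases resources in a fixed order: browser, then Playwright, then the Steel session. One alternative way to express the same ordering is `contextlib.AsyncExitStack`, which runs registered callbacks in reverse order even when the body raises. The sketch below uses stand-in functions (the `fake_*` names are hypothetical) instead of real Steel or Playwright objects so that it runs anywhere:

```python
import asyncio
from contextlib import AsyncExitStack

events: list[str] = []


async def fake_close_browser() -> None:
    events.append("browser.close")


async def fake_stop_playwright() -> None:
    events.append("playwright.stop")


def fake_release_session() -> None:
    events.append("session.release")


async def run_with_cleanup() -> None:
    async with AsyncExitStack() as stack:
        # Callbacks run last-in, first-out, so register in reverse order.
        stack.callback(fake_release_session)
        stack.push_async_callback(fake_stop_playwright)
        stack.push_async_callback(fake_close_browser)
        raise RuntimeError("simulated agent failure")


try:
    asyncio.run(run_with_cleanup())
except RuntimeError:
    pass  # cleanup already ran before the error propagated
```

In the real script the callbacks would still need the same `is not None` checks as the finally block, because open_session may never have been called.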
Run It
python main.py
Swap the model
gpt-5-mini is the default here because it's fast enough for interactive iteration. Swap up to gpt-5 when you need higher-quality reasoning on harder pages, and expect 15-40s per turn because of its reasoning stage.
agent = Agent(..., model="gpt-5") # slower, better reasoning
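If you switch models often, one option is to read the model name from an environment variable instead of editing the code. A minimal sketch, assuming a variable named `AGENT_MODEL`. That name and the `pick_model` helper are hypothetical, not something the SDK reads itself:

```python
import os
from typing import Mapping

DEFAULT_MODEL = "gpt-5-mini"  # the quickstart default


def pick_model(env: Mapping[str, str] = os.environ) -> str:
    """Return AGENT_MODEL if set and non-blank, else the quickstart default."""
    return env.get("AGENT_MODEL", "").strip() or DEFAULT_MODEL
```

You could then pass `model=pick_model()` when constructing the Agent and set `AGENT_MODEL=gpt-5` in .env for harder pages.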
Next Steps
- OpenAI Agents SDK (Python): https://openai.github.io/openai-agents-python/
- TypeScript quickstart: /integrations/openai-agents-sdk/quickstart-node
- Steel Sessions API: /overview/sessions-api/overview
- This example on GitHub: https://github.com/steel-dev/steel-cookbook/tree/main/examples/steel-openai-agents-python-starter