Control a browser with Notte's reasoning engine
Control browsers with AI using Steel's infrastructure and Notte's reasoning engine.
Scaffolds a starter project locally. Requires the Steel CLI.
Notte builds its agent on top of a perception layer. At each step, notte.Session flattens the live DOM into a compact action space (labeled interactive elements, form fields, section headings) and hands that structured view to the reasoning model. The model picks an action by id, and Notte translates it back into a browser command.
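To make the perceive / choose / translate cycle concrete, here is a toy sketch in plain Python. It does not use Notte's API: the Element type, the id scheme ("I1", "B1"), and the command strings are all hypothetical stand-ins for what the real perception layer produces.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Element:
    """A hypothetical interactive element found on the page."""
    role: str      # e.g. "button", "input", "link"
    label: str     # human-readable text shown to the model
    selector: str  # how the browser would locate the element


def build_action_space(elements: List[Element]) -> Dict[str, Element]:
    """Flatten elements into a compact {id: element} map (ids are made up here)."""
    prefixes = {"button": "B", "input": "I", "link": "L"}
    counters: Dict[str, int] = {}
    space: Dict[str, Element] = {}
    for el in elements:
        prefix = prefixes.get(el.role, "X")
        counters[prefix] = counters.get(prefix, 0) + 1
        space[f"{prefix}{counters[prefix]}"] = el
    return space


def to_browser_command(space: Dict[str, Element], action_id: str,
                       value: Optional[str] = None) -> str:
    """Translate the model's chosen id back into a concrete browser command."""
    el = space[action_id]
    if el.role == "input":
        return f"fill({el.selector!r}, {value!r})"
    return f"click({el.selector!r})"


page = [
    Element("input", "Search Wikipedia", "#searchInput"),
    Element("button", "Search", "button[type=submit]"),
]
space = build_action_space(page)
# The model only sees ids and labels, never raw HTML.
command = to_browser_command(space, "I1", "machine learning")
```

The point of the indirection is that the model reasons over a short list of ids and labels, while selectors stay on the framework side.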
This recipe points that loop at a Steel session instead of a locally-launched browser.
    with notte.Session(cdp_url=cdp_url) as notte_session:
        agent = notte.Agent(
            session=notte_session,
            max_steps=5,
            reasoning_model="gemini/gemini-2.5-flash",
        )
        response = agent.run(task=TASK)
notte.Session(cdp_url=...) is the integration surface. The default perception_type is "fast" (heuristic parser); pass perception_type="deep" on pages where the fast path misses elements.
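The snippet above uses cdp_url without showing where it comes from. A minimal sketch of building it from a Steel session id follows. The wss://connect.steel.dev endpoint and its apiKey / sessionId query parameters are an assumption here, not something this recipe states, so check Steel's documentation for the current format. The key and session id are placeholders.

```python
from urllib.parse import urlencode


def build_cdp_url(api_key: str, session_id: str) -> str:
    """Build a CDP WebSocket URL for a Steel session.

    Assumption: the connect endpoint and query parameter names below;
    verify them against Steel's docs before relying on this.
    """
    query = urlencode({"apiKey": api_key, "sessionId": session_id})
    return f"wss://connect.steel.dev?{query}"


# Placeholder values for illustration only.
cdp_url = build_cdp_url("sk-example", "ab12cd34")
```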
max_steps caps iterations. The starter uses 5; sign-in / filter / extract flows typically want 15 to 30. The agent exits early when it marks the task complete.
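The interaction between the step cap and early exit can be sketched with a toy loop. This is not Notte's implementation; run_agent and the step callback are hypothetical names used only to show the control flow.

```python
from typing import Callable, Optional, Tuple


def run_agent(step: Callable[[int], Optional[str]],
              max_steps: int) -> Tuple[Optional[str], int]:
    """Call `step` up to `max_steps` times; stop as soon as it returns an answer.

    `step` stands in for one perceive / choose / act iteration and returns
    None while the task is still in progress.
    """
    for i in range(1, max_steps + 1):
        answer = step(i)
        if answer is not None:
            return answer, i  # task marked complete: exit early
    return None, max_steps    # budget exhausted without completion


def seven_step_task(i: int) -> Optional[str]:
    """A fake task that needs seven iterations to finish."""
    return "done" if i >= 7 else None


short = run_agent(seven_step_task, max_steps=5)    # cap hit before completion
longer = run_agent(seven_step_task, max_steps=15)  # finishes at step 7, not 15
```

A higher cap costs nothing when the task finishes early, which is why multi-page flows can afford a generous ceiling.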
Run it
    cd examples/notte
    cp .env.example .env  # set STEEL_API_KEY and GEMINI_API_KEY
    uv run main.py
Get keys from app.steel.dev and aistudio.google.com. The session viewer URL prints as the script starts.
Your output will vary, but the structure looks like this:
    Steel + Notte Assistant
    ============================================================
    Starting Steel browser session...
    Steel Session created!
    View session at https://app.steel.dev/sessions/ab12cd34...
    Executing task: Go to Wikipedia and search for machine learning
    ============================================================
    ============================================================
    TASK EXECUTION COMPLETED
    ============================================================
    Duration: 24.3 seconds
    Task: Go to Wikipedia and search for machine learning
    Result:
    Machine learning is a field of artificial intelligence...
    ============================================================
    Releasing Steel session...
    Session completed. View replay at https://app.steel.dev/sessions/ab12cd34...
    Done!
A default run takes ~25 seconds. The finally block calls client.sessions.release(session.id).
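The reason for releasing in a finally block is that cleanup runs even when the agent raises. The sketch below shows that behavior offline with a stand-in object. FakeSessions and run_with_cleanup are hypothetical; in the real script the release call goes to the Steel client.

```python
from typing import Callable, List


class FakeSessions:
    """Stand-in for a sessions API; records which ids were released."""

    def __init__(self) -> None:
        self.released: List[str] = []

    def release(self, session_id: str) -> None:
        self.released.append(session_id)


def run_with_cleanup(sessions: FakeSessions, session_id: str,
                     work: Callable[[], None]) -> None:
    """Run `work()` and release the session afterwards, even if it raises."""
    try:
        work()
    finally:
        sessions.release(session_id)


def failing_task() -> None:
    raise RuntimeError("agent crashed mid-task")


sessions = FakeSessions()
try:
    run_with_cleanup(sessions, "ab12cd34", failing_task)
except RuntimeError:
    pass  # the error still propagates; the release has already happened
```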
Make it yours
- Change the task. Set TASK in .env or edit the default in main.py.
- Raise max_steps. Bump the ceiling on notte.Agent(...) for multi-page flows.
- Swap the reasoning model. Change reasoning_model on notte.Agent. Flash for speed, GPT-5 or Sonnet for ambiguity.
- Switch to deep perception. Pass perception_type="deep" to notte.Session(...) when the fast heuristics miss elements.
- Turn on stealth. Add use_proxy=True, solve_captcha=True, or session_timeout=1800000 to client.sessions.create() for sites with anti-bot protection.
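One way to toggle the session options from the last bullet without editing code is to read them from the environment. This is a sketch under assumptions: the STEEL_USE_PROXY, STEEL_SOLVE_CAPTCHA, and STEEL_SESSION_TIMEOUT_MS variable names are hypothetical, and only the keyword names (use_proxy, solve_captcha, session_timeout) come from this recipe.

```python
import os
from typing import Any, Dict, Mapping


def session_options(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect optional keyword arguments for client.sessions.create().

    The STEEL_* variable names are hypothetical, chosen for this example.
    """
    opts: Dict[str, Any] = {}
    if env.get("STEEL_USE_PROXY") == "1":
        opts["use_proxy"] = True
    if env.get("STEEL_SOLVE_CAPTCHA") == "1":
        opts["solve_captcha"] = True
    if env.get("STEEL_SESSION_TIMEOUT_MS"):
        # Parsed as an integer number of milliseconds.
        opts["session_timeout"] = int(env["STEEL_SESSION_TIMEOUT_MS"])
    return opts


opts = session_options({
    "STEEL_USE_PROXY": "1",
    "STEEL_SESSION_TIMEOUT_MS": "1800000",
})
# In the starter script these would be passed as client.sessions.create(**opts).
```

Returning an empty dict when nothing is set keeps the default session behavior unchanged.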
Related recipes
Deep research with Claude Agent SDK subagents
Lead orchestrator dispatches parallel researcher subagents, each driving its own Steel browser, and synthesizes findings into a cited Markdown report.
Build a browser agent with the Claude Agent SDK
Use Steel with the Claude Agent SDK (TypeScript) to build a tool-using browser agent on Anthropic's first-party agent loop.
Build a typed browser agent with Pydantic AI
Use Steel with Pydantic AI to build typed, provider-agnostic browser agents with dependency injection.