Solve CAPTCHAs automatically in a Browser Use agent
Build an AI agent with browser-use and Steel that solves CAPTCHAs automatically.
Steel can solve CAPTCHAs inside a session without the agent lifting a finger. This recipe wires that into Browser Use: one flag at session creation, one custom tool the agent calls to block until the solver reports done. Everything else is the same perception-plan-act loop as the base Browser Use recipe.
session = client.sessions.create(solve_captcha=True)
solve_captcha=True turns on Steel's solver in the session's browser. Steel detects reCAPTCHA, hCaptcha, and similar widgets as the page loads them, drives the challenge transparently, and exposes progress through a status endpoint. From Browser Use's side, nothing changes: it still connects via BrowserSession(cdp_url=...) and runs the same agent loop. The solver lives below the CDP layer, invisible to the framework.
The only hand-off the agent needs is a way to wait. That's a tool registered with Browser Use's Tools decorator:
tools = Tools()@tools.action(description="Wait for CAPTCHA to be solved by Steel. ...")async def wait_for_captcha_solution() -> str:...
The description is what the LLM reads when deciding to call the tool, so it spells out the trigger: "Call this tool when you encounter any CAPTCHA challenge." The body polls client.sessions.captchas.status(session_id) once per second and returns once no page is still solving.
wait_for_captcha_solution runs with a 60-second deadline. Each tick fetches CAPTCHA state for every open tab and calls _has_active_captcha, which checks whether any state still has isSolvingCaptcha=True. Once that flips false, _summarize_states collapses per-page task counts (total, solving, solved, failed) into one dict and the tool returns a success string like "All CAPTCHAs have been solved after 4120ms. ..." so the agent has something to reason about on its next step.
The TASK string closes the loop. It tells the agent to navigate, call wait_for_captcha_solution when a challenge appears, then submit. Without that instruction the agent would try to click the checkbox itself and race the auto-solver.
Run it
cd examples/browser-use-captcha-autocp .env.example .env # set STEEL_API_KEY and OPENAI_API_KEYuv run main.py
Keys from app.steel.dev and platform.openai.com. The default TASK points at a public reCAPTCHA v2 checkbox demo so you can verify the flow end-to-end without touching a real target.
A session viewer URL prints as the script starts. Open it in another tab to watch Steel click the checkbox while the agent idles inside wait_for_captcha_solution.
Your output varies. Structure looks like this:
Starting Steel browser session...Steel Session created!View session at https://app.steel.dev/sessions/ab12cd34...Executing task:1. Navigate to https://recaptcha-demo.appspot.com/recaptcha-v2-checkbox.php2. When you encounter a CAPTCHA ... call the wait_for_captcha_solution tool...============================================================INFO [Agent] Step 1: navigate to recaptcha demoINFO [Agent] Step 2: call wait_for_captcha_solutionWaiting for CAPTCHA to be solved...All CAPTCHAs solved in 4120msINFO [Agent] Step 3: click submitINFO [Agent] Step 4: done============================================================TASK EXECUTION COMPLETEDReleasing Steel session...Session completed. View replay at https://app.steel.dev/sessions/ab12cd34...
A run usually finishes in under a minute: a few cents of Steel session time plus OpenAI tokens for each agent step. The finally block that calls client.sessions.release() still matters. Steel bills per session-minute and CAPTCHA-enabled sessions count the same, so skipping release keeps the browser running until the default 5-minute timeout.
Reading the status endpoint
client.sessions.captchas.status(session_id) returns one entry per open page. Each entry carries:
pageIdandurlfor the tab.isSolvingCaptcha: true while Steel is actively working on that page.tasks: per-challenge status strings likesolving,solved,failed_to_detect,failed_to_solve.
_summarize_states rolls those per-page counts into totals so the agent sees a compact summary instead of raw page objects. The only invariant the tool itself depends on is _has_active_captcha: return once no page is still solving. If you want richer logic (fail early on failed_to_solve, surface the specific page URL, give up after N failures), extend the summary helper.
Make it yours
- Point at a real target. Replace the
TASKstring with the actual flow: "sign up at example.com with these details". Keep the instruction to callwait_for_captcha_solutionwhen a CAPTCHA appears; everything else, the agent figures out. - Tune the poll loop.
wait_for_captcha_solutionusestimeout_ms=60000andpoll_interval_ms=1000. Image grids and audio fallbacks sometimes need 90-120 seconds. A 2-3 second poll cuts API calls without noticeable delay. - Fail loud on solver failure. The current summary counts
failedtasks but the tool does not short-circuit on them. Checksummary["failed_tasks"] > 0inside the loop and return an error string so the agent can retry or abort. - Combine with stealth.
sessions.create()acceptsuse_proxy=Trueandsession_timeout=1800000alongsidesolve_captcha=True. Sites that CAPTCHA you aggressively usually want all three.
Related
- Manual variant: trigger solve requests explicitly instead of letting Steel detect challenges on its own.
- Browser Use base: the minimal wiring without any CAPTCHA handling.
- Browser Use docs
Related recipes
Solve reCAPTCHA v2 manually with Browser Use
Manually solve reCAPTCHA v2 using Steel's CAPTCHA API with the browser-use framework.
Build a browser agent with Browser Use
Integrate Steel with the browser-use framework for AI-driven web automation.
Build a browser agent with the Claude Agent SDK
Use Steel with the Claude Agent SDK (TypeScript) to build a tool-using browser agent on Anthropic's first-party agent loop.