Solve reCAPTCHA v2 manually with Browser Use
Manually solve reCAPTCHA v2 using Steel's CAPTCHA API with the browser-use framework.
Steel can solve CAPTCHAs for you in the background, or it can hand you the status API and let you drive. This recipe picks the second path. The session is created with auto-solving explicitly off, a custom Browser Use tool polls client.sessions.captchas.status(), and it calls client.sessions.captchas.solve() only for the CAPTCHA type you care about. Every state transition (detected, solving, validating, solved) is yours to read and react to.
    session = client.sessions.create(
        timeout=300000,
        solve_captcha=True,
        stealth_config={"auto_captcha_solving": False},
    )
solve_captcha=True turns on Steel's CAPTCHA subsystem so the status endpoint has data to return. auto_captcha_solving: False tells Steel not to act on what it sees. Detection without intervention. You pick up the loop from there.
The default task opens two tabs, a reCAPTCHA v2 demo and a reCAPTCHA v3 demo, then delegates to a tool registered on the agent (solve_recaptcha_v2_manual). The tool polls until it finds work, requests a solve for v2 tasks only, and returns once the session reports it is no longer solving. The agent then clicks Submit and reads the result off the page.
The polling loop
solve_recaptcha_v2_manual is registered with Browser Use via @tools.action(...) and runs in a 60-attempt, 3-second-interval loop (MAX_POLL_ATTEMPTS, POLL_INTERVAL_SECS). Each tick calls the status endpoint, iterates every page in the response, and dispatches on task.status:
    status_response = client.sessions.captchas.status(session_id)
    states = [s.to_dict() if hasattr(s, "to_dict") else dict(s) for s in status_response]

    for page_data in states:
        for task in page_data.get("tasks") or []:
            task_id = task.get("id", "")
            task_status = task.get("status", "")
            ...
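The overall shape of a bounded polling loop can be sketched separately from the Steel calls. This is a minimal illustration, not the recipe's code: `poll`, `tick`, and the injectable `sleep` are hypothetical names, and only the two constants come from the recipe. `tick` stands in for "fetch status, dispatch on each task, return a result when finished".

```python
import time
from typing import Any, Callable, Optional

MAX_POLL_ATTEMPTS = 60      # values quoted in the recipe
POLL_INTERVAL_SECS = 3.0


def poll(
    tick: Callable[[int], Optional[Any]],
    max_attempts: int = MAX_POLL_ATTEMPTS,
    interval: float = POLL_INTERVAL_SECS,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call tick(attempt) until it returns a non-None result or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = tick(attempt)
        if result is not None:
            return result          # tick reported that the work is finished
        if attempt < max_attempts:
            sleep(interval)        # wait before the next status check
    return None                    # gave up: every attempt came back empty
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in the tool itself the default `time.sleep` (or an async equivalent) would be used.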
A task with status == "detected" and type == "recaptchaV2" triggers a solve request:
    if task_status == "detected" and task_id not in solve_requested:
        if task.get("type") == SOLVE_CAPTCHA_TYPE:  # "recaptchaV2"
            client.sessions.captchas.solve(session_id, task_id=task_id)
            solve_requested.add(task_id)
Other types (recaptchaV3, turnstile, image_to_text) are logged and skipped. To solve every detected CAPTCHA regardless of type, drop the task_id arg: client.sessions.captchas.solve(session_id). solve_requested is a set, so each task gets one request even as the poll loop revisits it.
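To make the filtering rule concrete, here is a small sketch that separates the "which tasks deserve a solve request" decision from the API call. The helper name `tasks_to_solve` and the sample data are hypothetical; only the field names (`tasks`, `id`, `type`, `status`) and `SOLVE_CAPTCHA_TYPE` are taken from the snippets above.

```python
from typing import Dict, Iterable, List, Set

SOLVE_CAPTCHA_TYPE = "recaptchaV2"


def tasks_to_solve(
    states: Iterable[Dict],
    solve_requested: Set[str],
    wanted_type: str = SOLVE_CAPTCHA_TYPE,
) -> List[str]:
    """Return IDs of detected tasks of the wanted type with no solve request yet."""
    picked: List[str] = []
    for page_data in states:
        for task in page_data.get("tasks") or []:
            task_id = task.get("id", "")
            if task.get("status") != "detected" or task_id in solve_requested:
                continue  # not ready, or already requested on an earlier tick
            if task.get("type") != wanted_type:
                continue  # other CAPTCHA types are skipped
            if task_id not in picked:
                picked.append(task_id)
    return picked


# Made-up status payload: one v2 task and one v3 task, both detected.
sample_states = [
    {"tasks": [{"id": "t1", "type": "recaptchaV2", "status": "detected"}]},
    {"tasks": [{"id": "t2", "type": "recaptchaV3", "status": "detected"}]},
]
solve_requested: Set[str] = set()
for task_id in tasks_to_solve(sample_states, solve_requested):
    # In the real tool this is where
    # client.sessions.captchas.solve(session_id, task_id=task_id) would run.
    solve_requested.add(task_id)
```

Because `solve_requested` is consulted before a task is picked, running the same selection on the next tick returns nothing for `t1`.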
When is a solve actually done
solved is not the finish line. Steel marks a task validating after the answer is submitted so it can watch the site's response for a few seconds and confirm the solve was not rejected. The reliable signal for "stop polling" is the per-page isSolvingCaptcha flag:
    has_active_recaptcha_v2 = any(
        task.get("id") in detected_recaptcha_v2
        and task.get("status") not in ("detected", "undetected")
        for task in page_tasks
    )
    if has_active_recaptcha_v2:
        if not page_data.get("isSolvingCaptcha", False):
            recaptcha_pages_done = True
        else:
            all_pages_checked = False
            break
The tool tracks reCAPTCHA v2 task IDs in detected_recaptcha_v2, then for each page that holds one of those tasks past the detected state, waits for isSolvingCaptcha to flip to False. When every relevant page reports quiet, the tool returns a success string to the agent.
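The completion rule can also be written as a single predicate over one status snapshot. The sketch below is a simplified restatement under stated assumptions, not the recipe's exact bookkeeping: `recaptcha_v2_settled` is a hypothetical name, and it folds the two flags from the snippet above into one boolean.

```python
from typing import Dict, Iterable, Set


def recaptcha_v2_settled(states: Iterable[Dict], tracked_ids: Set[str]) -> bool:
    """True once some page holds a tracked task past 'detected' and no such page is still solving."""
    saw_active_page = False
    for page_data in states:
        page_tasks = page_data.get("tasks") or []
        has_active = any(
            task.get("id") in tracked_ids
            and task.get("status") not in ("detected", "undetected")
            for task in page_tasks
        )
        if not has_active:
            continue  # this page has nothing we asked Steel to solve
        if page_data.get("isSolvingCaptcha", False):
            return False  # a relevant page is still working
        saw_active_page = True
    return saw_active_page
```

Returning `False` when no relevant page has been seen yet keeps the loop polling through the early ticks, when the task is still only `detected`.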
Run it
    cd examples/browser-use-captcha-manual
    cp .env.example .env          # set STEEL_API_KEY and OPENAI_API_KEY
    uv run main.py
Get your keys from app.steel.dev and platform.openai.com. The session viewer URL prints as the script starts. Open it in another tab to watch the reCAPTCHA checkbox tick over in real time.
Your output varies. Structure looks like this:
    Creating Steel session with CAPTCHA solving enabled...
    Session created!
    Session ID: ab12cd34...
    Viewer: https://app.steel.dev/sessions/ab12cd34...
    Task: Open 2 CAPTCHA pages, solve reCAPTCHA v2 only
    ============================================================
    INFO [Agent] Step 1: open reCAPTCHA v2 and v3 demo tabs
    INFO [Agent] Step 2: call solve_recaptcha_v2_manual
    Starting manual reCAPTCHA v2 solve polling...
    Max attempts: 60 | Interval: 3.0s
    Poll attempt 1/60
    Page: https://www.google.com/recaptcha/api2/demo
    Task status: detected
    reCAPTCHA v2 detected (type=recaptchaV2)! Requesting solve...
    Page: https://2captcha.com/demo/recaptcha-v3
    Task status: detected
    Non-reCAPTCHA v2 task (type=recaptchaV3, ...), skipping.
    Poll attempt 3/60
    Task status: solving
    CAPTCHA is being solved...
    Poll attempt 5/60
    Task status: validating
    CAPTCHA is being validated...
    reCAPTCHA v2 solved! (1 task(s) in 18.4s)
    INFO [Agent] Step 3: click Submit
    INFO [Agent] Step 4: done
    ============================================================
    TASK EXECUTION COMPLETED
A run takes ~60 seconds and costs Steel session time plus OpenAI tokens for each agent step. The finally block that calls client.sessions.release() isn't optional. Without it the browser stays up until the 5-minute timeout, whether the solve finished or not.
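The release guarantee is the usual try/finally pattern. The wrapper below is a sketch: `with_session` is a hypothetical helper, and passing `session.id` to `release` is an assumption about the call's argument, since the text only names `client.sessions.release()`. It accepts any client object that exposes `sessions.create` and `sessions.release`.

```python
from typing import Any, Callable


def with_session(client: Any, work: Callable[[Any], Any], **create_kwargs: Any) -> Any:
    """Create a session, run work(session), and release the session no matter what."""
    session = client.sessions.create(**create_kwargs)
    try:
        return work(session)
    finally:
        # Runs on success and on exceptions alike, so the browser does not
        # linger until the session timeout.
        # Assumption: release takes the session ID as its argument.
        client.sessions.release(session.id)
```

A call such as `with_session(client, run_agent, timeout=300000, solve_captcha=True, stealth_config={"auto_captcha_solving": False})` would mirror the session settings used in this recipe, with `run_agent` standing in for whatever drives the agent.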
Make it yours
- Solve a different CAPTCHA type. Change SOLVE_CAPTCHA_TYPE to "recaptchaV3", "turnstile", or "image_to_text". The dispatch in solve_recaptcha_v2_manual already filters by task.get("type"), so the rest of the loop is type-agnostic.
- Solve everything. Replace client.sessions.captchas.solve(session_id, task_id=task_id) with client.sessions.captchas.solve(session_id) to solve every detected task regardless of type. Drop the type filter and the detected_recaptcha_v2 set at the same time.
- Retune the loop. MAX_POLL_ATTEMPTS and POLL_INTERVAL_SECS gate how long the tool will wait. 60 x 3s (3 minutes) is generous for a single solve. Shorten both for smoke tests, or stretch MAX_POLL_ATTEMPTS for pages that queue many challenges.
- Swap the target. Replace the entries in CAPTCHA_PAGES. The agent builds its tab list and prompt from that array, so the tool will poll and solve whatever you point it at.
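When retuning the loop, it can help to check the polling budget against the session timeout. The helpers below are hypothetical and only do the arithmetic; they ignore time spent inside status calls and agent steps, so treat the result as a rough bound.

```python
def worst_case_wait_secs(max_attempts: int, interval_secs: float) -> float:
    """Rough upper bound on how long the tool waits: attempts times interval."""
    return max_attempts * interval_secs


def fits_in_session(max_attempts: int, interval_secs: float, session_timeout_ms: int) -> bool:
    """True if the polling budget ends before the session itself times out."""
    return worst_case_wait_secs(max_attempts, interval_secs) * 1000 < session_timeout_ms


# Recipe defaults: 60 attempts x 3 s against a 300000 ms session.
print(worst_case_wait_secs(60, 3.0))        # 180.0
print(fits_in_session(60, 3.0, 300000))     # True
print(fits_in_session(120, 3.0, 300000))    # False: 360 s exceeds the session
```

If you stretch MAX_POLL_ATTEMPTS far enough that the check fails, raise the session timeout as well, or the session may end while the tool is still polling.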
Related
- Auto variant: keep solve_captcha=True, drop the auto_captcha_solving: False override, and let Steel detect, solve, and submit without any tool plumbing.
- Browser Use base: base recipe without CAPTCHA handling.
- Browser Use docs