Overview

Automatically detect and solve CAPTCHAs in browser sessions using Steel's integrated captcha solvers and the CAPTCHAs API.

Steel's CAPTCHAs API handles CAPTCHAs that appear during your automations without interrupting the automation flow. The system uses a bridge architecture that connects browser sessions with Steel's CAPTCHA-solving capabilities, enabling real-time detection, solving, and state management.

CAPTCHA solving is particularly useful for:

Scraping jobs that encounter CAPTCHA challenges
Browser workflows that need to submit forms or handle authentication flows
AI agents that need to navigate CAPTCHA-protected websites

How CAPTCHA Solving Works with the CAPTCHAs API

Steel's CAPTCHAs API operates through a bridge architecture that connects your browser sessions with our external CAPTCHA-solving capabilities. The process has four stages:

Detection: The system automatically detects when CAPTCHAs appear on pages
State Management: CAPTCHA states are tracked per page with real-time updates
Solving: Steel then solves the detected CAPTCHAs
Completion: The system reports back when a CAPTCHA is solved or when solving fails

Getting CAPTCHA Status

You can check the current CAPTCHA status for any session to understand what CAPTCHAs are active and their current solving progress.

curl -X GET https://api.steel.dev/v1/sessions/{sessionId}/captchas/status \
   -H "steel-api-key: YOUR_API_KEY_HERE"

import Steel from 'steel-sdk';

const client = new Steel();

const response = await client.sessions.captchas.status('sessionId');

console.log(response);

from steel import Steel

client = Steel()
response = client.sessions.captchas.status(
    "sessionId",
)
print(response)

Response Format

The status endpoint returns an array of current pages and their CAPTCHA states. An example output might look like:

[
   {
      "pageId":"page_12345",
      "url":"https://example.com/login",
      "isSolvingCaptcha":true,
      "tasks":[
         {
            "id":"task_67890",
            "type":"image_to_text",
            "status":"solving",
            "created":1640995200000,
            "totalDuration":5000
         }
      ],
      "created":1640995200000,
      "lastUpdated":1640995205000
   }
]
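As a minimal sketch of how this response could be consumed, the snippet below reduces each page entry to its id, URL, and a map of task ids to statuses. It works on the raw JSON shape shown above; `summarize_pages` is a hypothetical helper, not part of the Steel SDK, and SDK response objects may expose these fields as attributes rather than dictionary keys.

```python
import json

# Sample payload copied from the example response above.
SAMPLE_STATUS = json.loads("""
[
  {
    "pageId": "page_12345",
    "url": "https://example.com/login",
    "isSolvingCaptcha": true,
    "tasks": [
      {"id": "task_67890", "type": "image_to_text", "status": "solving",
       "created": 1640995200000, "totalDuration": 5000}
    ],
    "created": 1640995200000,
    "lastUpdated": 1640995205000
  }
]
""")


def summarize_pages(pages):
    """Return one summary per page: id, URL, solving flag, and task statuses by task id."""
    return [
        {
            "pageId": page["pageId"],
            "url": page["url"],
            "solving": page["isSolvingCaptcha"],
            "tasks": {task["id"]: task["status"] for task in page.get("tasks", [])},
        }
        for page in pages
    ]


for summary in summarize_pages(SAMPLE_STATUS):
    print(summary)
```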

CAPTCHA Task Status

Tasks can have the following statuses:

undetected: CAPTCHA has not been detected
detected: CAPTCHA has been detected but solving hasn't started
solving: CAPTCHA is currently being solved
solved: CAPTCHA has been successfully solved
failed_to_detect: CAPTCHA detection failed
failed_to_solve: CAPTCHA solving failed
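One way to act on these statuses is to separate tasks that are finished from tasks that are still in progress. The sketch below uses the status names from the list above; the grouping into "finished" and "failed" sets, and the helper names, are assumptions made for illustration.

```python
# Status names come from the list above. Which ones count as "finished"
# is an assumption made for this sketch.
FAILED_STATUSES = frozenset({"failed_to_detect", "failed_to_solve"})
FINISHED_STATUSES = FAILED_STATUSES | {"solved"}


def is_finished(status):
    """True for statuses this sketch treats as final (solved or either failure)."""
    return status in FINISHED_STATUSES


def failed_tasks(tasks):
    """Return the tasks whose status is one of the two failure states."""
    return [task for task in tasks if task["status"] in FAILED_STATUSES]


tasks = [
    {"id": "task_1", "status": "solving"},
    {"id": "task_2", "status": "failed_to_solve"},
]
print([is_finished(task["status"]) for task in tasks])
print(failed_tasks(tasks))
```

A check like `failed_tasks` can be useful after polling finishes, since a page that is no longer solving may have ended in a failure state rather than a solved one.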

Solving Image CAPTCHAs

For image-based CAPTCHAs, you can provide XPath selectors to help the system locate and solve the CAPTCHA.

The url parameter is optional and defaults to the current page.

curl -X POST https://api.steel.dev/v1/sessions/{sessionId}/captchas/solve-image \
   -H "Content-Type: application/json" \
   -H "steel-api-key: YOUR_API_KEY_HERE" \
   -d '{
     "imageXPath": "//img[@id=\"captcha-image\"]",
     "inputXPath": "//input[@name=\"captcha\"]",
     "url": "https://example.com/login"
   }'

import Steel from 'steel-sdk';

const client = new Steel();

const response = await client.sessions.captchas.solveImage('sessionId', {
  imageXPath: '//img[@id="captcha-image"]',
  inputXPath: '//input[@name="captcha"]',
});

console.log(response.success);

from steel import Steel

client = Steel()
response = client.sessions.captchas.solve_image(
    "sessionId",
    image_x_path='//img[@id="captcha-image"]',
    input_x_path='//input[@name="captcha"]',
)
print(response.success)

Parameters

imageXPath (required): XPath selector for the CAPTCHA image element
inputXPath (required): XPath selector for the CAPTCHA input field
url (optional): URL where the CAPTCHA is located (defaults to current page)
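If you call the HTTP endpoint directly, the request body can be assembled as sketched below. `build_solve_image_payload` is a hypothetical helper; only the three field names come from the parameter list above.

```python
import json


def build_solve_image_payload(image_xpath, input_xpath, url=None):
    """Build the JSON body for the solve-image endpoint (hypothetical helper)."""
    if not image_xpath or not input_xpath:
        raise ValueError("imageXPath and inputXPath are both required")
    payload = {"imageXPath": image_xpath, "inputXPath": input_xpath}
    # Only include "url" when one is given; leaving it out lets the API
    # default to the current page.
    if url is not None:
        payload["url"] = url
    return payload


print(json.dumps(build_solve_image_payload(
    '//img[@id="captcha-image"]',
    '//input[@name="captcha"]',
), indent=2))
```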

Response

{
	"success": true,
	"message": "Image captcha solve request sent"
}

WebSocket Bridge

The CAPTCHA bridge uses WebSocket connections to maintain real-time communication between browser sessions and CAPTCHA-solving extensions. This enables:

Real-time state updates: Immediate notification when CAPTCHAs are detected or solved
Bidirectional communication: Extensions can send updates and receive solve requests
Persistent connections: Maintains connection throughout the session lifecycle

State Management

The CAPTCHA bridge tracks state per page, merges task updates, and calculates task durations:

Page-Based Tracking

States are tracked by pageId rather than URL to avoid duplicates and handle dynamic URLs effectively.
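To illustrate why keying by pageId avoids duplicates, here is a minimal sketch of a state store. It is not Steel's internal implementation; the `upsert_page_state` helper and the dictionary layout are assumptions.

```python
def upsert_page_state(states, update):
    """Store a page update under its pageId, merging with any earlier state."""
    page_id = update["pageId"]
    states[page_id] = {**states.get(page_id, {}), **update}
    return states


states = {}
upsert_page_state(states, {"pageId": "page_12345", "url": "https://example.com/login?step=1"})
upsert_page_state(states, {"pageId": "page_12345", "url": "https://example.com/login?step=2"})
print(len(states))  # one entry: the URL changed, the page did not
```

Had the store been keyed by URL, the same page would have produced two entries here.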

Task Merging

When multiple updates arrive for the same CAPTCHA task, the system merges them, preserving details such as:

Creation and detection timestamps
Solving duration calculations
Status progression
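The merge behavior described above could look roughly like the following. This is a sketch under stated assumptions, not the bridge's actual logic: it assumes newer fields overwrite older ones except for the first-seen timestamps, and `merge_task` and `PRESERVED_FIELDS` are hypothetical names.

```python
# Fields whose first-seen value is kept across updates (assumed for this sketch).
PRESERVED_FIELDS = ("created", "detectedTime")


def merge_task(existing, update):
    """Merge a newer task update into an existing task record."""
    merged = {**existing, **update}
    for field in PRESERVED_FIELDS:
        if field in existing:
            # Keep the original timestamp even if the update carries a later one.
            merged[field] = existing[field]
    return merged


first = {"id": "task_67890", "status": "detected", "created": 1640995200000}
later = {"id": "task_67890", "status": "solving", "created": 1640995202000}
print(merge_task(first, later))
```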

Duration Calculation

The system automatically calculates task durations based on:

created or detectedTime: When the CAPTCHA was first detected
solveTime or failureTime: When the CAPTCHA was solved or failed
Real-time updates during the solving process
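The calculation can be sketched as follows, assuming the timestamps are epoch milliseconds (as the example response suggests) and that a task still in progress is measured against the current time. `task_duration_ms` is a hypothetical helper, and the field names are taken from the list above.

```python
import time


def task_duration_ms(task, now_ms=None):
    """Return the task duration in milliseconds, or None if no start time is known."""
    start = task.get("created", task.get("detectedTime"))
    if start is None:
        return None
    end = task.get("solveTime", task.get("failureTime"))
    if end is None:
        # Task is still in progress: measure against the current time.
        end = now_ms if now_ms is not None else int(time.time() * 1000)
    return max(0, end - start)


finished = {"created": 1640995200000, "solveTime": 1640995205000}
print(task_duration_ms(finished))
```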

Integrating with Existing Automations

Steel's CAPTCHA system works alongside your existing Playwright or Puppeteer automations:

Monitoring CAPTCHA Progress

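import Steel from 'steel-sdk';

const client = new Steel();

// Fetch the per-page CAPTCHA states for a session
// (the same SDK call shown under "Getting CAPTCHA Status")
async function getCaptchaStatus(sessionId) {
  return client.sessions.captchas.status(sessionId);
}

// Poll once per second until no page is solving a CAPTCHA, or until the timeout elapses
async function waitForCaptchaSolution(sessionId, timeout = 30000) {
  const startTime = Date.now();

  while (Date.now() - startTime < timeout) {
    const status = await getCaptchaStatus(sessionId);

    const activeCaptchas = status.filter(state => state.isSolvingCaptcha);

    if (activeCaptchas.length === 0) {
      // Nothing is in progress; inspect task statuses to tell solved from failed
      console.log('No CAPTCHAs are being solved');
      return true;
    }

    // Log progress
    activeCaptchas.forEach(captcha => {
      console.log(`CAPTCHA on ${captcha.url}: ${captcha.tasks.length} tasks`);
    });

    await new Promise(resolve => setTimeout(resolve, 1000));
  }

  throw new Error('CAPTCHA solving timeout');
}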
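For Python automations, the same polling pattern might be sketched like this. The function takes any zero-argument callable that returns the parsed JSON list from the status endpoint, so it does not assume a particular SDK return type; `wait_for_captcha_solution` and its parameters are hypothetical names.

```python
import time


def wait_for_captcha_solution(fetch_status, timeout_s=30.0, poll_interval_s=1.0,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until no page reports isSolvingCaptcha, then return the pages."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        pages = fetch_status()
        if not any(page["isSolvingCaptcha"] for page in pages):
            return pages
        sleep(poll_interval_s)
    raise TimeoutError("CAPTCHA solving timed out")


# Usage with canned responses standing in for real status calls.
canned = iter([
    [{"pageId": "page_12345", "isSolvingCaptcha": True}],
    [{"pageId": "page_12345", "isSolvingCaptcha": False}],
])
final_pages = wait_for_captcha_solution(lambda: next(canned), sleep=lambda _: None)
print(final_pages)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without waiting in real time.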

Basic Integration Pattern

// Navigate to a page that might have CAPTCHAs
await page.goto('https://example.com/protected-page');

// Check whether any CAPTCHA is currently being solved
// (getCaptchaStatus is the same helper that waitForCaptchaSolution uses)
const captchaStatus = await getCaptchaStatus(sessionId);

if (captchaStatus.some(state => state.isSolvingCaptcha)) {
  // Wait for CAPTCHA to be solved
  await waitForCaptchaSolution(sessionId);
}

// Continue with automation
await page.click('#submit-button');

Handling Different CAPTCHA Types

The CAPTCHA bridge automatically handles most common CAPTCHA types. For image CAPTCHAs, you can use the image solving endpoint with specific XPath selectors.

Each task's type field takes one of the following values, corresponding to the CAPTCHA types we support:

recaptchaV2: Google's reCAPTCHA v2 with "I'm not a robot" checkbox and image challenges
recaptchaV3: Google's reCAPTCHA v3 with invisible background scoring and risk analysis
turnstile: Cloudflare Turnstile with minimal user interaction verification
image_to_text: Traditional text-based CAPTCHA requiring OCR of distorted characters
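As a small illustration, the task types in a status response can be tallied to see which kinds of CAPTCHA a session is running into. The type names come from the list above; `count_tasks_by_type` is a hypothetical helper operating on the raw JSON shape of the status response.

```python
from collections import Counter

# Type names taken from the list above.
SUPPORTED_TYPES = {"recaptchaV2", "recaptchaV3", "turnstile", "image_to_text"}


def count_tasks_by_type(pages):
    """Count CAPTCHA tasks across all pages, grouped by their type field."""
    return Counter(task["type"] for page in pages for task in page.get("tasks", []))


pages = [
    {"pageId": "page_1", "tasks": [{"type": "turnstile"}, {"type": "image_to_text"}]},
    {"pageId": "page_2", "tasks": [{"type": "turnstile"}]},
]
counts = count_tasks_by_type(pages)
print(dict(counts))
print(sorted(set(counts) - SUPPORTED_TYPES))  # any types not in the list above
```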

Best Practices

Monitor State Changes: Regularly check CAPTCHA status during automation
Handle Timeouts: Set reasonable timeouts for automatic CAPTCHA solving operations
Use Specific Selectors: Provide accurate XPath selectors for image CAPTCHAs
Error Handling: Implement proper error handling for failed CAPTCHA attempts
Logging: Log CAPTCHA events for debugging and monitoring

The CAPTCHA system handles detection and solving for you while keeping the state of each CAPTCHA visible to your automation workflows, so you retain the control and visibility you need.