# Drive a browser with Claude Computer Use
URL: /cookbook/claude-computer-use

---
title: Drive a browser with Claude Computer Use
description: Connect Claude to a Steel browser session for autonomous web interactions.
---

<RecipeJsonLd slug="claude-computer-use" title={"Drive a browser with Claude Computer Use"} description={"Connect Claude to a Steel browser session for autonomous web interactions."} authors={[{"handle":"junhsss","name":"Jun Ryu"}, {"handle":"hussufo","name":"Hussien Hussien"}]} datePublished="2025-07-16" dateModified="2026-04-24" sourceUrl="https://github.com/steel-dev/steel-cookbook/tree/92f29742253e2b6c6801d109e18232768e5291a0/examples/claude-computer-use-ts" />

<Tabs items={['TypeScript', 'Python']} groupId="lang" persist updateAnchor className="cookbook-concept-tabs">

<Tab id="typescript" className="cookbook-concept-tab">

<RecipeMeta href="https://github.com/steel-dev/steel-cookbook/tree/92f29742253e2b6c6801d109e18232768e5291a0/examples/claude-computer-use-ts" path="examples/claude-computer-use-ts" authors={[{"handle":"junhsss","name":"Jun Ryu","avatar":"https://github.com/junhsss.png?size=40"}]} updated="2026-04-24" />

<RecipeQuickstart slug="claude-computer-use-ts" />

Claude sees the screen as an image and returns concrete actions at pixel coordinates: `left_click [640, 412]`, `type "claude 4.7 opus"`, `scroll down 3`. Something has to execute those actions against a real browser and send the next screenshot back. That "something" is the agent loop in `index.ts`, and the browser is a Steel session.

## The loop

The whole thing fits in one `while` block inside `Agent.executeTask`. Each iteration sends the growing message history plus the `computer` tool definition to Claude:

```typescript
const response = await this.client.beta.messages.create({
  model: this.model,
  max_tokens: 4096,
  messages: this.messages,
  tools: this.tools,
  betas: ["computer-use-2025-11-24"],
});
```

The tool definition declares `computer_20251124` with the viewport's `display_width_px` and `display_height_px`. Keep it consistent with the Steel session's `dimensions` (1280x768 here) or clicks land in the wrong place.

`executeComputerAction` is the translation layer. Claude emits computer-use actions (`left_click`, `type`, `key`, `scroll`, `screenshot`, ...); Steel's Input API speaks a parallel vocabulary (`click_mouse`, `type_text`, `press_key`, `scroll`, `take_screenshot`):

```typescript
case "left_click":
case "right_click":
case "middle_click":
case "double_click":
case "triple_click": {
  body = {
    action: "click_mouse",
    button: buttonMap[action],
    coordinates: coords,
    screenshot: true,
  };
  break;
}
```

Every action sets `screenshot: true`, so Steel returns a fresh base64 PNG after each interaction. That PNG becomes the content of a `tool_result` block in the next user message.

A few translation details:

- **Keys get normalized.** `normalizeKey` maps synonyms (`CTRL` to `Control`, `CMD` to `Meta`, `ENTER` to `Enter`) before sending to Steel.
- **Scroll is delta-based.** Claude says `scroll_direction: "down", scroll_amount: 3`; Steel expects `delta_x`/`delta_y` in pixels. The code multiplies by 100 per step.
- **Drags default from center.** `left_click_drag` only gives an end coordinate, so the start is the viewport center.

## Stop conditions

- **No tool calls.** Claude wrote only text. Task is complete.
- **Repetition.** `detectRepetition` compares the last assistant message against the previous three by word overlap (>80%).
- **Iteration cap.** 50 iterations by default.

The `finally` block in `main` always calls `agent.cleanup()`, which releases the Steel session.

## Run it

```bash
cd examples/claude-computer-use-ts
cp .env.example .env          # set STEEL_API_KEY and ANTHROPIC_API_KEY
npm install
npm start
```

Get keys from [app.steel.dev](https://app.steel.dev/settings/api-keys) and [console.anthropic.com](https://console.anthropic.com/). Override the task inline:

```bash
TASK="Find the current weather in New York City" npm start
```

Your output varies. Structure looks like this:

```text
Steel Session created successfully!
View live session at: https://app.steel.dev/sessions/ab12cd34...

Executing task: Go to Steel.dev and find the latest news
============================================================
I'll navigate to Steel.dev and look for the latest news.
computer({"action":"screenshot"})
computer({"action":"left_click","coordinate":[640,48]})
computer({"action":"type","text":"https://steel.dev"})
computer({"action":"key","text":"Return"})
...
Task complete - no further actions requested

TASK EXECUTION COMPLETED
Duration: 84.3 seconds
Result: Steel's latest news includes ...
```

Expect ~60-120 seconds and 15-40 iterations for a simple browsing task.

## Make it yours

- **Change the viewport.** `viewportWidth` and `viewportHeight` in the `Agent` constructor set both the Steel session dimensions and the tool definition's `display_width_px`/`display_height_px`. Keep them in sync.
- **Tune the system prompt.** `BROWSER_SYSTEM_PROMPT` is where the browsing conventions live: date injection, screenshot-after-submit rule, black-screen recovery.
- **Raise the ceiling.** Long tasks bump against the 50-iteration default in `executeTask`.
- **Hand off auth.** Pair this recipe with Steel's [credentials](/cookbook/credentials) or [auth contexts](/cookbook/auth-context) to start the session authenticated.

## Related

[Computer use docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool) · [Python version](/cookbook/claude-computer-use) · [Mobile variant](/cookbook/claude-computer-use-mobile)

</Tab>

<Tab id="python" className="cookbook-concept-tab">

<RecipeMeta href="https://github.com/steel-dev/steel-cookbook/tree/92f29742253e2b6c6801d109e18232768e5291a0/examples/claude-computer-use-py" path="examples/claude-computer-use-py" authors={[{"handle":"hussufo","name":"Hussien Hussien","avatar":"https://github.com/hussufo.png?size=40"}, {"handle":"junhsss","name":"Jun Ryu","avatar":"https://github.com/junhsss.png?size=40"}]} updated="2026-04-24" />

<RecipeQuickstart slug="claude-computer-use-py" />

Computer use is Anthropic's primitive for giving Claude direct control of a screen. You declare a `computer` tool with a viewport size; Claude replies with actions like `left_click` at `(x, y)`, `type` with text, `scroll`, `key`. You execute each one and hand back a screenshot.

Steel supplies the screen. A Steel session is a headful Chromium in a VM reachable over HTTPS, and the Input API (`sessions.computer`) executes mouse and keyboard actions and returns a PNG in the same call.

## The loop

Everything in `main.py` hangs off a single loop in `Agent.execute_task`. Seed the conversation with a system prompt and the task, then on each turn:

```python
response = self.client.beta.messages.create(
    model=self.model,
    max_tokens=4096,
    messages=self.messages,
    tools=self.tools,
    betas=["computer-use-2025-11-24"],
)

text, has_actions = self.process_response(response)

if not has_actions:
    break
```

`tools` declares the computer tool Claude is allowed to call:

```python
self.tools = [
    {
        "type": "computer_20251124",
        "name": "computer",
        "display_width_px": self.viewport_width,
        "display_height_px": self.viewport_height,
        "display_number": 1,
    }
]
```

The viewport (1280x768) has to match what Steel renders or clicks land in the wrong place.

`tool_use` blocks go to `execute_computer_action`, which maps each Anthropic action name onto a Steel Input API call:

```python
elif action in ("left_click", "right_click", "middle_click",
                "double_click", "triple_click"):
    body = {
        "action": "click_mouse",
        "button": button_map[action],
        "coordinates": [coords[0], coords[1]],
        "screenshot": True,
    }
```

`screenshot: True` tells Steel to attach a base64 PNG to the response, so a click and the screenshot that proves it landed are one round-trip. The PNG goes back into `messages` as a `tool_result` with the matching `tool_use_id`.

Two normalization details: `key` / `hold_key` run names like `CTRL+A` through `normalize_key` (`CTRL` to `Control`, `ESC` to `Escape`, `UP` to `ArrowUp`), and `scroll_amount` is multiplied by 100 pixels per step.

Two things end the loop: Claude responds with only text (task done), or the last two assistant messages overlap 80%+ on word content (`detect_repetition`). A hard cap of 50 iterations catches anything that slips past both.

## Run it

```bash
cd examples/claude-computer-use-py
cp .env.example .env          # set STEEL_API_KEY and ANTHROPIC_API_KEY
uv run main.py
```

Get keys from [app.steel.dev](https://app.steel.dev/settings/api-keys) and [console.anthropic.com](https://console.anthropic.com/). Default task lives in `.env` as `TASK`; you can override per-run:

```bash
TASK="Find the current weather in New York City" python main.py
```

Your output varies. Structure looks like this:

```text
Starting Steel session...
Steel Session created successfully!
View live session at: https://app.steel.dev/sessions/ab12cd34…

Executing task: Go to Steel.dev and find the latest news
============================================================
I'll navigate to Steel.dev and look for the latest news.
computer({"action": "key", "text": "ctrl+l"})
computer({"action": "type", "text": "https://steel.dev"})
computer({"action": "key", "text": "Return"})
computer({"action": "screenshot"})
…
Task complete - no further actions requested

TASK EXECUTION COMPLETED
Duration: 74.3 seconds
Result: Steel just shipped …

Releasing Steel session...
```

A run typically takes 60-180 seconds and 10-30 loop iterations.

## Make it yours

- **Change the task.** Edit `TASK` in `.env` or pass it per-run.
- **Tune the viewport.** `viewport_width` / `viewport_height` in `Agent.__init__`.
- **Rework the system prompt.** `BROWSER_SYSTEM_PROMPT` is where site-specific knowledge lives.
- **Persist a login.** Pass `session_context` to `sessions.create` to resume with cookies and local storage. See [credentials](/cookbook/credentials).
- **Raise the ceiling.** `max_iterations=50` in `execute_task` is the safety net.

## Related

[Anthropic computer use docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool) · [TypeScript version](/cookbook/claude-computer-use)

</Tab>

</Tabs>

## Related recipes

<RecipeGrid>
<RecipeCard slug="gemini-computer-use" title={"Drive a browser with Gemini Computer Use"} description={"Connect Google's Gemini Computer Use to a Steel browser session for autonomous web interactions."} topics={['Computer use']} date="2025-11-25" />
<RecipeCard slug="claude-computer-use-mobile" title={"Drive a mobile browser with Claude Computer Use"} description={"Claude Computer Use with Steel for autonomous task execution in mobile browser environments."} topics={['Computer use', 'Mobile']} date="2025-10-14" />
<RecipeCard slug="openai-computer-use" title={"Drive a browser with OpenAI Computer Use"} description={"Connect OpenAI's Computer Use Assistant to a Steel browser session for autonomous web interactions."} topics={['Computer use']} date="2025-03-19" />
</RecipeGrid>
