# Drive a browser with OpenAI Computer Use
URL: /cookbook/openai-computer-use

---
title: Drive a browser with OpenAI Computer Use
description: "Connect OpenAI's Computer Use Assistant to a Steel browser session for autonomous web interactions."
---

<RecipeJsonLd slug="openai-computer-use" title={"Drive a browser with OpenAI Computer Use"} description={"Connect OpenAI's Computer Use Assistant to a Steel browser session for autonomous web interactions."} authors={[{"handle":"junhsss","name":"Jun Ryu"}, {"handle":"nibzard","name":"Nikola Balic"}]} datePublished="2025-03-19" dateModified="2026-04-24" sourceUrl="https://github.com/steel-dev/steel-cookbook/tree/92f29742253e2b6c6801d109e18232768e5291a0/examples/openai-computer-use-ts" />

<Tabs items={['TypeScript', 'Python']} groupId="lang" persist updateAnchor className="cookbook-concept-tabs">

<Tab id="typescript" className="cookbook-concept-tab">

<RecipeMeta href="https://github.com/steel-dev/steel-cookbook/tree/92f29742253e2b6c6801d109e18232768e5291a0/examples/openai-computer-use-ts" path="examples/openai-computer-use-ts" authors={[{"handle":"junhsss","name":"Jun Ryu","avatar":"https://github.com/junhsss.png?size=40"}, {"handle":"nibzard","name":"Nikola Balic","avatar":"https://github.com/nibzard.png?size=40"}]} updated="2026-04-24" />

<RecipeQuickstart slug="openai-computer-use-ts" />

OpenAI's computer-use model ships as a single tool declaration: `{ type: "computer" }`. You hand it to the Responses API, send a screenshot, and the model returns `computer_call` items with actions like `click`, `type`, `keypress`, `scroll`. Your job is to execute them against a real browser, capture the next screenshot, and feed it back.

## The loop

The Responses API threads conversation state server-side, so each turn carries only the new tool outputs plus a `previous_response_id`:

```typescript
const response = await createResponse({
  model: this.model,
  instructions: this.systemPrompt,
  input: nextInput,
  tools: this.tools,
  previous_response_id: previousResponseId,
  reasoning: { effort: "medium" },
  truncation: "auto",
});
```

First turn: `nextInput` is `[{ role: "user", content: task }]`. Subsequent turns: `nextInput` is just the array of tool outputs from the previous iteration.

The response's `output` array mixes three item types:

- `message`: a plain text reply. The final message becomes the return value.
- `reasoning`: the model's own summary; printed but not fed back.
- `computer_call`: one or more actions to execute. Each call has a `call_id` that must be echoed back in the matching `computer_call_output`.

## Actions in, screenshots out

`executeComputerAction` is the translation layer:

```typescript
case "click": {
  const coords = this.toCoords(actionArgs.x, actionArgs.y);
  const button = this.mapButton(actionArgs.button);
  const clicks = this.toNumber(actionArgs.num_clicks, 1);
  body = {
    action: "click_mouse",
    button,
    coordinates: coords,
    ...(clicks > 1 ? { num_clicks: clicks } : {}),
    screenshot: true,
  };
  break;
}
```

The screenshot goes back as a `computer_call_output`, matched by `call_id`:

```typescript
toolOutputs.push({
  type: "computer_call_output",
  call_id: item.call_id,
  acknowledged_safety_checks: pendingChecks,
  output: {
    type: "computer_screenshot",
    image_url: `data:image/png;base64,${screenshotBase64}`,
  },
});
```

A few translation details:

- **`keypress` takes a list.** `normalizeKey` rewrites synonyms (`CTRL` to `Control`, `CMD` to `Meta`, `ENTER` to `Enter`).
- **`scroll` is delta-based.** OpenAI sends `scroll_x`/`scroll_y` in pixels. Steel's `scroll` takes `delta_x`/`delta_y` directly.
- **`drag` gives a path.** OpenAI provides the full point list in `path`; Steel's `drag_mouse` wants the same shape.
- **Unknown actions fall through to `take_screenshot`.**

## Safety checks

A `computer_call` can attach `pending_safety_checks` when the planned action looks sensitive. The call won't take effect until you echo those check IDs back in `acknowledged_safety_checks`. The starter auto-acknowledges everything; for production, flip `autoAcknowledgeSafety` to `false` and gate each check on a human approval.

## Run it

```bash
cd examples/openai-computer-use-ts
cp .env.example .env          # set STEEL_API_KEY and OPENAI_API_KEY
npm install
npm start
```

Get keys from [app.steel.dev](https://app.steel.dev/settings/api-keys) and [platform.openai.com](https://platform.openai.com/api-keys). Override the task inline:

```bash
TASK="Find the current weather in New York City" npm start
```

Your output varies. Structure looks like this:

```text
Steel Session created successfully!
View live session at: https://app.steel.dev/sessions/ab12cd34...

Executing task: Go to Steel.dev and find the latest news
============================================================
I'll navigate to steel.dev and scan the landing page.
click({"x":720,"y":48})
type({"text":"https://steel.dev"})
keypress({"keys":["Enter"]})
scroll({"x":720,"y":450,"scroll_y":600})
Steel's latest release adds ...

============================================================
TASK EXECUTION COMPLETED
============================================================
Duration: 71.4 seconds
```

Expect roughly 60-120 seconds and 15-40 turns for a simple browsing task.

## Make it yours

- **Change the viewport.** `viewportWidth` and `viewportHeight` in the `Agent` constructor set the Steel session `dimensions`.
- **Swap the model.** The default is `gpt-5.5`. Update `this.model` in the `Agent` constructor.
- **Tune reasoning effort.** `reasoning: { effort: "medium" }` trades latency for planning quality.
- **Rewrite the system prompt.** `BROWSER_SYSTEM_PROMPT` holds the browsing conventions.
- **Persist a login.** Pass `sessionContext` to `sessions.create`. See [credentials](/cookbook/credentials) and [auth-context](/cookbook/auth-context).
- **Turn off auto-ack.** Flip `autoAcknowledgeSafety` to `false` to make pending safety checks raise.

## Related

[Computer use guide](https://platform.openai.com/docs/guides/tools-computer-use) · [Python version](/cookbook/openai-computer-use) · [Anthropic equivalent](/cookbook/claude-computer-use)

</Tab>

<Tab id="python" className="cookbook-concept-tab">

<RecipeMeta href="https://github.com/steel-dev/steel-cookbook/tree/92f29742253e2b6c6801d109e18232768e5291a0/examples/openai-computer-use-py" path="examples/openai-computer-use-py" authors={[{"handle":"junhsss","name":"Jun Ryu","avatar":"https://github.com/junhsss.png?size=40"}, {"handle":"nibzard","name":"Nikola Balic","avatar":"https://github.com/nibzard.png?size=40"}]} updated="2026-04-24" />

<RecipeQuickstart slug="openai-computer-use-py" />

OpenAI's Computer Use models expose one tool (`{"type": "computer"}`) and emit `computer_call` items containing an `action` the model wants performed on a screen. You execute the action, return a screenshot as a `computer_call_output`, and the next turn the model sees the result. The action vocabulary (`click`, `type`, `keypress`, `scroll`, `drag`, `wait`, `screenshot`) is fixed by OpenAI.

This recipe uses OpenAI's **Responses API**, not Chat Completions. Responses keeps conversation state on OpenAI's side via `previous_response_id`, so each turn only sends the new tool outputs rather than the full screenshot history.

## The loop

```python
params = {
    "model": self.model,
    "instructions": self.system_prompt,
    "input": next_input,
    "tools": self.tools,
    "reasoning": {"effort": "medium"},
    "truncation": "auto",
}
if previous_response_id:
    params["previous_response_id"] = previous_response_id

response = create_response(**params)
previous_response_id = response.get("id")
```

Each `response["output"]` is a list of items with a `type`. The loop walks them:

- `reasoning`: model's internal thinking, printed.
- `message`: terminal prose; the agent stores the last one as the final result.
- `computer_call`: one or more actions to execute.

`execute_computer_action` maps OpenAI's action vocabulary onto Steel's Input API. Each branch builds a Steel request body and sends it through `self.steel.sessions.computer(...)` with `screenshot: True`:

```python
elif action_type in ("click",):
    coords = self.to_coords(action_args.get("x"), action_args.get("y"))
    button = self.map_button(action_args.get("button"))
    num_clicks = int(self.to_number(action_args.get("num_clicks"), 1))
    payload = {
        "action": "click_mouse",
        "button": button,
        "coordinates": [coords[0], coords[1]],
        "screenshot": True,
    }
    if num_clicks > 1:
        payload["num_clicks"] = num_clicks
    body = payload
```

`keypress` arrives with OpenAI names (`CTRL`, `ENTER`, `ESC`, `UP`); `normalize_key` rewrites them into the Steel / DOM vocabulary (`Control`, `Enter`, `Escape`, `ArrowUp`).

The screenshot goes back as a `computer_call_output`:

```python
tool_outputs.append({
    "type": "computer_call_output",
    "call_id": item["call_id"],
    "acknowledged_safety_checks": pending_checks,
    "output": {
        "type": "computer_screenshot",
        "image_url": f"data:image/png;base64,{screenshot_base64}",
    },
})
```

## Safety checks

A `computer_call` can include `pending_safety_checks`. You must echo them back in `acknowledged_safety_checks` on the next turn, or the model stalls. The default here is `auto_acknowledge_safety = True`, which suits a starter but is not what you want in production. Flip it to `False` and surface the check to a human before proceeding.

## Run it

```bash
cd examples/openai-computer-use-py
cp .env.example .env          # set STEEL_API_KEY and OPENAI_API_KEY
uv run main.py
```

Get keys from [app.steel.dev](https://app.steel.dev/settings/api-keys) and [platform.openai.com](https://platform.openai.com/api-keys).

Override the task per run:

```bash
TASK="Find the current weather in New York City" python main.py
```

Your output varies. Structure looks like this:

```text
Starting Steel session...
Steel Session created successfully!
View live session at: https://app.steel.dev/sessions/ab12cd34…

Executing task: Go to Steel.dev and find the latest news
============================================================
I'll open steel.dev and check the blog.
keypress({"keys": ["CTRL", "L"]})
type({"text": "https://steel.dev"})
keypress({"keys": ["ENTER"]})
wait({"ms": 1500})
…
Steel's latest release notes mention …

TASK EXECUTION COMPLETED
Duration: 62.8 seconds
```

A run typically takes 60-180 seconds and 10-30 iterations. Screenshots are cached between turns via `previous_response_id`, so per-turn input cost stays roughly flat even on long loops. The `finally` block in `main()` calls `sessions.release()`.

## Make it yours

- **Change the task.** Edit `TASK` in `.env` or pass it inline.
- **Swap the model.** The default is `gpt-5.5`. Update `self.model` in `Agent.__init__`.
- **Tune the viewport.** `viewport_width` / `viewport_height` in `Agent.__init__` flow into `sessions.create(dimensions=...)`.
- **Turn off auto-ack.** Flip `auto_acknowledge_safety = False` to make pending safety checks raise.
- **Persist a login.** Pass `session_context` to `sessions.create`. See [credentials](/cookbook/credentials).
- **Adjust reasoning.** `"effort": "medium"` trades latency for deeper plans. Drop to `"low"` for fast lookups, raise to `"high"` for multi-step research.

## Related

[TypeScript version](/cookbook/openai-computer-use) · [Claude version](/cookbook/claude-computer-use) · [OpenAI Computer Use guide](https://platform.openai.com/docs/guides/tools-computer-use) · [Responses API reference](https://platform.openai.com/docs/api-reference/responses)

</Tab>

</Tabs>

## Related recipes

<RecipeGrid>
<RecipeCard slug="gemini-computer-use" title={"Drive a browser with Gemini Computer Use"} description={"Connect Google's Gemini Computer Use to a Steel browser session for autonomous web interactions."} topics={['Computer use']} date="2025-11-25" />
<RecipeCard slug="claude-computer-use-mobile" title={"Drive a mobile browser with Claude Computer Use"} description={"Claude Computer Use with Steel for autonomous task execution in mobile browser environments."} topics={['Computer use', 'Mobile']} date="2025-10-14" />
<RecipeCard slug="claude-computer-use" title={"Drive a browser with Claude Computer Use"} description={"Connect Claude to a Steel browser session for autonomous web interactions."} topics={['Computer use']} date="2025-07-16" />
</RecipeGrid>
