Quickstart (Python)

How to use Claude Computer Use with Steel

This guide shows you how to use Claude models with computer use capabilities and Steel's Computer API to create AI agents that navigate the web.

We'll build a Claude Computer Use loop that enables autonomous web task execution through iterative screenshot analysis and action planning.
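
A minimal sketch of that loop, with stand-in functions in place of Claude and the Steel session. The names `run_agent_loop`, `fake_model`, and `fake_browser` are illustrative and not part of either SDK, and the message dictionaries are simplified relative to the real API payloads used later in this guide.

```python
from typing import Callable, Dict, List


def run_agent_loop(
    ask_model: Callable[[List[Dict]], Dict],
    run_action: Callable[[Dict], str],
    task: str,
    max_steps: int = 10,
) -> str:
    """Alternate model turns and browser actions until the model answers in text."""
    messages: List[Dict] = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = ask_model(messages)
        if reply["type"] == "text":
            # No further action requested: treat the text as the final answer.
            return reply["text"]
        # Run the requested action and feed the resulting screenshot back.
        screenshot = run_action(reply["action"])
        messages.append({"role": "assistant", "content": reply["action"]})
        messages.append({"role": "user", "content": {"screenshot": screenshot}})
    return "stopped: step limit reached"


def fake_model(messages: List[Dict]) -> Dict:
    # Stand-in for Claude: request one screenshot, then answer.
    if len(messages) == 1:
        return {"type": "action", "action": {"action": "screenshot"}}
    return {"type": "text", "text": "done"}


def fake_browser(action: Dict) -> str:
    # Stand-in for the Steel session: always returns a placeholder image.
    return "<base64 png>"


result = run_agent_loop(fake_model, fake_browser, "Open example.com")
print(result)
```

The step limit plays the same role as the `max_iterations` argument in the agent built below: it bounds the run if the model never produces a final text answer.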

Prerequisites

  • Python 3.11+

  • A Steel API key (available at https://app.steel.dev/settings/api-keys)

  • An Anthropic API key with access to Claude models

Step 1: Setup and Helper Functions

First, set up a virtual environment and install the required packages:

Terminal
uv venv
source .venv/bin/activate
uv pip install steel-sdk anthropic python-dotenv

Create a .env file with your API keys:

ENV
.env
STEEL_API_KEY=your-steel-api-key-here
ANTHROPIC_API_KEY=your-anthropic-api-key-here
TASK=Go to Steel.dev and find the latest news

Create a file with helper functions and constants:

Python
helpers.py
import os
import json
from typing import List, Optional, Tuple
from datetime import datetime

from dotenv import load_dotenv
from steel import Steel
from anthropic import Anthropic
from anthropic.types.beta import BetaMessageParam

# override=True: values in .env take precedence over variables already set in the shell.
load_dotenv(override=True)

# Fall back to placeholder strings so main.py can detect missing configuration.
STEEL_API_KEY = os.getenv("STEEL_API_KEY") or "your-steel-api-key-here"
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY") or "your-anthropic-api-key-here"
TASK = os.getenv("TASK") or "Go to Steel.dev and find the latest news"


def format_today() -> str:
    return datetime.now().strftime("%A, %B %d, %Y")


BROWSER_SYSTEM_PROMPT = f"""<BROWSER_ENV>
- You control a headful Chromium browser running in a VM with internet access.
- Chromium is already open; interact only through the "computer" tool (mouse, keyboard, scroll, screenshots).
- Today's date is {format_today()}.
</BROWSER_ENV>

<BROWSER_CONTROL>
- When viewing pages, zoom out or scroll so all relevant content is visible.
- When typing into any input:
  * Clear it first with Ctrl+A, then Delete.
  * After submitting (pressing Enter or clicking a button), take an extra screenshot to confirm the result and move the mouse away.
- Computer tool calls are slow; batch related actions into a single call whenever possible.
- You may act on the user's behalf on sites where they are already authenticated.
- Assume any required authentication/Auth Contexts are already configured before the task starts.
- If the first screenshot is black:
  * Click near the center of the screen.
  * Take another screenshot.
- Never click the browser address bar with the mouse. To navigate to a URL:
  * Press Ctrl+L to focus and select the address bar.
  * Type the full URL, then press Enter.
  * If you see any existing text (e.g., 'about:blank'), press Ctrl+L before typing so you replace it (never append).
- Prefer typing into inputs on the page (e.g., a site's search box) rather than the browser address bar, unless entering a direct URL.
</BROWSER_CONTROL>

<TASK_EXECUTION>
- You receive exactly one natural-language task and no further user feedback.
- Do not ask the user clarifying questions; instead, make reasonable assumptions and proceed.
- For complex tasks, quickly plan a short, ordered sequence of steps before acting.
- Prefer minimal, high-signal actions that move directly toward the goal.
- Keep your final response concise and focused on fulfilling the task (e.g., a brief summary of findings or results).
</TASK_EXECUTION>"""


def pp(obj) -> None:
    # default=str lets objects that are not JSON-serializable (such as SDK
    # response models) print as strings instead of raising TypeError.
    print(json.dumps(obj, indent=2, default=str))
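
The `os.getenv(...) or "placeholder"` pattern in helpers.py treats an empty value the same as a missing one. A small sketch of that behavior follows; `read_setting` and the `DEMO_*` variable names are hypothetical and exist only for this illustration.

```python
import os


def read_setting(name: str, default: str) -> str:
    """Return the environment value, or the default if it is unset or empty."""
    return os.getenv(name) or default


os.environ["DEMO_TASK"] = ""          # set, but empty
os.environ.pop("DEMO_MISSING", None)  # not set at all

print(read_setting("DEMO_TASK", "fallback task"))
print(read_setting("DEMO_MISSING", "fallback task"))
```

Both calls return the fallback, which is why main.py can detect missing keys by comparing against the placeholder strings.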

Step 2: Create the Agent Class

Python
agent.py
import time
import json
from typing import List, Optional, Tuple

from helpers import (
    STEEL_API_KEY,
    ANTHROPIC_API_KEY,
    BROWSER_SYSTEM_PROMPT,
    pp,
)
from steel import Steel
from anthropic import Anthropic
from anthropic.types.beta import BetaMessageParam


class Agent:
    def __init__(self):
        self.client = Anthropic(api_key=ANTHROPIC_API_KEY)
        self.steel = Steel(steel_api_key=STEEL_API_KEY)
        self.model = "claude-sonnet-4-5"
        self.messages: List[BetaMessageParam] = []
        self.session = None
        self.viewport_width = 1280
        self.viewport_height = 768
        self.system_prompt = BROWSER_SYSTEM_PROMPT
        self.tools = [
            {
                "type": "computer_20250124",
                "name": "computer",
                "display_width_px": self.viewport_width,
                "display_height_px": self.viewport_height,
                "display_number": 1,
            }
        ]

    def _center(self) -> Tuple[int, int]:
        """Return the midpoint of the viewport, used when no coordinate is given."""
        return (self.viewport_width // 2, self.viewport_height // 2)

    def _split_keys(self, k: Optional[str]) -> List[str]:
        """Split a combination such as "ctrl+a" into ["ctrl", "a"]."""
        return [s.strip() for s in k.split("+")] if k else []

    def _normalize_key(self, key: str) -> str:
        """Map common key aliases (case-insensitive) to canonical names."""
        if not isinstance(key, str) or not key:
            return key
        k = key.strip()
        upper = k.upper()
        synonyms = {
            "ENTER": "Enter",
            "RETURN": "Enter",
            "ESC": "Escape",
            "ESCAPE": "Escape",
            "TAB": "Tab",
            "BACKSPACE": "Backspace",
            "DELETE": "Delete",
            "SPACE": "Space",
            "CTRL": "Control",
            "CONTROL": "Control",
            "ALT": "Alt",
            "SHIFT": "Shift",
            "META": "Meta",
            "CMD": "Meta",
            "UP": "ArrowUp",
            "DOWN": "ArrowDown",
            "LEFT": "ArrowLeft",
            "RIGHT": "ArrowRight",
            "HOME": "Home",
            "END": "End",
            "PAGEUP": "PageUp",
            "PAGEDOWN": "PageDown",
        }
        if upper in synonyms:
            return synonyms[upper]
        # Function keys: "f5" becomes "F5".
        if upper.startswith("F") and upper[1:].isdigit():
            return "F" + upper[1:]
        return k

    def _normalize_keys(self, keys: List[str]) -> List[str]:
        return [self._normalize_key(k) for k in keys]

    def initialize(self) -> None:
        width = self.viewport_width
        height = self.viewport_height
        self.session = self.steel.sessions.create(
            dimensions={"width": width, "height": height},
            block_ads=True,
            api_timeout=900000,
        )
        print("Steel Session created successfully!")
        print(f"View live session at: {self.session.session_viewer_url}")

    def cleanup(self) -> None:
        if self.session:
            print("Releasing Steel session...")
            self.steel.sessions.release(self.session.id)
            print(
                f"Session completed. View replay at {self.session.session_viewer_url}"
            )

    def take_screenshot(self) -> str:
        resp = self.steel.sessions.computer(self.session.id, action="take_screenshot")
        img = getattr(resp, "base64_image", None)
        if not img:
            raise RuntimeError("No screenshot returned from Input API")
        return img

    def execute_computer_action(
        self,
        action: str,
        text: Optional[str] = None,
        coordinate: Optional[Tuple[int, int]] = None,
        scroll_direction: Optional[str] = None,
        scroll_amount: Optional[int] = None,
        duration: Optional[float] = None,
        key: Optional[str] = None,
    ) -> str:
        """Translate a Claude computer-tool action into a Steel call and return a base64 screenshot."""
        # Fall back to the viewport center when no valid coordinate is supplied.
        if (
            coordinate
            and isinstance(coordinate, (list, tuple))
            and len(coordinate) == 2
        ):
            coords = (int(coordinate[0]), int(coordinate[1]))
        else:
            coords = self._center()

        body: Optional[dict] = None

        if action == "mouse_move":
            body = {
                "action": "move_mouse",
                "coordinates": [coords[0], coords[1]],
                "screenshot": True,
            }
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action in ("left_mouse_down", "left_mouse_up"):
            body = {
                "action": "click_mouse",
                "button": "left",
                "click_type": "down" if action == "left_mouse_down" else "up",
                "coordinates": [coords[0], coords[1]],
                "screenshot": True,
            }
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action in (
            "left_click",
            "right_click",
            "middle_click",
            "double_click",
            "triple_click",
        ):
            button_map = {
                "left_click": "left",
                "right_click": "right",
                "middle_click": "middle",
                "double_click": "left",
                "triple_click": "left",
            }
            clicks = (
                2 if action == "double_click" else 3 if action == "triple_click" else 1
            )
            body = {
                "action": "click_mouse",
                "button": button_map[action],
                "coordinates": [coords[0], coords[1]],
                "screenshot": True,
            }
            if clicks > 1:
                body["num_clicks"] = clicks
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action == "left_click_drag":
            # The drag starts at the viewport center; only the end point comes from `coordinate`.
            start_x, start_y = self._center()
            end_x, end_y = coords
            body = {
                "action": "drag_mouse",
                "path": [[start_x, start_y], [end_x, end_y]],
                "screenshot": True,
            }
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action == "scroll":
            # Each unit of scroll_amount maps to a delta of 100 in the chosen direction.
            step = 100
            dx_dy = {
                "down": (0, step * (scroll_amount or 0)),
                "up": (0, -step * (scroll_amount or 0)),
                "right": (step * (scroll_amount or 0), 0),
                "left": (-(step * (scroll_amount or 0)), 0),
            }
            dx, dy = dx_dy.get(
                scroll_direction or "down", (0, step * (scroll_amount or 0))
            )
            body = {
                "action": "scroll",
                "coordinates": [coords[0], coords[1]],
                "delta_x": dx,
                "delta_y": dy,
                "screenshot": True,
            }
            # For scroll, modifier keys are read from `text`.
            hk = self._split_keys(text)
            if hk:
                body["hold_keys"] = hk

        elif action == "hold_key":
            keys = self._split_keys(text or "")
            keys = self._normalize_keys(keys)
            body = {
                "action": "press_key",
                "keys": keys or [],
                "duration": duration,
                "screenshot": True,
            }

        elif action == "key":
            keys = self._split_keys(text or "")
            keys = self._normalize_keys(keys)
            body = {
                "action": "press_key",
                "keys": keys or [],
                "screenshot": True,
            }

        elif action == "type":
            body = {
                "action": "type_text",
                "text": text,
                "screenshot": True,
            }
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action == "wait":
            body = {
                "action": "wait",
                "duration": duration,
                "screenshot": True,
            }

        elif action == "screenshot":
            return self.take_screenshot()

        elif action == "cursor_position":
            self.steel.sessions.computer(self.session.id, action="get_cursor_position")
            return self.take_screenshot()

        else:
            raise ValueError(f"Invalid action: {action}")

        # Drop unset fields, send the request, and fall back to an explicit
        # screenshot if the response does not include one.
        clean_body = {k: v for k, v in body.items() if v is not None}
        resp = self.steel.sessions.computer(self.session.id, **clean_body)
        img = getattr(resp, "base64_image", None)
        if img:
            return img
        return self.take_screenshot()

    def process_response(self, message) -> str:
        """Print text blocks, run the first tool call, and ask Claude to continue."""
        response_text = ""

        for block in message.content:
            if block.type == "text":
                response_text += block.text
                print(block.text)
            elif block.type == "tool_use":
                tool_name = block.name
                tool_input = block.input
                print(f"🔧 {tool_name}({json.dumps(tool_input)})")

                if tool_name == "computer":
                    action = tool_input.get("action")
                    params = {
                        "text": tool_input.get("text"),
                        "coordinate": tool_input.get("coordinate"),
                        "scroll_direction": tool_input.get("scroll_direction"),
                        "scroll_amount": tool_input.get("scroll_amount"),
                        "duration": tool_input.get("duration"),
                        "key": tool_input.get("key"),
                    }

                    try:
                        screenshot_base64 = self.execute_computer_action(
                            action=action,
                            text=params["text"],
                            coordinate=params["coordinate"],
                            scroll_direction=params["scroll_direction"],
                            scroll_amount=params["scroll_amount"],
                            duration=params["duration"],
                            key=params["key"],
                        )

                        # Record the tool call, then its result (the new screenshot).
                        self.messages.append(
                            {
                                "role": "assistant",
                                "content": [
                                    {
                                        "type": "tool_use",
                                        "id": block.id,
                                        "name": block.name,
                                        "input": tool_input,
                                    }
                                ],
                            }
                        )
                        self.messages.append(
                            {
                                "role": "user",
                                "content": [
                                    {
                                        "type": "tool_result",
                                        "tool_use_id": block.id,
                                        "content": [
                                            {
                                                "type": "image",
                                                "source": {
                                                    "type": "base64",
                                                    "media_type": "image/png",
                                                    "data": screenshot_base64,
                                                },
                                            }
                                        ],
                                    }
                                ],
                            }
                        )
                        return self.get_claude_response()

                    except Exception as e:
                        # Report the failure back to Claude as an error tool_result.
                        print(f"❌ Error executing {action}: {e}")
                        self.messages.append(
                            {
                                "role": "assistant",
                                "content": [
                                    {
                                        "type": "tool_use",
                                        "id": block.id,
                                        "name": block.name,
                                        "input": tool_input,
                                    }
                                ],
                            }
                        )
                        self.messages.append(
                            {
                                "role": "user",
                                "content": [
                                    {
                                        "type": "tool_result",
                                        "tool_use_id": block.id,
                                        "content": f"Error executing {action}: {e}",
                                        "is_error": True,
                                    }
                                ],
                            }
                        )
                        return self.get_claude_response()

        # A text-only reply is stored as a plain assistant message.
        if response_text and not any(b.type == "tool_use" for b in message.content):
            self.messages.append({"role": "assistant", "content": response_text})

        return response_text

    def get_claude_response(self) -> str:
        try:
            response = self.client.beta.messages.create(
                model=self.model,
                max_tokens=4096,
                messages=self.messages,
                tools=self.tools,
                betas=["computer-use-2025-01-24"],
            )
            return self.process_response(response)
        except Exception as e:
            err = f"Error communicating with Claude: {e}"
            print(f"❌ {err}")
            return err
385
def execute_task(
386
self,
387
task: str,
388
print_steps: bool = True,
389
debug: bool = False,
390
max_iterations: int = 50,
391
) -> str:
392
self.messages = [
393
{"role": "user", "content": self.system_prompt},
394
{"role": "user", "content": task},
395
]
396
397
iterations = 0
398
consecutive_no_actions = 0
399
last_assistant_messages: List[str] = []
400
401
print(f"๐ŸŽฏ Executing task: {task}")
402
print("=" * 60)
403
404
def detect_repetition(new_message: str) -> bool:
405
if len(last_assistant_messages) < 2:
406
return False
407
words1 = new_message.lower().split()
408
return any(
409
len([w for w in words1 if w in prev.lower().split()])
410
/ max(len(words1), len(prev.lower().split()))
411
> 0.8
412
for prev in last_assistant_messages
413
)
414
415
while iterations < max_iterations:
416
iterations += 1
417
has_actions = False
418
419
last_assistant = None
420
for msg in reversed(self.messages):
421
if msg.get("role") == "assistant" and isinstance(
422
msg.get("content"), str
423
):
424
last_assistant = msg.get("content")
425
break
426
427
if isinstance(last_assistant, str):
428
if detect_repetition(last_assistant):
429
print("๐Ÿ”„ Repetition detected - stopping execution")
430
last_assistant_messages.append(last_assistant)
431
break
432
last_assistant_messages.append(last_assistant)
433
if len(last_assistant_messages) > 3:
434
last_assistant_messages.pop(0)
435
436
if debug:
437
pp(self.messages)
438
439
try:
440
response = self.client.beta.messages.create(
441
model=self.model,
442
max_tokens=4096,
443
messages=self.messages,
444
tools=self.tools,
445
betas=["computer-use-2025-01-24"],
446
)
447
448
if debug:
449
pp(response)
450
451
for block in response.content:
452
if block.type == "tool_use":
453
has_actions = True
454
455
self.process_response(response)
456
457
if not has_actions:
458
consecutive_no_actions += 1
459
if consecutive_no_actions >= 3:
460
print("โš ๏ธ No actions for 3 consecutive iterations - stopping")
461
break
462
else:
463
consecutive_no_actions = 0
464
465
except Exception as e:
466
print(f"โŒ Error during task execution: {e}")
467
raise e
468
469
if iterations >= max_iterations:
470
print(f"โš ๏ธ Task execution stopped after {max_iterations} iterations")
471
472
assistant_messages = [m for m in self.messages if m.get("role") == "assistant"]
473
final_message = assistant_messages[-1] if assistant_messages else None
474
if final_message and isinstance(final_message.get("content"), str):
475
return final_message["content"]
476
477
return "Task execution completed (no final message)"
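
The repetition check inside `execute_task` is easier to reason about in isolation. The sketch below restates it as two standalone functions; `word_overlap` and `is_repetition` are illustrative names, and the guard against empty messages is an addition that the agent code does not have.

```python
from typing import List


def word_overlap(new_message: str, previous: str) -> float:
    """Words of new_message found in previous, divided by the longer word count."""
    new_words = new_message.lower().split()
    prev_words = previous.lower().split()
    if not new_words or not prev_words:
        return 0.0  # added guard: avoid dividing by zero on empty input
    prev_set = set(prev_words)
    shared = sum(1 for w in new_words if w in prev_set)
    return shared / max(len(new_words), len(prev_words))


def is_repetition(new_message: str, history: List[str], threshold: float = 0.8) -> bool:
    """True if new_message overlaps any recent message by more than the threshold."""
    if len(history) < 2:
        return False  # too little history to judge
    return any(word_overlap(new_message, prev) > threshold for prev in history)


history = [
    "I found the latest news on the blog",
    "The page is still loading",
]
print(is_repetition("I found the latest news on the blog", history))
print(is_repetition("Completely different sentence here", history))
```

Because the comparison is strict, a message that overlaps exactly 80% with an earlier one is not flagged.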

Step 3: Create the Main Script

Python
main.py
import sys
import time

from helpers import STEEL_API_KEY, ANTHROPIC_API_KEY, TASK
from agent import Agent


def main():
    print("🚀 Steel + Claude Computer Use Assistant")
    print("=" * 60)

    # Stop early if either key is still the placeholder from helpers.py.
    if STEEL_API_KEY == "your-steel-api-key-here":
        print(
            "⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
        )
        print(" Get your API key at: https://app.steel.dev/settings/api-keys")
        sys.exit(1)

    if ANTHROPIC_API_KEY == "your-anthropic-api-key-here":
        print(
            "⚠️ WARNING: Please replace 'your-anthropic-api-key-here' with your actual Anthropic API key"
        )
        print(" Get your API key at: https://console.anthropic.com/")
        sys.exit(1)

    print("\nStarting Steel session...")
    agent = Agent()

    try:
        agent.initialize()
        print("✅ Steel session started!")

        start_time = time.time()

        try:
            result = agent.execute_task(TASK, True, False, 50)
            duration = f"{(time.time() - start_time):.1f}"

            print("\n" + "=" * 60)
            print("🎉 TASK EXECUTION COMPLETED")
            print("=" * 60)
            print(f"⏱️ Duration: {duration} seconds")
            print(f"🎯 Task: {TASK}")
            print(f"📋 Result:\n{result}")
            print("=" * 60)

        except Exception as e:
            print(f"❌ Task execution failed: {e}")
            # Chain the original exception so the traceback shows the root cause.
            raise RuntimeError("Task execution failed") from e

    except Exception as e:
        print(f"❌ Failed to start Steel session: {e}")
        print("Please check your STEEL_API_KEY and internet connection.")
        raise RuntimeError("Failed to start Steel session") from e

    finally:
        # Always release the Steel session, even after a failure.
        agent.cleanup()


if __name__ == "__main__":
    main()

Running Your Agent

Execute your script:

Terminal
python main.py

You'll see the session URL printed in the console. Open this URL to view the live browser session.

The agent will execute the task defined in the TASK environment variable or the default task.

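You can change the task by setting the TASK environment variable. Because helpers.py calls load_dotenv(override=True), a TASK entry in .env takes precedence over an exported value, so remove or comment out that entry first: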

Terminal
export TASK="Search for the latest developments in artificial intelligence"
python main.py

Customizing your agent's task

Try modifying the task to make your agent perform different actions. Keep only one TASK line in .env at a time:

ENV
.env
# Research specific topics
TASK=Go to https://arxiv.org, search for 'computer vision', and summarize the latest papers.

# E-commerce tasks
TASK=Go to https://www.amazon.com, search for 'mechanical keyboards', and compare the top 3 results.

# Information gathering
TASK=Go to https://docs.anthropic.com, find information about Claude's capabilities, and provide a summary.

Next Steps

  • Explore the Steel API documentation for more advanced features

  • Check out the Anthropic documentation for more information about Claude's computer use capabilities

  • Add additional features like session recording or multi-session management