Quickstart (Python)

How to use Claude Computer Use with Steel

This guide shows you how to use Claude models with computer use capabilities and Steel browsers to create AI agents that navigate the web.

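We'll build a Claude Computer Use loop: Claude receives a screenshot of the browser, plans the next action (click, type, scroll, and so on), and then sees the result in a fresh screenshot. The loop repeats until the task is done, which lets the agent carry out web tasks autonomously.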
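The loop described above can be sketched without any network access. The following is a minimal illustration, assuming a hypothetical `fake_model` that stands in for Claude and a hypothetical `FakeBrowser` that stands in for the Steel session; neither is part of the Steel or Anthropic SDKs, and the real agent built in the steps below is considerably more involved.

```python
from typing import Optional


class FakeBrowser:
    """Hypothetical stand-in for a Steel browser session."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def execute(self, action: str) -> str:
        # A real browser would perform the action and return a PNG screenshot.
        self.log.append(action)
        return f"screenshot after {action}"


def fake_model(history: list[str]) -> Optional[str]:
    """Hypothetical stand-in for Claude: plays back a fixed plan, then stops."""
    plan = ["screenshot", "left_click", "type"]
    step = len(history)
    return plan[step] if step < len(plan) else None


def run_loop(browser: FakeBrowser, max_iterations: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_iterations):
        action = fake_model(history)  # 1. the model plans the next action
        if action is None:  # 2. no further action means the task is done
            break
        observation = browser.execute(action)  # 3. act, then capture the result
        history.append(observation)  # 4. the result feeds the next turn
    return history


browser = FakeBrowser()
history = run_loop(browser)
print(browser.log)
```

The `max_iterations` cap plays the same role as in the full agent: it bounds the loop even if the model never signals completion.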

Prerequisites

  • Python 3.11+

  • A Steel API key (sign up here)

  • An Anthropic API key with access to Claude models

Step 1: Setup and Dependencies

First, create a project directory, set up a virtual environment, and install the required packages:

Terminal
# Create a project directory
mkdir steel-claude-computer-use
cd steel-claude-computer-use
# Recommended: Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # On Windows, use: venv\Scripts\activate
# Install required packages
pip install steel-sdk anthropic playwright python-dotenv pillow

Create a .env file with your API keys:

ENV
.env
STEEL_API_KEY=your_steel_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
TASK=Go to Wikipedia and search for machine learning
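Before starting a session it can help to fail fast when a key is missing or blank. A minimal sketch, assuming a hypothetical helper `missing_settings` (not part of python-dotenv or either SDK) that accepts any mapping, such as `os.environ`:

```python
from typing import Mapping

REQUIRED_SETTINGS = ("STEEL_API_KEY", "ANTHROPIC_API_KEY")


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


# Example with a plain dict; in the real script you would pass os.environ.
example_env = {"STEEL_API_KEY": "example-key", "ANTHROPIC_API_KEY": "  "}
print(missing_settings(example_env))
```

TASK is deliberately left out of the required list because the script falls back to a default task when it is not set.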

Step 2: Create Helper Functions

Python
utils.py
import os
import time
import base64
import json
import re
from typing import List, Dict
from urllib.parse import urlparse

from dotenv import load_dotenv
from PIL import Image
from io import BytesIO
from playwright.sync_api import sync_playwright, Error as PlaywrightError
from steel import Steel
from anthropic import Anthropic
from anthropic.types.beta import BetaMessageParam


load_dotenv(override=True)

# Replace with your own API keys
STEEL_API_KEY = os.getenv("STEEL_API_KEY") or "your-steel-api-key-here"
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY") or "your-anthropic-api-key-here"

# Replace with your own task
TASK = os.getenv("TASK") or "Go to Wikipedia and search for machine learning"

SYSTEM_PROMPT = """You are an expert browser automation assistant operating in an iterative execution loop. Your goal is to efficiently complete tasks using a Chrome browser with full internet access.

<CAPABILITIES>
* You control a Chrome browser tab and can navigate to any website
* You can click, type, scroll, take screenshots, and interact with web elements
* You have full internet access and can visit any public website
* You can read content, fill forms, search for information, and perform complex multi-step tasks
* After each action, you receive a screenshot showing the current state

<COORDINATE_SYSTEM>
* The browser viewport has specific dimensions that you must respect
* All coordinates (x, y) must be within the viewport bounds
* X coordinates must be between 0 and the display width (inclusive)
* Y coordinates must be between 0 and the display height (inclusive)
* Always ensure your click, move, scroll, and drag coordinates are within these bounds
* If you're unsure about element locations, take a screenshot first to see the current state

<AUTONOMOUS_EXECUTION>
* Work completely independently - make decisions and act immediately without asking questions
* Never request clarification, present options, or ask for permission
* Make intelligent assumptions based on task context
* If something is ambiguous, choose the most logical interpretation and proceed
* Take immediate action rather than explaining what you might do
* When the task objective is achieved, immediately declare "TASK_COMPLETED:" - do not provide commentary or ask questions

<REASONING_STRUCTURE>
For each step, you must reason systematically:
* Analyze your previous action's success/failure and current state
* Identify what specific progress has been made toward the goal
* Determine the next immediate objective and how to achieve it
* Choose the most efficient action sequence to make progress

<EFFICIENCY_PRINCIPLES>
* Combine related actions when possible rather than single-step execution
* Navigate directly to relevant websites without unnecessary exploration
* Use screenshots strategically to understand page state before acting
* Be persistent with alternative approaches if initial attempts fail
* Focus on the specific information or outcome requested

<COMPLETION_CRITERIA>
* MANDATORY: When you complete the task, your final message MUST start with "TASK_COMPLETED: [brief summary]"
* MANDATORY: If technical issues prevent completion, your final message MUST start with "TASK_FAILED: [reason]"
* MANDATORY: If you abandon the task, your final message MUST start with "TASK_ABANDONED: [explanation]"
* Do not write anything after completing the task except the required completion message
* Do not ask questions, provide commentary, or offer additional help after task completion
* The completion message is the end of the interaction - nothing else should follow

<CRITICAL_REQUIREMENTS>
* This is fully automated execution - work completely independently
* Start by taking a screenshot to understand the current state
* Never click on browser UI elements
* Always respect coordinate boundaries - invalid coordinates will fail
* Recognize when the stated objective has been achieved and declare completion immediately
* Focus on the explicit task given, not implied or potential follow-up tasks

Remember: Be thorough but focused. Complete the specific task requested efficiently and provide clear results."""

# Example domains the agent is never allowed to load
BLOCKED_DOMAINS = [
    "maliciousbook.com",
    "evilvideos.com",
    "darkwebforum.com",
    "shadytok.com",
    "suspiciouspins.com",
    "ilanbigio.com",
]

# Each model needs a matching computer-use tool version and beta flag
MODEL_CONFIGS = {
    "claude-3-5-sonnet-20241022": {
        "tool_type": "computer_20241022",
        "beta_flag": "computer-use-2024-10-22",
        "description": "Stable Claude 3.5 Sonnet (recommended)"
    },
    "claude-3-7-sonnet-20250219": {
        "tool_type": "computer_20250124",
        "beta_flag": "computer-use-2025-01-24",
        "description": "Claude 3.7 Sonnet (newer)"
    },
    "claude-sonnet-4-20250514": {
        "tool_type": "computer_20250124",
        "beta_flag": "computer-use-2025-01-24",
        "description": "Claude Sonnet 4 (newest)"
    },
    "claude-opus-4-20250514": {
        "tool_type": "computer_20250124",
        "beta_flag": "computer-use-2025-01-24",
        "description": "Claude Opus 4 (newest)"
    }
}

CUA_KEY_TO_PLAYWRIGHT_KEY = {
    "/": "Divide",
    "\\": "Backslash",
    "alt": "Alt",
    "arrowdown": "ArrowDown",
    "arrowleft": "ArrowLeft",
    "arrowright": "ArrowRight",
    "arrowup": "ArrowUp",
    "backspace": "Backspace",
    "capslock": "CapsLock",
    "cmd": "Meta",
    "ctrl": "Control",
    "delete": "Delete",
    "end": "End",
    "enter": "Enter",
    "esc": "Escape",
    "home": "Home",
    "insert": "Insert",
    "option": "Alt",
    "pagedown": "PageDown",
    "pageup": "PageUp",
    "shift": "Shift",
    "space": " ",
    "super": "Meta",
    "tab": "Tab",
    "win": "Meta",
    "Return": "Enter",
    "KP_Enter": "Enter",
    "Escape": "Escape",
    "BackSpace": "Backspace",
    "Delete": "Delete",
    "Tab": "Tab",
    "ISO_Left_Tab": "Shift+Tab",
    "Up": "ArrowUp",
    "Down": "ArrowDown",
    "Left": "ArrowLeft",
    "Right": "ArrowRight",
    "Page_Up": "PageUp",
    "Page_Down": "PageDown",
    "Home": "Home",
    "End": "End",
    "Insert": "Insert",
    "F1": "F1", "F2": "F2", "F3": "F3", "F4": "F4",
    "F5": "F5", "F6": "F6", "F7": "F7", "F8": "F8",
    "F9": "F9", "F10": "F10", "F11": "F11", "F12": "F12",
    "Shift_L": "Shift", "Shift_R": "Shift",
    "Control_L": "Control", "Control_R": "Control",
    "Alt_L": "Alt", "Alt_R": "Alt",
    "Meta_L": "Meta", "Meta_R": "Meta",
    "Super_L": "Meta", "Super_R": "Meta",
    "minus": "-",
    "equal": "=",
    "bracketleft": "[",
    "bracketright": "]",
    "semicolon": ";",
    "apostrophe": "'",
    "grave": "`",
    "comma": ",",
    "period": ".",
    "slash": "/",
}


def chunks(s: str, chunk_size: int) -> List[str]:
    # Split a string into consecutive pieces of at most chunk_size characters
    return [s[i : i + chunk_size] for i in range(0, len(s), chunk_size)]


def pp(obj):
    # Pretty-print a JSON-serializable object
    print(json.dumps(obj, indent=2))


def show_image(base_64_image):
    # Decode a base64 screenshot and open it in the default image viewer
    image_data = base64.b64decode(base_64_image)
    image = Image.open(BytesIO(image_data))
    image.show()


def check_blocklisted_url(url: str) -> None:
    # Raise ValueError if the URL's host is a blocked domain or one of its subdomains
    hostname = urlparse(url).hostname or ""
    if any(
        hostname == blocked or hostname.endswith(f".{blocked}")
        for blocked in BLOCKED_DOMAINS
    ):
        raise ValueError(f"Blocked URL: {url}")
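To see why the blocklist check compares hostnames instead of searching the raw URL string, here is a small standalone sketch. It applies the same matching rule as `check_blocklisted_url` but returns a boolean; the helper name `is_blocked` and the domain `blocked.example` are illustrative assumptions.

```python
from urllib.parse import urlparse

BLOCKED = ["blocked.example"]  # illustrative domain, not from the guide


def is_blocked(url: str) -> bool:
    """True when the URL's hostname is a blocked domain or one of its subdomains."""
    hostname = urlparse(url).hostname or ""
    return any(
        hostname == domain or hostname.endswith(f".{domain}")
        for domain in BLOCKED
    )


results = {
    "exact": is_blocked("https://blocked.example/page"),
    "subdomain": is_blocked("https://cdn.blocked.example/a.js"),
    "lookalike": is_blocked("https://notblocked.example/"),
    "query_only": is_blocked("https://ok.example/?q=blocked.example"),
}
print(results)
```

Requiring the leading dot in the suffix check is what keeps a lookalike host such as `notblocked.example` from matching, and parsing the hostname first keeps a blocked name that only appears in the query string from triggering the rule.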

Step 3: Create Steel Browser Integration

Python
steel_browser.py
import base64
import os
import time

from playwright.sync_api import sync_playwright, Error as PlaywrightError
from steel import Steel

from utils import CUA_KEY_TO_PLAYWRIGHT_KEY, check_blocklisted_url, chunks


class SteelBrowser:
    def __init__(
        self,
        width: int = 1024,
        height: int = 768,
        proxy: bool = False,
        solve_captcha: bool = False,
        virtual_mouse: bool = True,
        session_timeout: int = 900000,
        ad_blocker: bool = True,
        start_url: str = "https://www.google.com",
    ):
        self.client = Steel(
            steel_api_key=os.getenv("STEEL_API_KEY"),
        )
        self.dimensions = (width, height)
        self.proxy = proxy
        self.solve_captcha = solve_captcha
        self.virtual_mouse = virtual_mouse
        self.session_timeout = session_timeout
        self.ad_blocker = ad_blocker
        self.start_url = start_url
        self.session = None
        self._playwright = None
        self._browser = None
        self._page = None
        self._last_mouse_position = None

    def get_dimensions(self):
        return self.dimensions

    def get_current_url(self) -> str:
        return self._page.url if self._page else ""

    def __enter__(self):
        width, height = self.dimensions
        session_params = {
            "use_proxy": self.proxy,
            "solve_captcha": self.solve_captcha,
            "api_timeout": self.session_timeout,
            "block_ads": self.ad_blocker,
            "dimensions": {"width": width, "height": height}
        }
        self.session = self.client.sessions.create(**session_params)

        print("Steel Session created successfully!")
        print(f"View live session at: {self.session.session_viewer_url}")

        self._playwright = sync_playwright().start()
        browser = self._playwright.chromium.connect_over_cdp(
            f"{self.session.websocket_url}&apiKey={os.getenv('STEEL_API_KEY')}",
            timeout=60000
        )
        self._browser = browser
        context = browser.contexts[0]

        def handle_route(route, request):
            url = request.url
            try:
                check_blocklisted_url(url)
                route.continue_()
            except ValueError:
                print(f"Blocking URL: {url}")
                route.abort()

        if self.virtual_mouse:
            context.add_init_script("""
            if (window.self === window.top) {
                function initCursor() {
                    const CURSOR_ID = '__cursor__';
                    if (document.getElementById(CURSOR_ID)) return;

                    const cursor = document.createElement('div');
                    cursor.id = CURSOR_ID;
                    Object.assign(cursor.style, {
                        position: 'fixed',
                        top: '0px',
                        left: '0px',
                        width: '20px',
                        height: '20px',
                        backgroundImage: 'url("data:image/svg+xml;utf8,<svg width=\\'16\\' height=\\'16\\' viewBox=\\'0 0 20 20\\' fill=\\'black\\' outline=\\'white\\' xmlns=\\'http://www.w3.org/2000/svg\\'><path d=\\'M15.8089 7.22221C15.9333 7.00888 15.9911 6.78221 15.9822 6.54221C15.9733 6.29333 15.8978 6.06667 15.7555 5.86221C15.6133 5.66667 15.4311 5.52445 15.2089 5.43555L1.70222 0.0888888C1.47111 0 1.23555 -0.0222222 0.995555 0.0222222C0.746667 0.0755555 0.537779 0.186667 0.368888 0.355555C0.191111 0.533333 0.0755555 0.746667 0.0222222 0.995555C-0.0222222 1.23555 0 1.47111 0.0888888 1.70222L5.43555 15.2222C5.52445 15.4445 5.66667 15.6267 5.86221 15.7689C6.06667 15.9111 6.28888 15.9867 6.52888 15.9955H6.58221C6.82221 15.9955 7.04445 15.9333 7.24888 15.8089C7.44445 15.6845 7.59555 15.52 7.70221 15.3155L10.2089 10.2222L15.3022 7.70221C15.5155 7.59555 15.6845 7.43555 15.8089 7.22221Z\\' ></path></svg>")',
                        backgroundSize: 'cover',
                        pointerEvents: 'none',
                        zIndex: '99999',
                        transform: 'translate(-2px, -2px)',
                    });

                    document.body.appendChild(cursor);

                    document.addEventListener("mousemove", (e) => {
                        cursor.style.top = e.clientY + "px";
                        cursor.style.left = e.clientX + "px";
                    });
                }

                requestAnimationFrame(function checkBody() {
                    if (document.body) {
                        initCursor();
                    } else {
                        requestAnimationFrame(checkBody);
                    }
                });
            }
            """)

        self._page = context.pages[0]
        self._page.route("**/*", handle_route)

        self._page.set_viewport_size({"width": width, "height": height})

        self._page.goto(self.start_url)

        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self._page:
            self._page.close()
        if self._browser:
            self._browser.close()
        if self._playwright:
            self._playwright.stop()

        if self.session:
            print("Releasing Steel session...")
            self.client.sessions.release(self.session.id)
            print(f"Session completed. View replay at {self.session.session_viewer_url}")

    def screenshot(self) -> str:
        try:
            width, height = self.dimensions
            png_bytes = self._page.screenshot(
                full_page=False,
                clip={"x": 0, "y": 0, "width": width, "height": height}
            )
            return base64.b64encode(png_bytes).decode("utf-8")
        except PlaywrightError as error:
            print(f"Screenshot failed, trying CDP fallback: {error}")
            try:
                cdp_session = self._page.context.new_cdp_session(self._page)
                result = cdp_session.send(
                    "Page.captureScreenshot", {"format": "png", "fromSurface": False}
                )
                return result["data"]
            except PlaywrightError as cdp_error:
                print(f"CDP screenshot also failed: {cdp_error}")
                raise error

    def validate_and_get_coordinates(self, coordinate):
        if not isinstance(coordinate, (list, tuple)) or len(coordinate) != 2:
            raise ValueError(f"{coordinate} must be a tuple or list of length 2")
        if not all(isinstance(i, int) and i >= 0 for i in coordinate):
            raise ValueError(f"{coordinate} must be a tuple/list of non-negative ints")

        x, y = self.clamp_coordinates(coordinate[0], coordinate[1])
        return x, y

    def clamp_coordinates(self, x: int, y: int):
        width, height = self.dimensions
        clamped_x = max(0, min(x, width - 1))
        clamped_y = max(0, min(y, height - 1))

        if x != clamped_x or y != clamped_y:
            print(f"⚠️ Coordinate clamped: ({x}, {y}) → ({clamped_x}, {clamped_y})")

        return clamped_x, clamped_y

    def execute_computer_action(
        self,
        action: str,
        text: str = None,
        coordinate = None,
        scroll_direction: str = None,
        scroll_amount: int = None,
        duration = None,
        key: str = None,
        **kwargs
    ) -> str:

        if action in ("left_mouse_down", "left_mouse_up"):
            if coordinate is not None:
                raise ValueError(f"coordinate is not accepted for {action}")

            if action == "left_mouse_down":
                self._page.mouse.down()
            elif action == "left_mouse_up":
                self._page.mouse.up()

            return self.screenshot()

        if action == "scroll":
            if scroll_direction is None or scroll_direction not in ("up", "down", "left", "right"):
                raise ValueError("scroll_direction must be 'up', 'down', 'left', or 'right'")
            if scroll_amount is None or not isinstance(scroll_amount, int) or scroll_amount < 0:
                raise ValueError("scroll_amount must be a non-negative int")

            if coordinate is not None:
                x, y = self.validate_and_get_coordinates(coordinate)
                self._page.mouse.move(x, y)
                self._last_mouse_position = (x, y)

            if text:
                modifier_key = text
                if modifier_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
                    modifier_key = CUA_KEY_TO_PLAYWRIGHT_KEY[modifier_key]
                self._page.keyboard.down(modifier_key)

            scroll_mapping = {
                "down": (0, 100 * scroll_amount),
                "up": (0, -100 * scroll_amount),
                "right": (100 * scroll_amount, 0),
                "left": (-100 * scroll_amount, 0)
            }
            delta_x, delta_y = scroll_mapping[scroll_direction]
            self._page.mouse.wheel(delta_x, delta_y)

            if text:
                self._page.keyboard.up(modifier_key)

            return self.screenshot()

        if action in ("hold_key", "wait"):
            if duration is None or not isinstance(duration, (int, float)):
                raise ValueError("duration must be a number")
            if duration < 0:
                raise ValueError("duration must be non-negative")
            if duration > 100:
                raise ValueError("duration is too long")

            if action == "hold_key":
                if text is None:
                    raise ValueError("text is required for hold_key")

                hold_key = text
                if hold_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
                    hold_key = CUA_KEY_TO_PLAYWRIGHT_KEY[hold_key]

                self._page.keyboard.down(hold_key)
                time.sleep(duration)
                self._page.keyboard.up(hold_key)

            elif action == "wait":
                time.sleep(duration)

            return self.screenshot()

        if action in ("left_click", "right_click", "double_click", "triple_click", "middle_click"):
            if text is not None:
                raise ValueError(f"text is not accepted for {action}")

            if coordinate is not None:
                x, y = self.validate_and_get_coordinates(coordinate)
                self._page.mouse.move(x, y)
                self._last_mouse_position = (x, y)
                click_x, click_y = x, y
            elif self._last_mouse_position:
                click_x, click_y = self._last_mouse_position
            else:
                # No known position yet: fall back to the viewport center
                width, height = self.dimensions
                click_x, click_y = width // 2, height // 2

            if key:
                modifier_key = key
                if modifier_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
                    modifier_key = CUA_KEY_TO_PLAYWRIGHT_KEY[modifier_key]
                self._page.keyboard.down(modifier_key)

            if action == "left_click":
                self._page.mouse.click(click_x, click_y)
            elif action == "right_click":
                self._page.mouse.click(click_x, click_y, button="right")
            elif action == "double_click":
                self._page.mouse.dblclick(click_x, click_y)
            elif action == "triple_click":
                # click_count=3 makes the browser treat this as a single triple click
                self._page.mouse.click(click_x, click_y, click_count=3)
            elif action == "middle_click":
                self._page.mouse.click(click_x, click_y, button="middle")

            if key:
                self._page.keyboard.up(modifier_key)

            return self.screenshot()

        if action in ("mouse_move", "left_click_drag"):
            if coordinate is None:
                raise ValueError(f"coordinate is required for {action}")
            if text is not None:
                raise ValueError(f"text is not accepted for {action}")

            x, y = self.validate_and_get_coordinates(coordinate)

            if action == "mouse_move":
                self._page.mouse.move(x, y)
                self._last_mouse_position = (x, y)
            elif action == "left_click_drag":
                # Drags from the current mouse position to the target coordinate
                self._page.mouse.down()
                self._page.mouse.move(x, y)
                self._page.mouse.up()
                self._last_mouse_position = (x, y)

            return self.screenshot()

        if action in ("key", "type"):
            if text is None:
                raise ValueError(f"text is required for {action}")
            if coordinate is not None:
                raise ValueError(f"coordinate is not accepted for {action}")

            if action == "key":
                press_key = text

                if "+" in press_key:
                    key_parts = press_key.split("+")
                    modifier_keys = key_parts[:-1]
                    main_key = key_parts[-1]

                    playwright_modifiers = []
                    for mod in modifier_keys:
                        if mod.lower() in ("ctrl", "control"):
                            playwright_modifiers.append("Control")
                        elif mod.lower() in ("shift",):
                            playwright_modifiers.append("Shift")
                        elif mod.lower() in ("alt", "option"):
                            playwright_modifiers.append("Alt")
                        elif mod.lower() in ("cmd", "meta", "super"):
                            playwright_modifiers.append("Meta")
                        else:
                            playwright_modifiers.append(mod)

                    if main_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
                        main_key = CUA_KEY_TO_PLAYWRIGHT_KEY[main_key]

                    press_key = "+".join(playwright_modifiers + [main_key])
                else:
                    if press_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
                        press_key = CUA_KEY_TO_PLAYWRIGHT_KEY[press_key]

                self._page.keyboard.press(press_key)
            elif action == "type":
                for chunk in chunks(text, 50):
                    self._page.keyboard.type(chunk, delay=12)
                    time.sleep(0.01)

            return self.screenshot()

        if action in ("screenshot", "cursor_position"):
            if text is not None:
                raise ValueError(f"text is not accepted for {action}")
            if coordinate is not None:
                raise ValueError(f"coordinate is not accepted for {action}")

            return self.screenshot()

        raise ValueError(f"Invalid action: {action}")
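The key handling is the least obvious part of `execute_computer_action`, so it helps to try it in isolation. The sketch below pulls the key-combination translation and the coordinate clamping out into pure functions that run without a browser. `to_playwright_combo`, `clamp`, and the trimmed `KEY_MAP` are illustrative names for this sketch only; the behavior is intended to mirror the class above, not replace it.

```python
KEY_MAP = {"Return": "Enter", "esc": "Escape", "Page_Down": "PageDown"}
MODIFIERS = {
    "ctrl": "Control", "control": "Control",
    "shift": "Shift",
    "alt": "Alt", "option": "Alt",
    "cmd": "Meta", "meta": "Meta", "super": "Meta",
}


def to_playwright_combo(text: str) -> str:
    """Translate a key string such as 'ctrl+a' into Playwright's key naming."""
    *modifiers, main_key = text.split("+")
    # Unknown modifiers pass through unchanged, as in the original branch
    translated = [MODIFIERS.get(mod.lower(), mod) for mod in modifiers]
    return "+".join(translated + [KEY_MAP.get(main_key, main_key)])


def clamp(x: int, y: int, width: int = 1024, height: int = 768) -> tuple[int, int]:
    """Keep a point inside the viewport, mirroring clamp_coordinates."""
    return max(0, min(x, width - 1)), max(0, min(y, height - 1))


print(to_playwright_combo("ctrl+a"))
print(to_playwright_combo("Return"))
print(clamp(2000, -5))
```

Keeping these translations as pure functions makes them straightforward to unit test before wiring them to a live session.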

Step 4: Create the Agent Class

Python
claude_agent.py
import base64
import os
import re
from io import BytesIO
from typing import List

from anthropic import Anthropic
from anthropic.types.beta import BetaMessageParam
from PIL import Image

from utils import MODEL_CONFIGS, SYSTEM_PROMPT, check_blocklisted_url, pp


class ClaudeAgent:
    def __init__(self, computer = None, model: str = "claude-3-5-sonnet-20241022"):
        # Validate the model first so a bad name fails even without a browser
        if model not in MODEL_CONFIGS:
            raise ValueError(f"Unsupported model: {model}. Available models: {list(MODEL_CONFIGS.keys())}")

        self.client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
        self.computer = computer
        self.messages: List[BetaMessageParam] = []
        self.model = model
        self.model_config = MODEL_CONFIGS[model]

        if computer:
            width, height = computer.get_dimensions()
            self.viewport_width = width
            self.viewport_height = height

            # Tell the model the actual viewport size (the prompt already
            # carries the generic "respect the dimensions" bullet)
            self.system_prompt = SYSTEM_PROMPT.replace(
                '<COORDINATE_SYSTEM>',
                f'<COORDINATE_SYSTEM>\n* The browser viewport dimensions are {width}x{height} pixels'
            )

            self.tools = [{
                "type": self.model_config["tool_type"],
                "name": "computer",
                "display_width_px": width,
                "display_height_px": height,
                "display_number": 1,
            }]
        else:
            self.viewport_width = 1024
            self.viewport_height = 768
            self.system_prompt = SYSTEM_PROMPT
            self.tools = []

    def get_viewport_info(self) -> dict:
        if not self.computer or not self.computer._page:
            return {}

        try:
            return self.computer._page.evaluate("""
                () => ({
                    innerWidth: window.innerWidth,
                    innerHeight: window.innerHeight,
                    devicePixelRatio: window.devicePixelRatio,
                    screenWidth: window.screen.width,
                    screenHeight: window.screen.height,
                    scrollX: window.scrollX,
                    scrollY: window.scrollY
                })
            """)
        except Exception:
            return {}

    def validate_screenshot_dimensions(self, screenshot_base64: str) -> dict:
        try:
            image_data = base64.b64decode(screenshot_base64)
            image = Image.open(BytesIO(image_data))
            screenshot_width, screenshot_height = image.size

            viewport_info = self.get_viewport_info()

            scaling_info = {
                "screenshot_size": (screenshot_width, screenshot_height),
                "viewport_size": (self.viewport_width, self.viewport_height),
                "actual_viewport": (viewport_info.get('innerWidth', 0), viewport_info.get('innerHeight', 0)),
                "device_pixel_ratio": viewport_info.get('devicePixelRatio', 1.0),
                "width_scale": screenshot_width / self.viewport_width if self.viewport_width > 0 else 1.0,
                "height_scale": screenshot_height / self.viewport_height if self.viewport_height > 0 else 1.0
            }

            if scaling_info["width_scale"] != 1.0 or scaling_info["height_scale"] != 1.0:
                print("⚠️ Screenshot scaling detected:")
                print(f"   Screenshot: {screenshot_width}x{screenshot_height}")
                print(f"   Expected viewport: {self.viewport_width}x{self.viewport_height}")
                print(f"   Actual viewport: {viewport_info.get('innerWidth', 'unknown')}x{viewport_info.get('innerHeight', 'unknown')}")
                print(f"   Scale factors: {scaling_info['width_scale']:.3f}x{scaling_info['height_scale']:.3f}")

            return scaling_info
        except Exception as e:
            print(f"⚠️ Error validating screenshot dimensions: {e}")
            return {}

    def execute_task(
        self,
        task: str,
        print_steps: bool = True,
        debug: bool = False,
        max_iterations: int = 50
    ) -> str:

        input_items = [
            {
                "role": "user",
                "content": task,
            },
        ]

        new_items = []
        iterations = 0
        consecutive_no_actions = 0
        last_assistant_messages = []

        print(f"🎯 Executing task: {task}")
        print("=" * 60)

        def is_task_complete(content: str) -> dict:
            # Explicit markers requested by the system prompt take priority
            if "TASK_COMPLETED:" in content:
                return {"completed": True, "reason": "explicit_completion"}
            if "TASK_FAILED:" in content or "TASK_ABANDONED:" in content:
                return {"completed": True, "reason": "explicit_failure"}

            # Fallback heuristics for messages that omit the markers
            completion_patterns = [
                r'task\s+(completed|finished|done|accomplished)',
                r'successfully\s+(completed|finished|found|gathered)',
                r'here\s+(is|are)\s+the\s+(results?|information|summary)',
                r'to\s+summarize',
                r'in\s+conclusion',
                r'final\s+(answer|result|summary)'
            ]

            failure_patterns = [
                r'cannot\s+(complete|proceed|access|continue)',
                r'unable\s+to\s+(complete|access|find|proceed)',
                r'blocked\s+by\s+(captcha|security|authentication)',
                r'giving\s+up',
                r'no\s+longer\s+able',
                r'have\s+tried\s+multiple\s+approaches'
            ]

            for pattern in completion_patterns:
                if re.search(pattern, content, re.IGNORECASE):
                    return {"completed": True, "reason": "natural_completion"}

            for pattern in failure_patterns:
                if re.search(pattern, content, re.IGNORECASE):
                    return {"completed": True, "reason": "natural_failure"}

            return {"completed": False}

        def detect_repetition(new_message: str) -> bool:
            if len(last_assistant_messages) < 2:
                return False

            def similarity(str1: str, str2: str) -> float:
                words1 = str1.lower().split()
                words2 = str2.lower().split()
                longest = max(len(words1), len(words2))
                if longest == 0:
                    # Two empty messages: avoid dividing by zero
                    return 0.0
                common_words = [word for word in words1 if word in words2]
                return len(common_words) / longest

            return any(similarity(new_message, prev_message) > 0.8
                       for prev_message in last_assistant_messages)

        while iterations < max_iterations:
            iterations += 1
            has_actions = False

            if new_items and new_items[-1].get("role") == "assistant":
                last_message = new_items[-1]
                if last_message.get("content") and len(last_message["content"]) > 0:
                    content = last_message["content"][0].get("text", "")

                    completion = is_task_complete(content)
                    if completion["completed"]:
                        print(f"✅ Task completed ({completion['reason']})")
                        break

                    if detect_repetition(content):
                        print("🔄 Repetition detected - stopping execution")
                        last_assistant_messages.append(content)
                        break

                    last_assistant_messages.append(content)
                    if len(last_assistant_messages) > 3:
                        last_assistant_messages.pop(0)

            if debug:
                pp(input_items + new_items)

            try:
                response = self.client.beta.messages.create(
                    model=self.model,
                    max_tokens=4096,
                    system=self.system_prompt,
                    messages=input_items + new_items,
                    tools=self.tools,
                    betas=[self.model_config["beta_flag"]]
                )

                if debug:
                    # The response is a model object, so convert it before dumping
                    pp(response.model_dump())

                for block in response.content:
                    if block.type == "text":
                        if print_steps:
                            print(block.text)
                        new_items.append({
                            "role": "assistant",
                            "content": [
                                {
                                    "type": "text",
                                    "text": block.text
                                }
                            ]
                        })
                    elif block.type == "tool_use":
                        has_actions = True
                        if block.name == "computer":
                            tool_input = block.input
                            action = tool_input.get("action")

                            if print_steps:
                                print(f"🔧 {action}({tool_input})")

                            screenshot_base64 = self.computer.execute_computer_action(
                                action=action,
                                text=tool_input.get("text"),
                                coordinate=tool_input.get("coordinate"),
                                scroll_direction=tool_input.get("scroll_direction"),
                                scroll_amount=tool_input.get("scroll_amount"),
                                duration=tool_input.get("duration"),
                                key=tool_input.get("key")
                            )

                            if action == "screenshot":
                                self.validate_screenshot_dimensions(screenshot_base64)

                            new_items.append({
                                "role": "assistant",
                                "content": [
                                    {
                                        "type": "tool_use",
                                        "id": block.id,
                                        "name": block.name,
                                        "input": tool_input
                                    }
                                ]
                            })

                            # Stop immediately if the action landed on a blocked site
                            current_url = self.computer.get_current_url()
                            check_blocklisted_url(current_url)

                            new_items.append({
                                "role": "user",
                                "content": [
                                    {
                                        "type": "tool_result",
                                        "tool_use_id": block.id,
                                        "content": [
                                            {
                                                "type": "image",
                                                "source": {
                                                    "type": "base64",
                                                    "media_type": "image/png",
                                                    "data": screenshot_base64
                                                }
                                            }
                                        ]
                                    }
                                ]
                            })

                if not has_actions:
                    consecutive_no_actions += 1
                    if consecutive_no_actions >= 3:
                        print("⚠️ No actions for 3 consecutive iterations - stopping")
                        break
                else:
                    consecutive_no_actions = 0

            except Exception as error:
                print(f"❌ Error during task execution: {error}")
                raise

        if iterations >= max_iterations:
            print(f"⚠️ Task execution stopped after {max_iterations} iterations")

        # Return the text of the last assistant message, if there is one
        assistant_messages = [item for item in new_items if item.get("role") == "assistant"]
        if assistant_messages:
            final_message = assistant_messages[-1]
            content = final_message.get("content")
            if isinstance(content, list) and len(content) > 0:
                for block in content:
                    if isinstance(block, dict) and block.get("type") == "text":
                        return block.get("text", "Task execution completed (no final message)")

        return "Task execution completed (no final message)"
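The stop conditions inside `execute_task` are easy to check in isolation. This sketch pulls out the explicit-marker check and the word-overlap similarity as plain functions. `completion_status` is an illustrative name, and the empty-input guard in `similarity` is an assumption added so the function is safe to call on blank messages.

```python
from typing import Optional


def completion_status(content: str) -> Optional[str]:
    """Return a reason string when the message carries an explicit end marker."""
    if "TASK_COMPLETED:" in content:
        return "explicit_completion"
    if "TASK_FAILED:" in content or "TASK_ABANDONED:" in content:
        return "explicit_failure"
    return None


def similarity(first: str, second: str) -> float:
    """Count words of `first` that also occur in `second`, scaled by the longer message."""
    words1 = first.lower().split()
    words2 = second.lower().split()
    longest = max(len(words1), len(words2))
    if longest == 0:  # guard: two empty messages count as not similar
        return 0.0
    common = [word for word in words1 if word in words2]
    return len(common) / longest


print(completion_status("TASK_COMPLETED: opened the article"))
print(similarity("click the search box", "click the search button"))
```

With the agent's 0.8 threshold, the two example messages (three shared words out of four) would not count as a repetition, while an identical message would.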

Step 5: Create the Main Script

Python
main.py
import sys
import time

from utils import STEEL_API_KEY, ANTHROPIC_API_KEY, TASK
from steel_browser import SteelBrowser
from claude_agent import ClaudeAgent


def main():
    print("🚀 Steel + Claude Computer Use Assistant")
    print("=" * 60)

    if STEEL_API_KEY == "your-steel-api-key-here":
        print("⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key")
        print("   Get your API key at: https://app.steel.dev/settings/api-keys")
        return

    if ANTHROPIC_API_KEY == "your-anthropic-api-key-here":
        print("⚠️ WARNING: Please replace 'your-anthropic-api-key-here' with your actual Anthropic API key")
        print("   Get your API key at: https://console.anthropic.com/")
        return

    print("\nStarting Steel browser session...")

    try:
        with SteelBrowser() as computer:
            print("✅ Steel browser session started!")

            agent = ClaudeAgent(
                computer=computer,
                model="claude-3-5-sonnet-20241022",
            )

            start_time = time.time()

            try:
                result = agent.execute_task(
                    TASK,
                    print_steps=True,
                    debug=False,
                    max_iterations=50,
                )

                duration = f"{(time.time() - start_time):.1f}"

                print("\n" + "=" * 60)
                print("🎉 TASK EXECUTION COMPLETED")
                print("=" * 60)
                print(f"⏱️ Duration: {duration} seconds")
                print(f"🎯 Task: {TASK}")
                print(f"📋 Result:\n{result}")
                print("=" * 60)

            except Exception as error:
                print(f"❌ Task execution failed: {error}")
                sys.exit(1)

    except Exception as e:
        print(f"❌ Failed to start Steel browser: {e}")
        print("Please check your STEEL_API_KEY and internet connection.")
        sys.exit(1)


if __name__ == "__main__":
    main()

Running Your Agent

Execute your script:

Terminal
python main.py

You'll see the session URL printed in the console. Open this URL to view the live browser session. The agent will execute the task defined in the TASK environment variable or the default task.

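You can also change the task by setting the TASK environment variable. Because utils.py calls load_dotenv(override=True), a TASK entry in .env takes precedence over an exported value, so remove or comment out that entry first: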

Terminal
export TASK="Search for the latest developments in artificial intelligence"
python main.py

Customizing your agent's task

Try modifying the task to make your agent perform different actions. Keep only one TASK line in .env at a time:

ENV
.env
# Research specific topics
TASK="Go to https://arxiv.org, search for 'computer vision', and summarize the latest papers."

# E-commerce tasks
TASK="Go to https://www.amazon.com, search for 'mechanical keyboards', and compare the top 3 results."

# Information gathering
TASK="Go to https://docs.anthropic.com, find information about Claude's capabilities, and provide a summary."

Supported Models: This example uses Claude 3.5 Sonnet, but you can use any model listed in MODEL_CONFIGS, including Claude 3.7 Sonnet, Claude Sonnet 4, or Claude Opus 4. To switch models, change the model argument passed to the ClaudeAgent constructor in main.py; the matching tool version and beta flag are looked up automatically.
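Switching models also changes the tool version and beta flag sent with each request. Here is a minimal sketch of that lookup, using two entries copied from `MODEL_CONFIGS`. The helper name `build_request_options` is an illustrative assumption, and the returned dict only mirrors the model-specific arguments that the agent passes to `client.beta.messages.create`.

```python
MODEL_CONFIGS = {
    "claude-3-5-sonnet-20241022": {
        "tool_type": "computer_20241022",
        "beta_flag": "computer-use-2024-10-22",
    },
    "claude-sonnet-4-20250514": {
        "tool_type": "computer_20250124",
        "beta_flag": "computer-use-2025-01-24",
    },
}


def build_request_options(model: str, width: int = 1024, height: int = 768) -> dict:
    """Assemble the model-specific pieces of a computer-use request."""
    if model not in MODEL_CONFIGS:
        raise ValueError(f"Unsupported model: {model}")
    config = MODEL_CONFIGS[model]
    return {
        "model": model,
        "betas": [config["beta_flag"]],
        "tools": [{
            "type": config["tool_type"],
            "name": "computer",
            "display_width_px": width,
            "display_height_px": height,
            "display_number": 1,
        }],
    }


options = build_request_options("claude-sonnet-4-20250514")
print(options["betas"], options["tools"][0]["type"])
```

Keeping the tool version and beta flag in one table means a model switch is a one-line change, and an unsupported model name fails before any API call is made.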

Next Steps

  • Explore the Steel API documentation for more advanced features

  • Check out the Anthropic documentation for more information about Claude's computer use capabilities

  • Add features such as session recording or multi-session management