Quickstart (Python)

This guide shows you how to use Claude models with computer use capabilities and Steel browsers to create AI agents that navigate the web.

We'll build a Claude Computer Use loop that enables autonomous web task execution through iterative screenshot analysis and action planning.

Prerequisites

Python 3.11+
A Steel API key (sign up here)
An Anthropic API key with access to Claude models

Step 1: Setup and Dependencies

First, create a project directory, set up a virtual environment, and install the required packages:

Terminal

# Create a project directory
mkdir steel-claude-computer-use
cd steel-claude-computer-use

# Recommended: Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

# Install required packages
pip install steel-sdk anthropic playwright python-dotenv pillow

Create a .env file with your API keys:

ENV

.env

1STEEL_API_KEY=your_steel_api_key_here
2ANTHROPIC_API_KEY=your_anthropic_api_key_here
3TASK=Go to Wikipedia and search for machine learning

Step 2: Create Helper Functions

Python

utils.py

1import os
2import time
3import base64
4import json
5import re
6from typing import List, Dict
7from urllib.parse import urlparse
8
9from dotenv import load_dotenv
10from PIL import Image
11from io import BytesIO
12from playwright.sync_api import sync_playwright, Error as PlaywrightError
13from steel import Steel
14from anthropic import Anthropic
15from anthropic.types.beta import BetaMessageParam
16
17
18load_dotenv(override=True)
19
20# Replace with your own API keys
21STEEL_API_KEY = os.getenv("STEEL_API_KEY") or "your-steel-api-key-here"
22ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY") or "your-anthropic-api-key-here"
23
24# Replace with your own task
25TASK = os.getenv("TASK") or "Go to Wikipedia and search for machine learning"
26
27SYSTEM_PROMPT = """You are an expert browser automation assistant operating in an iterative execution loop. Your goal is to efficiently complete tasks using a Chrome browser with full internet access.
28
29<CAPABILITIES>
30* You control a Chrome browser tab and can navigate to any website
31* You can click, type, scroll, take screenshots, and interact with web elements
32* You have full internet access and can visit any public website
33* You can read content, fill forms, search for information, and perform complex multi-step tasks
34* After each action, you receive a screenshot showing the current state
35
36<COORDINATE_SYSTEM>
37* The browser viewport has specific dimensions that you must respect
38* All coordinates (x, y) must be within the viewport bounds
39* X coordinates must be between 0 and the display width (inclusive)
40* Y coordinates must be between 0 and the display height (inclusive)
41* Always ensure your click, move, scroll, and drag coordinates are within these bounds
42* If you're unsure about element locations, take a screenshot first to see the current state
43
44<AUTONOMOUS_EXECUTION>
45* Work completely independently - make decisions and act immediately without asking questions
46* Never request clarification, present options, or ask for permission
47* Make intelligent assumptions based on task context
48* If something is ambiguous, choose the most logical interpretation and proceed
49* Take immediate action rather than explaining what you might do
50* When the task objective is achieved, immediately declare "TASK_COMPLETED:" - do not provide commentary or ask questions
51
52<REASONING_STRUCTURE>
53For each step, you must reason systematically:
54* Analyze your previous action's success/failure and current state
55* Identify what specific progress has been made toward the goal
56* Determine the next immediate objective and how to achieve it
57* Choose the most efficient action sequence to make progress
58
59<EFFICIENCY_PRINCIPLES>
60* Combine related actions when possible rather than single-step execution
61* Navigate directly to relevant websites without unnecessary exploration
62* Use screenshots strategically to understand page state before acting
63* Be persistent with alternative approaches if initial attempts fail
64* Focus on the specific information or outcome requested
65
66<COMPLETION_CRITERIA>
67* MANDATORY: When you complete the task, your final message MUST start with "TASK_COMPLETED: [brief summary]"
68* MANDATORY: If technical issues prevent completion, your final message MUST start with "TASK_FAILED: [reason]"
69* MANDATORY: If you abandon the task, your final message MUST start with "TASK_ABANDONED: [explanation]"
70* Do not write anything after completing the task except the required completion message
71* Do not ask questions, provide commentary, or offer additional help after task completion
72* The completion message is the end of the interaction - nothing else should follow
73
74<CRITICAL_REQUIREMENTS>
75* This is fully automated execution - work completely independently
76* Start by taking a screenshot to understand the current state
77* Never click on browser UI elements
78* Always respect coordinate boundaries - invalid coordinates will fail
79* Recognize when the stated objective has been achieved and declare completion immediately
80* Focus on the explicit task given, not implied or potential follow-up tasks
81
82Remember: Be thorough but focused. Complete the specific task requested efficiently and provide clear results."""
83
84BLOCKED_DOMAINS = [
85    "maliciousbook.com",
86    "evilvideos.com",
87    "darkwebforum.com",
88    "shadytok.com",
89    "suspiciouspins.com",
90    "ilanbigio.com",
91]
92
93MODEL_CONFIGS = {
94    "claude-3-5-sonnet-20241022": {
95        "tool_type": "computer_20241022",
96        "beta_flag": "computer-use-2024-10-22",
97        "description": "Stable Claude 3.5 Sonnet (recommended)"
98    },
99    "claude-3-7-sonnet-20250219": {
100        "tool_type": "computer_20250124",
101        "beta_flag": "computer-use-2025-01-24",
102        "description": "Claude 3.7 Sonnet (newer)"
103    },
104    "claude-sonnet-4-20250514": {
105        "tool_type": "computer_20250124",
106        "beta_flag": "computer-use-2025-01-24",
107        "description": "Claude 4 Sonnet (newest)"
108    },
109    "claude-opus-4-20250514": {
110        "tool_type": "computer_20250124",
111        "beta_flag": "computer-use-2025-01-24",
112        "description": "Claude 4 Opus (newest)"
113    }
114}
115
116CUA_KEY_TO_PLAYWRIGHT_KEY = {
117    "/": "Divide",
118    "\\": "Backslash",
119    "alt": "Alt",
120    "arrowdown": "ArrowDown",
121    "arrowleft": "ArrowLeft",
122    "arrowright": "ArrowRight",
123    "arrowup": "ArrowUp",
124    "backspace": "Backspace",
125    "capslock": "CapsLock",
126    "cmd": "Meta",
127    "ctrl": "Control",
128    "delete": "Delete",
129    "end": "End",
130    "enter": "Enter",
131    "esc": "Escape",
132    "home": "Home",
133    "insert": "Insert",
134    "option": "Alt",
135    "pagedown": "PageDown",
136    "pageup": "PageUp",
137    "shift": "Shift",
138    "space": " ",
139    "super": "Meta",
140    "tab": "Tab",
141    "win": "Meta",
142    "Return": "Enter",
143    "KP_Enter": "Enter",
144    "Escape": "Escape",
145    "BackSpace": "Backspace",
146    "Delete": "Delete",
147    "Tab": "Tab",
148    "ISO_Left_Tab": "Shift+Tab",
149    "Up": "ArrowUp",
150    "Down": "ArrowDown",
151    "Left": "ArrowLeft",
152    "Right": "ArrowRight",
153    "Page_Up": "PageUp",
154    "Page_Down": "PageDown",
155    "Home": "Home",
156    "End": "End",
157    "Insert": "Insert",
158    "F1": "F1", "F2": "F2", "F3": "F3", "F4": "F4",
159    "F5": "F5", "F6": "F6", "F7": "F7", "F8": "F8",
160    "F9": "F9", "F10": "F10", "F11": "F11", "F12": "F12",
161    "Shift_L": "Shift", "Shift_R": "Shift",
162    "Control_L": "Control", "Control_R": "Control",
163    "Alt_L": "Alt", "Alt_R": "Alt",
164    "Meta_L": "Meta", "Meta_R": "Meta",
165    "Super_L": "Meta", "Super_R": "Meta",
166    "minus": "-",
167    "equal": "=",
168    "bracketleft": "[",
169    "bracketright": "]",
170    "semicolon": ";",
171    "apostrophe": "'",
172    "grave": "`",
173    "comma": ",",
174    "period": ".",
175    "slash": "/",
176}
177
178
179def chunks(s: str, chunk_size: int) -> List[str]:
180    return [s[i : i + chunk_size] for i in range(0, len(s), chunk_size)]
181
182
183def pp(obj):
184    print(json.dumps(obj, indent=2))
185
186
187def show_image(base_64_image):
188    image_data = base64.b64decode(base_64_image)
189    image = Image.open(BytesIO(image_data))
190    image.show()
191
192
193def check_blocklisted_url(url: str) -> None:
194    hostname = urlparse(url).hostname or ""
195    if any(
196        hostname == blocked or hostname.endswith(f".{blocked}")
197        for blocked in BLOCKED_DOMAINS
198    ):
199        raise ValueError(f"Blocked URL: {url}")

Step 3: Create Steel Browser Integration

Python

steel_browser.py

1class SteelBrowser:
2    def __init__(
3        self,
4        width: int = 1024,
5        height: int = 768,
6        proxy: bool = False,
7        solve_captcha: bool = False,
8        virtual_mouse: bool = True,
9        session_timeout: int = 900000,
10        ad_blocker: bool = True,
11        start_url: str = "https://www.google.com",
12    ):
13        self.client = Steel(
14            steel_api_key=os.getenv("STEEL_API_KEY"),
15        )
16        self.dimensions = (width, height)
17        self.proxy = proxy
18        self.solve_captcha = solve_captcha
19        self.virtual_mouse = virtual_mouse
20        self.session_timeout = session_timeout
21        self.ad_blocker = ad_blocker
22        self.start_url = start_url
23        self.session = None
24        self._playwright = None
25        self._browser = None
26        self._page = None
27        self._last_mouse_position = None
28
29    def get_dimensions(self):
30        return self.dimensions
31
32    def get_current_url(self) -> str:
33        return self._page.url if self._page else ""
34
35    def __enter__(self):
36        width, height = self.dimensions
37        session_params = {
38            "use_proxy": self.proxy,
39            "solve_captcha": self.solve_captcha,
40            "api_timeout": self.session_timeout,
41            "block_ads": self.ad_blocker,
42            "dimensions": {"width": width, "height": height}
43        }
44        self.session = self.client.sessions.create(**session_params)
45
46        print("Steel Session created successfully!")
47        print(f"View live session at: {self.session.session_viewer_url}")
48
49        self._playwright = sync_playwright().start()
50        browser = self._playwright.chromium.connect_over_cdp(
51            f"{self.session.websocket_url}&apiKey={os.getenv('STEEL_API_KEY')}",
52            timeout=60000
53        )
54        self._browser = browser
55        context = browser.contexts[0]
56
57        def handle_route(route, request):
58            url = request.url
59            try:
60                check_blocklisted_url(url)
61                route.continue_()
62            except ValueError:
63                print(f"Blocking URL: {url}")
64                route.abort()
65
66        if self.virtual_mouse:
67            context.add_init_script("""
68                if (window.self === window.top) {
69                    function initCursor() {
70                        const CURSOR_ID = '__cursor__';
71                        if (document.getElementById(CURSOR_ID)) return;
72
73                        const cursor = document.createElement('div');
74                        cursor.id = CURSOR_ID;
75                        Object.assign(cursor.style, {
76                            position: 'fixed',
77                            top: '0px',
78                            left: '0px',
79                            width: '20px',
80                            height: '20px',
81                            backgroundImage: 'url("data:image/svg+xml;utf8,<svg width=\\'16\\' height=\\'16\\' viewBox=\\'0 0 20 20\\' fill=\\'black\\' outline=\\'white\\' xmlns=\\'http://www.w3.org/2000/svg\\'><path d=\\'M15.8089 7.22221C15.9333 7.00888 15.9911 6.78221 15.9822 6.54221C15.9733 6.29333 15.8978 6.06667 15.7555 5.86221C15.6133 5.66667 15.4311 5.52445 15.2089 5.43555L1.70222 0.0888888C1.47111 0 1.23555 -0.0222222 0.995555 0.0222222C0.746667 0.0755555 0.537779 0.186667 0.368888 0.355555C0.191111 0.533333 0.0755555 0.746667 0.0222222 0.995555C-0.0222222 1.23555 0 1.47111 0.0888888 1.70222L5.43555 15.2222C5.52445 15.4445 5.66667 15.6267 5.86221 15.7689C6.06667 15.9111 6.28888 15.9867 6.52888 15.9955H6.58221C6.82221 15.9955 7.04445 15.9333 7.24888 15.8089C7.44445 15.6845 7.59555 15.52 7.70221 15.3155L10.2089 10.2222L15.3022 7.70221C15.5155 7.59555 15.6845 7.43555 15.8089 7.22221Z\\' ></path></svg>")',
82                            backgroundSize: 'cover',
83                            pointerEvents: 'none',
84                            zIndex: '99999',
85                            transform: 'translate(-2px, -2px)',
86                        });
87
88                        document.body.appendChild(cursor);
89
90                        document.addEventListener("mousemove", (e) => {
91                            cursor.style.top = e.clientY + "px";
92                            cursor.style.left = e.clientX + "px";
93                        });
94                    }
95
96                    requestAnimationFrame(function checkBody() {
97                        if (document.body) {
98                            initCursor();
99                        } else {
100                            requestAnimationFrame(checkBody);
101                        }
102                    });
103                }
104            """)
105
106        self._page = context.pages[0]
107        self._page.route("**/*", handle_route)
108
109        self._page.set_viewport_size({"width": width, "height": height})
110
111        self._page.goto(self.start_url)
112
113        return self
114
115    def __exit__(self, exc_type, exc_val, exc_tb):
116        if self._page:
117            self._page.close()
118        if self._browser:
119            self._browser.close()
120        if self._playwright:
121            self._playwright.stop()
122
123        if self.session:
124            print("Releasing Steel session...")
125            self.client.sessions.release(self.session.id)
126            print(f"Session completed. View replay at {self.session.session_viewer_url}")
127
128    def screenshot(self) -> str:
129        try:
130            width, height = self.dimensions
131            png_bytes = self._page.screenshot(
132                full_page=False,
133                clip={"x": 0, "y": 0, "width": width, "height": height}
134            )
135            return base64.b64encode(png_bytes).decode("utf-8")
136        except PlaywrightError as error:
137            print(f"Screenshot failed, trying CDP fallback: {error}")
138            try:
139                cdp_session = self._page.context.new_cdp_session(self._page)
140                result = cdp_session.send(
141                    "Page.captureScreenshot", {"format": "png", "fromSurface": False}
142                )
143                return result["data"]
144            except PlaywrightError as cdp_error:
145                print(f"CDP screenshot also failed: {cdp_error}")
146                raise error
147
148    def validate_and_get_coordinates(self, coordinate):
149        if not isinstance(coordinate, (list, tuple)) or len(coordinate) != 2:
150            raise ValueError(f"{coordinate} must be a tuple or list of length 2")
151        if not all(isinstance(i, int) and i >= 0 for i in coordinate):
152            raise ValueError(f"{coordinate} must be a tuple/list of non-negative ints")
153
154        x, y = self.clamp_coordinates(coordinate[0], coordinate[1])
155        return x, y
156
157    def clamp_coordinates(self, x: int, y: int):
158        width, height = self.dimensions
159        clamped_x = max(0, min(x, width - 1))
160        clamped_y = max(0, min(y, height - 1))
161
162        if x != clamped_x or y != clamped_y:
163            print(f"⚠️  Coordinate clamped: ({x}, {y}) → ({clamped_x}, {clamped_y})")
164
165        return clamped_x, clamped_y
166
167    def execute_computer_action(
168        self,
169        action: str,
170        text: str = None,
171        coordinate = None,
172        scroll_direction: str = None,
173        scroll_amount: int = None,
174        duration = None,
175        key: str = None,
176        **kwargs
177    ) -> str:
178
179        if action in ("left_mouse_down", "left_mouse_up"):
180            if coordinate is not None:
181                raise ValueError(f"coordinate is not accepted for {action}")
182
183            if action == "left_mouse_down":
184                self._page.mouse.down()
185            elif action == "left_mouse_up":
186                self._page.mouse.up()
187
188            return self.screenshot()
189
190        if action == "scroll":
191            if scroll_direction is None or scroll_direction not in ("up", "down", "left", "right"):
192                raise ValueError("scroll_direction must be 'up', 'down', 'left', or 'right'")
193            if scroll_amount is None or not isinstance(scroll_amount, int) or scroll_amount < 0:
194                raise ValueError("scroll_amount must be a non-negative int")
195
196            if coordinate is not None:
197                x, y = self.validate_and_get_coordinates(coordinate)
198                self._page.mouse.move(x, y)
199                self._last_mouse_position = (x, y)
200
201            if text:
202                modifier_key = text
203                if modifier_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
204                    modifier_key = CUA_KEY_TO_PLAYWRIGHT_KEY[modifier_key]
205                self._page.keyboard.down(modifier_key)
206
207            scroll_mapping = {
208                "down": (0, 100 * scroll_amount),
209                "up": (0, -100 * scroll_amount),
210                "right": (100 * scroll_amount, 0),
211                "left": (-100 * scroll_amount, 0)
212            }
213            delta_x, delta_y = scroll_mapping[scroll_direction]
214            self._page.mouse.wheel(delta_x, delta_y)
215
216            if text:
217                self._page.keyboard.up(modifier_key)
218
219            return self.screenshot()
220
221        if action in ("hold_key", "wait"):
222            if duration is None or not isinstance(duration, (int, float)):
223                raise ValueError("duration must be a number")
224            if duration < 0:
225                raise ValueError("duration must be non-negative")
226            if duration > 100:
227                raise ValueError("duration is too long")
228
229            if action == "hold_key":
230                if text is None:
231                    raise ValueError("text is required for hold_key")
232
233                hold_key = text
234                if hold_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
235                    hold_key = CUA_KEY_TO_PLAYWRIGHT_KEY[hold_key]
236
237                self._page.keyboard.down(hold_key)
238                time.sleep(duration)
239                self._page.keyboard.up(hold_key)
240
241            elif action == "wait":
242                time.sleep(duration)
243
244            return self.screenshot()
245
246        if action in ("left_click", "right_click", "double_click", "triple_click", "middle_click"):
247            if text is not None:
248                raise ValueError(f"text is not accepted for {action}")
249
250            if coordinate is not None:
251                x, y = self.validate_and_get_coordinates(coordinate)
252                self._page.mouse.move(x, y)
253                self._last_mouse_position = (x, y)
254                click_x, click_y = x, y
255            elif self._last_mouse_position:
256                click_x, click_y = self._last_mouse_position
257            else:
258                width, height = self.dimensions
259                click_x, click_y = width // 2, height // 2
260
261            if key:
262                modifier_key = key
263                if modifier_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
264                    modifier_key = CUA_KEY_TO_PLAYWRIGHT_KEY[modifier_key]
265                self._page.keyboard.down(modifier_key)
266
267            if action == "left_click":
268                self._page.mouse.click(click_x, click_y)
269            elif action == "right_click":
270                self._page.mouse.click(click_x, click_y, button="right")
271            elif action == "double_click":
272                self._page.mouse.dblclick(click_x, click_y)
273            elif action == "triple_click":
274                for _ in range(3):
275                    self._page.mouse.click(click_x, click_y)
276            elif action == "middle_click":
277                self._page.mouse.click(click_x, click_y, button="middle")
278
279            if key:
280                self._page.keyboard.up(modifier_key)
281
282            return self.screenshot()
283
284        if action in ("mouse_move", "left_click_drag"):
285            if coordinate is None:
286                raise ValueError(f"coordinate is required for {action}")
287            if text is not None:
288                raise ValueError(f"text is not accepted for {action}")
289
290            x, y = self.validate_and_get_coordinates(coordinate)
291
292            if action == "mouse_move":
293                self._page.mouse.move(x, y)
294                self._last_mouse_position = (x, y)
295            elif action == "left_click_drag":
296                self._page.mouse.down()
297                self._page.mouse.move(x, y)
298                self._page.mouse.up()
299                self._last_mouse_position = (x, y)
300
301            return self.screenshot()
302
303        if action in ("key", "type"):
304            if text is None:
305                raise ValueError(f"text is required for {action}")
306            if coordinate is not None:
307                raise ValueError(f"coordinate is not accepted for {action}")
308
309            if action == "key":
310                press_key = text
311
312                if "+" in press_key:
313                    key_parts = press_key.split("+")
314                    modifier_keys = key_parts[:-1]
315                    main_key = key_parts[-1]
316
317                    playwright_modifiers = []
318                    for mod in modifier_keys:
319                        if mod.lower() in ("ctrl", "control"):
320                            playwright_modifiers.append("Control")
321                        elif mod.lower() in ("shift",):
322                            playwright_modifiers.append("Shift")
323                        elif mod.lower() in ("alt", "option"):
324                            playwright_modifiers.append("Alt")
325                        elif mod.lower() in ("cmd", "meta", "super"):
326                            playwright_modifiers.append("Meta")
327                        else:
328                            playwright_modifiers.append(mod)
329
330                    if main_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
331                        main_key = CUA_KEY_TO_PLAYWRIGHT_KEY[main_key]
332
333                    press_key = "+".join(playwright_modifiers + [main_key])
334                else:
335                    if press_key in CUA_KEY_TO_PLAYWRIGHT_KEY:
336                        press_key = CUA_KEY_TO_PLAYWRIGHT_KEY[press_key]
337
338                self._page.keyboard.press(press_key)
339            elif action == "type":
340                for chunk in chunks(text, 50):
341                    self._page.keyboard.type(chunk, delay=12)
342                    time.sleep(0.01)
343
344            return self.screenshot()
345
346        if action in ("screenshot", "cursor_position"):
347            if text is not None:
348                raise ValueError(f"text is not accepted for {action}")
349            if coordinate is not None:
350                raise ValueError(f"coordinate is not accepted for {action}")
351
352            return self.screenshot()
353
354        raise ValueError(f"Invalid action: {action}")

Step 4: Create the Agent Class

Python

claude_agent.py

1class ClaudeAgent:
2    def __init__(self, computer = None, model: str = "claude-3-5-sonnet-20241022"):
3        self.client = Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
4        self.computer = computer
5        self.messages: List[BetaMessageParam] = []
6        self.model = model
7
8        if computer:
9            width, height = computer.get_dimensions()
10            self.viewport_width = width
11            self.viewport_height = height
12
13            self.system_prompt = SYSTEM_PROMPT.replace(
14                '<COORDINATE_SYSTEM>',
15                f'<COORDINATE_SYSTEM>\n* The browser viewport dimensions are {width}x{height} pixels\n* The browser viewport has specific dimensions that you must respect'
16            )
17
18            if model not in MODEL_CONFIGS:
19                raise ValueError(f"Unsupported model: {model}. Available models: {list(MODEL_CONFIGS.keys())}")
20
21            self.model_config = MODEL_CONFIGS[model]
22
23            self.tools = [{
24                "type": self.model_config["tool_type"],
25                "name": "computer",
26                "display_width_px": width,
27                "display_height_px": height,
28                "display_number": 1,
29            }]
30        else:
31            self.viewport_width = 1024
32            self.viewport_height = 768
33            self.system_prompt = SYSTEM_PROMPT
34
35    def get_viewport_info(self) -> dict:
36        if not self.computer or not self.computer._page:
37            return {}
38
39        try:
40            return self.computer._page.evaluate("""
41                () => ({
42                    innerWidth: window.innerWidth,
43                    innerHeight: window.innerHeight,
44                    devicePixelRatio: window.devicePixelRatio,
45                    screenWidth: window.screen.width,
46                    screenHeight: window.screen.height,
47                    scrollX: window.scrollX,
48                    scrollY: window.scrollY
49                })
50            """)
51        except:
52            return {}
53
54    def validate_screenshot_dimensions(self, screenshot_base64: str) -> dict:
55        try:
56            image_data = base64.b64decode(screenshot_base64)
57            image = Image.open(BytesIO(image_data))
58            screenshot_width, screenshot_height = image.size
59
60            viewport_info = self.get_viewport_info()
61
62            scaling_info = {
63                "screenshot_size": (screenshot_width, screenshot_height),
64                "viewport_size": (self.viewport_width, self.viewport_height),
65                "actual_viewport": (viewport_info.get('innerWidth', 0), viewport_info.get('innerHeight', 0)),
66                "device_pixel_ratio": viewport_info.get('devicePixelRatio', 1.0),
67                "width_scale": screenshot_width / self.viewport_width if self.viewport_width > 0 else 1.0,
68                "height_scale": screenshot_height / self.viewport_height if self.viewport_height > 0 else 1.0
69            }
70
71            if scaling_info["width_scale"] != 1.0 or scaling_info["height_scale"] != 1.0:
72                print(f"⚠️  Screenshot scaling detected:")
73                print(f"   Screenshot: {screenshot_width}x{screenshot_height}")
74                print(f"   Expected viewport: {self.viewport_width}x{self.viewport_height}")
75                print(f"   Actual viewport: {viewport_info.get('innerWidth', 'unknown')}x{viewport_info.get('innerHeight', 'unknown')}")
76                print(f"   Scale factors: {scaling_info['width_scale']:.3f}x{scaling_info['height_scale']:.3f}")
77
78            return scaling_info
79        except Exception as e:
80            print(f"⚠️  Error validating screenshot dimensions: {e}")
81            return {}
82
83    def execute_task(
84        self,
85        task: str,
86        print_steps: bool = True,
87        debug: bool = False,
88        max_iterations: int = 50
89    ) -> str:
90
91        input_items = [
92            {
93                "role": "user",
94                "content": task,
95            },
96        ]
97
98        new_items = []
99        iterations = 0
100        consecutive_no_actions = 0
101        last_assistant_messages = []
102
103        print(f"🎯 Executing task: {task}")
104        print("=" * 60)
105
106        def is_task_complete(content: str) -> dict:
107            if "TASK_COMPLETED:" in content:
108                return {"completed": True, "reason": "explicit_completion"}
109            if "TASK_FAILED:" in content or "TASK_ABANDONED:" in content:
110                return {"completed": True, "reason": "explicit_failure"}
111
112            completion_patterns = [
113                r'task\s+(completed|finished|done|accomplished)',
114                r'successfully\s+(completed|finished|found|gathered)',
115                r'here\s+(is|are)\s+the\s+(results?|information|summary)',
116                r'to\s+summarize',
117                r'in\s+conclusion',
118                r'final\s+(answer|result|summary)'
119            ]
120
121            failure_patterns = [
122                r'cannot\s+(complete|proceed|access|continue)',
123                r'unable\s+to\s+(complete|access|find|proceed)',
124                r'blocked\s+by\s+(captcha|security|authentication)',
125                r'giving\s+up',
126                r'no\s+longer\s+able',
127                r'have\s+tried\s+multiple\s+approaches'
128            ]
129
130            for pattern in completion_patterns:
131                if re.search(pattern, content, re.IGNORECASE):
132                    return {"completed": True, "reason": "natural_completion"}
133
134            for pattern in failure_patterns:
135                if re.search(pattern, content, re.IGNORECASE):
136                    return {"completed": True, "reason": "natural_failure"}
137
138            return {"completed": False}
139
140        def detect_repetition(new_message: str) -> bool:
141            if len(last_assistant_messages) < 2:
142                return False
143
144            def similarity(str1: str, str2: str) -> float:
145                words1 = str1.lower().split()
146                words2 = str2.lower().split()
147                common_words = [word for word in words1 if word in words2]
148                return len(common_words) / max(len(words1), len(words2))
149
150            return any(similarity(new_message, prev_message) > 0.8
151                      for prev_message in last_assistant_messages)
152
153        while iterations < max_iterations:
154            iterations += 1
155            has_actions = False
156
157            if new_items and new_items[-1].get("role") == "assistant":
158                last_message = new_items[-1]
159                if last_message.get("content") and len(last_message["content"]) > 0:
160                    content = last_message["content"][0].get("text", "")
161
162                    completion = is_task_complete(content)
163                    if completion["completed"]:
164                        print(f"✅ Task completed ({completion['reason']})")
165                        break
166
167                    if detect_repetition(content):
168                        print("🔄 Repetition detected - stopping execution")
169                        last_assistant_messages.append(content)
170                        break
171
172                    last_assistant_messages.append(content)
173                    if len(last_assistant_messages) > 3:
174                        last_assistant_messages.pop(0)
175
176            if debug:
177                pp(input_items + new_items)
178
179            try:
180                response = self.client.beta.messages.create(
181                    model=self.model,
182                    max_tokens=4096,
183                    system=self.system_prompt,
184                    messages=input_items + new_items,
185                    tools=self.tools,
186                    betas=[self.model_config["beta_flag"]]
187                )
188
189                if debug:
190                    pp(response)
191
192                for block in response.content:
193                    if block.type == "text":
194                        print(block.text)
195                        new_items.append({
196                            "role": "assistant",
197                            "content": [
198                                {
199                                    "type": "text",
200                                    "text": block.text
201                                }
202                            ]
203                        })
204                    elif block.type == "tool_use":
205                        has_actions = True
206                        if block.name == "computer":
207                            tool_input = block.input
208                            action = tool_input.get("action")
209
210                            print(f"🔧 {action}({tool_input})")
211
212                            screenshot_base64 = self.computer.execute_computer_action(
213                                action=action,
214                                text=tool_input.get("text"),
215                                coordinate=tool_input.get("coordinate"),
216                                scroll_direction=tool_input.get("scroll_direction"),
217                                scroll_amount=tool_input.get("scroll_amount"),
218                                duration=tool_input.get("duration"),
219                                key=tool_input.get("key")
220                            )
221
222                            if action == "screenshot":
223                                self.validate_screenshot_dimensions(screenshot_base64)
224
225                            new_items.append({
226                                "role": "assistant",
227                                "content": [
228                                    {
229                                        "type": "tool_use",
230                                        "id": block.id,
231                                        "name": block.name,
232                                        "input": tool_input
233                                    }
234                                ]
235                            })
236
237                            current_url = self.computer.get_current_url()
238                            check_blocklisted_url(current_url)
239
240                            new_items.append({
241                                "role": "user",
242                                "content": [
243                                    {
244                                        "type": "tool_result",
245                                        "tool_use_id": block.id,
246                                        "content": [
247                                            {
248                                                "type": "image",
249                                                "source": {
250                                                    "type": "base64",
251                                                    "media_type": "image/png",
252                                                    "data": screenshot_base64
253                                                }
254                                            }
255                                        ]
256                                    }
257                                ]
258                            })
259
260                if not has_actions:
261                    consecutive_no_actions += 1
262                    if consecutive_no_actions >= 3:
263                        print("⚠️  No actions for 3 consecutive iterations - stopping")
264                        break
265                else:
266                    consecutive_no_actions = 0
267
268            except Exception as error:
269                print(f"❌ Error during task execution: {error}")
270                raise error
271
272        if iterations >= max_iterations:
273            print(f"⚠️  Task execution stopped after {max_iterations} iterations")
274
275        assistant_messages = [item for item in new_items if item.get("role") == "assistant"]
276        if assistant_messages:
277            final_message = assistant_messages[-1]
278            content = final_message.get("content")
279            if isinstance(content, list) and len(content) > 0:
280                for block in content:
281                    if isinstance(block, dict) and block.get("type") == "text":
282                        return block.get("text", "Task execution completed (no final message)")
283
284        return "Task execution completed (no final message)"

Step 5: Create the Main Script

Python

main.py

1def main():
2    print("🚀 Steel + Claude Computer Use Assistant")
3    print("=" * 60)
4
5    if STEEL_API_KEY == "your-steel-api-key-here":
6        print("⚠️  WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key")
7        print("   Get your API key at: https://app.steel.dev/settings/api-keys")
8        return
9
10    if ANTHROPIC_API_KEY == "your-anthropic-api-key-here":
11        print("⚠️  WARNING: Please replace 'your-anthropic-api-key-here' with your actual Anthropic API key")
12        print("   Get your API key at: https://console.anthropic.com/")
13        return
14
15    print("\nStarting Steel browser session...")
16
17    try:
18        with SteelBrowser() as computer:
19            print("✅ Steel browser session started!")
20
21            agent = ClaudeAgent(
22                computer=computer,
23                model="claude-3-5-sonnet-20241022",
24            )
25
26            start_time = time.time()
27
28            try:
29                result = agent.execute_task(
30                    TASK,
31                    print_steps=True,
32                    debug=False,
33                    max_iterations=50,
34                )
35
36                duration = f"{(time.time() - start_time):.1f}"
37
38                print("\n" + "=" * 60)
39                print("🎉 TASK EXECUTION COMPLETED")
40                print("=" * 60)
41                print(f"⏱️  Duration: {duration} seconds")
42                print(f"🎯 Task: {TASK}")
43                print(f"📋 Result:\n{result}")
44                print("=" * 60)
45
46            except Exception as error:
47                print(f"❌ Task execution failed: {error}")
48                exit(1)
49
50    except Exception as e:
51        print(f"❌ Failed to start Steel browser: {e}")
52        print("Please check your STEEL_API_KEY and internet connection.")
53        exit(1)
54
55
56if __name__ == "__main__":
57    main()

Running Your Agent

Execute your script:

You'll see the session URL printed in the console. Open this URL to view the live browser session. The agent will execute the task defined in the TASK environment variable or the default task.

You can modify the task by setting the environment variable:

Terminal

export TASK="Search for the latest developments in artificial intelligence"
python main.py

Customizing your agent's task

Try modifying the task to make your agent perform different actions:

ENV

.env

1# Research specific topics
2TASK = "Go to https://arxiv.org, search for 'computer vision', and summarize the latest papers."
3
4# E-commerce tasks
5TASK = "Go to https://www.amazon.com, search for 'mechanical keyboards', and compare the top 3 results."
6
7# Information gathering
8TASK = "Go to https://docs.anthropic.com, find information about Claude's capabilities, and provide a summary."

Supported Models: This example uses Claude 3.5 Sonnet, but you can use any of the supported Claude models including Claude 3.7 Sonnet, Claude 4 Sonnet, or Claude 4 Opus. Update the model parameter in the ClaudeAgent constructor to switch models.

Next Steps

Explore the Steel API documentation for more advanced features
Check out the Anthropic documentation for more information about Claude's computer use capabilities
Add additional features like session recording or multi-session management