Quickstart (Python)
How to use Claude Computer Use with Steel
This guide shows you how to use Claude models with computer use capabilities and Steel's Computer API to create AI agents that navigate the web.
We'll build a Claude Computer Use loop that enables autonomous web task execution through iterative screenshot analysis and action planning.
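The loop described above can be pictured with a small offline sketch. It assumes nothing about either SDK: `fake_model` and `fake_browser` are hypothetical stand-ins for the Claude API call and the Steel session, and only the control flow is meant to carry over to the real agent.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: a "model" maps the latest screenshot to the next
# action (or None when it is done), and a "browser" performs an action and
# returns a fresh screenshot.
Model = Callable[[str], Optional[str]]
Browser = Callable[[str], str]


def run_loop(model: Model, browser: Browser, first_screenshot: str,
             max_iterations: int = 50) -> list[str]:
    """Alternate between planning (model) and acting (browser)."""
    actions: list[str] = []
    screenshot = first_screenshot
    for _ in range(max_iterations):
        action = model(screenshot)
        if action is None:  # text-only answer: the task is finished
            break
        actions.append(action)
        screenshot = browser(action)  # every action yields a new screenshot
    return actions


def fake_model(screenshot: str) -> Optional[str]:
    # Pretend the model keeps clicking until it sees the results page.
    return None if screenshot == "results-page" else "left_click"


def fake_browser(action: str) -> str:
    # Pretend any action lands on the results page.
    return "results-page"


print(run_loop(fake_model, fake_browser, "blank-page"))
```

The iteration cap matters because nothing else guarantees the model ever stops requesting actions.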
Prerequisites
- Python 3.11+
- A Steel API key (create one at https://app.steel.dev/settings/api-keys)
- An Anthropic API key with access to Claude models
Step 1: Setup and Helper Functions
First, set up a virtual environment and install the required packages:
$ uv venv
$ source .venv/bin/activate
$ uv add steel-sdk anthropic python-dotenv
Create a .env file with your API keys:
STEEL_API_KEY=your_steel_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
TASK=Go to Steel.dev and find the latest news
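Before starting a session it can help to confirm that the placeholders were actually replaced. The sketch below is one possible check, not part of the guide's code: `missing_keys` is a hypothetical helper, and it assumes the variables are already in the process environment (for example after `load_dotenv()` has run).

```python
import os


def missing_keys(env: dict[str, str], required: tuple[str, ...]) -> list[str]:
    """Return the names in `required` that are unset or still placeholders."""
    return [
        name for name in required
        if not env.get(name) or env[name].startswith("your_")
    ]


REQUIRED = ("STEEL_API_KEY", "ANTHROPIC_API_KEY")

# Report anything that still needs attention in the current environment.
problems = missing_keys(dict(os.environ), REQUIRED)
if problems:
    print("Set these in .env before running:", ", ".join(problems))
```

The `your_` prefix test matches the placeholder values used in the .env example and would need adjusting for other conventions.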
Create helpers.py with the shared constants and helper functions:
import os
import json
from typing import List, Optional, Tuple
from datetime import datetime

from dotenv import load_dotenv
from steel import Steel
from anthropic import Anthropic
from anthropic.types.beta import BetaMessageParam

load_dotenv(override=True)

STEEL_API_KEY = os.getenv("STEEL_API_KEY") or "your-steel-api-key-here"
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY") or "your-anthropic-api-key-here"
TASK = os.getenv("TASK") or "Go to Steel.dev and find the latest news"


def format_today() -> str:
    return datetime.now().strftime("%A, %B %d, %Y")


BROWSER_SYSTEM_PROMPT = f"""<BROWSER_ENV>
- You control a headful Chromium browser running in a VM with internet access.
- Chromium is already open; interact only through the "computer" tool (mouse, keyboard, scroll, screenshots).
- Today's date is {format_today()}.
</BROWSER_ENV>

<BROWSER_CONTROL>
- When viewing pages, zoom out or scroll so all relevant content is visible.
- When typing into any input:
  * Clear it first with Ctrl+A, then Delete.
  * After submitting (pressing Enter or clicking a button), take an extra screenshot to confirm the result and move the mouse away.
- Computer tool calls are slow; batch related actions into a single call whenever possible.
- You may act on the user's behalf on sites where they are already authenticated.
- Assume any required authentication/Auth Contexts are already configured before the task starts.
- If the first screenshot is black:
  * Click near the center of the screen.
  * Take another screenshot.
- Never click the browser address bar with the mouse. To navigate to a URL:
  * Press Ctrl+L to focus and select the address bar.
  * Type the full URL, then press Enter.
  * If you see any existing text (e.g., 'about:blank'), press Ctrl+L before typing so you replace it (never append).
- Prefer typing into inputs on the page (e.g., a site's search box) rather than the browser address bar, unless entering a direct URL.
</BROWSER_CONTROL>

<TASK_EXECUTION>
- You receive exactly one natural-language task and no further user feedback.
- Do not ask the user clarifying questions; instead, make reasonable assumptions and proceed.
- For complex tasks, quickly plan a short, ordered sequence of steps before acting.
- Prefer minimal, high-signal actions that move directly toward the goal.
- Keep your final response concise and focused on fulfilling the task (e.g., a brief summary of findings or results).
</TASK_EXECUTION>"""


def pp(obj) -> None:
    print(json.dumps(obj, indent=2))
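Step 2: Create the Agent Class
Create agent.py with the Agent class, which translates Claude's computer tool calls into Steel Computer API requests and runs the conversation loop: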
import time
import json
from typing import List, Optional, Tuple

from helpers import (
    STEEL_API_KEY,
    ANTHROPIC_API_KEY,
    BROWSER_SYSTEM_PROMPT,
    pp,
)
from steel import Steel
from anthropic import Anthropic
from anthropic.types.beta import BetaMessageParam


class Agent:
    def __init__(self):
        self.client = Anthropic(api_key=ANTHROPIC_API_KEY)
        self.steel = Steel(steel_api_key=STEEL_API_KEY)
        self.model = "claude-sonnet-4-5"
        self.messages: List[BetaMessageParam] = []
        self.session = None
        self.viewport_width = 1280
        self.viewport_height = 768
        self.system_prompt = BROWSER_SYSTEM_PROMPT
        self.tools = [
            {
                "type": "computer_20250124",
                "name": "computer",
                "display_width_px": self.viewport_width,
                "display_height_px": self.viewport_height,
                "display_number": 1,
            }
        ]

    def _center(self) -> Tuple[int, int]:
        return (self.viewport_width // 2, self.viewport_height // 2)

    def _split_keys(self, k: Optional[str]) -> List[str]:
        return [s.strip() for s in k.split("+")] if k else []

    def _normalize_key(self, key: str) -> str:
        if not isinstance(key, str) or not key:
            return key
        k = key.strip()
        upper = k.upper()
        synonyms = {
            "ENTER": "Enter",
            "RETURN": "Enter",
            "ESC": "Escape",
            "ESCAPE": "Escape",
            "TAB": "Tab",
            "BACKSPACE": "Backspace",
            "DELETE": "Delete",
            "SPACE": "Space",
            "CTRL": "Control",
            "CONTROL": "Control",
            "ALT": "Alt",
            "SHIFT": "Shift",
            "META": "Meta",
            "CMD": "Meta",
            "UP": "ArrowUp",
            "DOWN": "ArrowDown",
            "LEFT": "ArrowLeft",
            "RIGHT": "ArrowRight",
            "HOME": "Home",
            "END": "End",
            "PAGEUP": "PageUp",
            "PAGEDOWN": "PageDown",
        }
        if upper in synonyms:
            return synonyms[upper]
        if upper.startswith("F") and upper[1:].isdigit():
            return "F" + upper[1:]
        return k

    def _normalize_keys(self, keys: List[str]) -> List[str]:
        return [self._normalize_key(k) for k in keys]

    def initialize(self) -> None:
        width = self.viewport_width
        height = self.viewport_height
        self.session = self.steel.sessions.create(
            dimensions={"width": width, "height": height},
            block_ads=True,
            api_timeout=900000,
        )
        print("Steel Session created successfully!")
        print(f"View live session at: {self.session.session_viewer_url}")

    def cleanup(self) -> None:
        if self.session:
            print("Releasing Steel session...")
            self.steel.sessions.release(self.session.id)
            print(
                f"Session completed. View replay at {self.session.session_viewer_url}"
            )

    def take_screenshot(self) -> str:
        resp = self.steel.sessions.computer(self.session.id, action="take_screenshot")
        img = getattr(resp, "base64_image", None)
        if not img:
            raise RuntimeError("No screenshot returned from Input API")
        return img

    def execute_computer_action(
        self,
        action: str,
        text: Optional[str] = None,
        coordinate: Optional[Tuple[int, int]] = None,
        scroll_direction: Optional[str] = None,
        scroll_amount: Optional[int] = None,
        duration: Optional[float] = None,
        key: Optional[str] = None,
    ) -> str:
        if (
            coordinate
            and isinstance(coordinate, (list, tuple))
            and len(coordinate) == 2
        ):
            coords = (int(coordinate[0]), int(coordinate[1]))
        else:
            coords = self._center()

        body: Optional[dict] = None

        if action == "mouse_move":
            body = {
                "action": "move_mouse",
                "coordinates": [coords[0], coords[1]],
                "screenshot": True,
            }
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action in ("left_mouse_down", "left_mouse_up"):
            body = {
                "action": "click_mouse",
                "button": "left",
                "click_type": "down" if action == "left_mouse_down" else "up",
                "coordinates": [coords[0], coords[1]],
                "screenshot": True,
            }
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action in (
            "left_click",
            "right_click",
            "middle_click",
            "double_click",
            "triple_click",
        ):
            button_map = {
                "left_click": "left",
                "right_click": "right",
                "middle_click": "middle",
                "double_click": "left",
                "triple_click": "left",
            }
            clicks = (
                2 if action == "double_click" else 3 if action == "triple_click" else 1
            )
            body = {
                "action": "click_mouse",
                "button": button_map[action],
                "coordinates": [coords[0], coords[1]],
                "screenshot": True,
            }
            if clicks > 1:
                body["num_clicks"] = clicks
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action == "left_click_drag":
            start_x, start_y = self._center()
            end_x, end_y = coords
            body = {
                "action": "drag_mouse",
                "path": [[start_x, start_y], [end_x, end_y]],
                "screenshot": True,
            }
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action == "scroll":
            step = 100
            dx_dy = {
                "down": (0, step * (scroll_amount or 0)),
                "up": (0, -step * (scroll_amount or 0)),
                "right": (step * (scroll_amount or 0), 0),
                "left": (-(step * (scroll_amount or 0)), 0),
            }
            dx, dy = dx_dy.get(
                scroll_direction or "down", (0, step * (scroll_amount or 0))
            )
            body = {
                "action": "scroll",
                "coordinates": [coords[0], coords[1]],
                "delta_x": dx,
                "delta_y": dy,
                "screenshot": True,
            }
            hk = self._split_keys(text)
            if hk:
                body["hold_keys"] = hk

        elif action == "hold_key":
            keys = self._split_keys(text or "")
            keys = self._normalize_keys(keys)
            body = {
                "action": "press_key",
                "keys": keys or [],
                "duration": duration,
                "screenshot": True,
            }

        elif action == "key":
            keys = self._split_keys(text or "")
            keys = self._normalize_keys(keys)
            body = {
                "action": "press_key",
                "keys": keys or [],
                "screenshot": True,
            }

        elif action == "type":
            body = {
                "action": "type_text",
                "text": text,
                "screenshot": True,
            }
            hk = self._split_keys(key)
            if hk:
                body["hold_keys"] = hk

        elif action == "wait":
            body = {
                "action": "wait",
                "duration": duration,
                "screenshot": True,
            }

        elif action == "screenshot":
            return self.take_screenshot()

        elif action == "cursor_position":
            self.steel.sessions.computer(self.session.id, action="get_cursor_position")
            return self.take_screenshot()

        else:
            raise ValueError(f"Invalid action: {action}")

        clean_body = {k: v for k, v in body.items() if v is not None}
        resp = self.steel.sessions.computer(self.session.id, **clean_body)
        img = getattr(resp, "base64_image", None)
        if img:
            return img
        return self.take_screenshot()

    def process_response(self, message) -> str:
        response_text = ""

        for block in message.content:
            if block.type == "text":
                response_text += block.text
                print(block.text)
            elif block.type == "tool_use":
                tool_name = block.name
                tool_input = block.input
                print(f"🔧 {tool_name}({json.dumps(tool_input)})")

                if tool_name == "computer":
                    action = tool_input.get("action")
                    params = {
                        "text": tool_input.get("text"),
                        "coordinate": tool_input.get("coordinate"),
                        "scroll_direction": tool_input.get("scroll_direction"),
                        "scroll_amount": tool_input.get("scroll_amount"),
                        "duration": tool_input.get("duration"),
                        "key": tool_input.get("key"),
                    }

                    try:
                        screenshot_base64 = self.execute_computer_action(
                            action=action,
                            text=params["text"],
                            coordinate=params["coordinate"],
                            scroll_direction=params["scroll_direction"],
                            scroll_amount=params["scroll_amount"],
                            duration=params["duration"],
                            key=params["key"],
                        )

                        self.messages.append(
                            {
                                "role": "assistant",
                                "content": [
                                    {
                                        "type": "tool_use",
                                        "id": block.id,
                                        "name": block.name,
                                        "input": tool_input,
                                    }
                                ],
                            }
                        )
                        self.messages.append(
                            {
                                "role": "user",
                                "content": [
                                    {
                                        "type": "tool_result",
                                        "tool_use_id": block.id,
                                        "content": [
                                            {
                                                "type": "image",
                                                "source": {
                                                    "type": "base64",
                                                    "media_type": "image/png",
                                                    "data": screenshot_base64,
                                                },
                                            }
                                        ],
                                    }
                                ],
                            }
                        )
                        return self.get_claude_response()

                    except Exception as e:
                        print(f"❌ Error executing {action}: {e}")
                        self.messages.append(
                            {
                                "role": "assistant",
                                "content": [
                                    {
                                        "type": "tool_use",
                                        "id": block.id,
                                        "name": block.name,
                                        "input": tool_input,
                                    }
                                ],
                            }
                        )
                        self.messages.append(
                            {
                                "role": "user",
                                "content": [
                                    {
                                        "type": "tool_result",
                                        "tool_use_id": block.id,
                                        "content": f"Error executing {action}: {e}",
                                        "is_error": True,
                                    }
                                ],
                            }
                        )
                        return self.get_claude_response()

        if response_text and not any(b.type == "tool_use" for b in message.content):
            self.messages.append({"role": "assistant", "content": response_text})

        return response_text

    def get_claude_response(self) -> str:
        try:
            response = self.client.beta.messages.create(
                model=self.model,
                max_tokens=4096,
                messages=self.messages,
                tools=self.tools,
                betas=["computer-use-2025-01-24"],
            )
            return self.process_response(response)
        except Exception as e:
            err = f"Error communicating with Claude: {e}"
            print(f"❌ {err}")
            return err

    def execute_task(
        self,
        task: str,
        print_steps: bool = True,
        debug: bool = False,
        max_iterations: int = 50,
    ) -> str:
        self.messages = [
            {"role": "user", "content": self.system_prompt},
            {"role": "user", "content": task},
        ]

        iterations = 0
        consecutive_no_actions = 0
        last_assistant_messages: List[str] = []

        print(f"🎯 Executing task: {task}")
        print("=" * 60)

        def detect_repetition(new_message: str) -> bool:
            if len(last_assistant_messages) < 2:
                return False
            words1 = new_message.lower().split()
            return any(
                len([w for w in words1 if w in prev.lower().split()])
                / max(len(words1), len(prev.lower().split()))
                > 0.8
                for prev in last_assistant_messages
            )

        while iterations < max_iterations:
            iterations += 1
            has_actions = False

            last_assistant = None
            for msg in reversed(self.messages):
                if msg.get("role") == "assistant" and isinstance(
                    msg.get("content"), str
                ):
                    last_assistant = msg.get("content")
                    break

            if isinstance(last_assistant, str):
                if detect_repetition(last_assistant):
                    print("🔄 Repetition detected - stopping execution")
                    last_assistant_messages.append(last_assistant)
                    break
                last_assistant_messages.append(last_assistant)
                if len(last_assistant_messages) > 3:
                    last_assistant_messages.pop(0)

            if debug:
                pp(self.messages)

            try:
                response = self.client.beta.messages.create(
                    model=self.model,
                    max_tokens=4096,
                    messages=self.messages,
                    tools=self.tools,
                    betas=["computer-use-2025-01-24"],
                )

                if debug:
                    pp(response)

                for block in response.content:
                    if block.type == "tool_use":
                        has_actions = True

                self.process_response(response)

                if not has_actions:
                    consecutive_no_actions += 1
                    if consecutive_no_actions >= 3:
                        print("⚠️ No actions for 3 consecutive iterations - stopping")
                        break
                else:
                    consecutive_no_actions = 0

            except Exception as e:
                print(f"❌ Error during task execution: {e}")
                raise e

        if iterations >= max_iterations:
            print(f"⚠️ Task execution stopped after {max_iterations} iterations")

        assistant_messages = [m for m in self.messages if m.get("role") == "assistant"]
        final_message = assistant_messages[-1] if assistant_messages else None
        if final_message and isinstance(final_message.get("content"), str):
            return final_message["content"]

        return "Task execution completed (no final message)"
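Claude may name keys loosely ("ctrl+l", "Return", "f5"), so the agent canonicalizes them before sending a press_key request. The standalone sketch below mirrors that step with a reduced synonym table; `normalize_combo` is an illustrative name, not a method of the Agent class.

```python
# Reduced copy of the synonym table in Agent._normalize_key (illustration only).
SYNONYMS = {
    "CTRL": "Control",
    "CONTROL": "Control",
    "ENTER": "Enter",
    "RETURN": "Enter",
    "ESC": "Escape",
    "DOWN": "ArrowDown",
}


def normalize_combo(combo: str) -> list[str]:
    """Split a combo such as "ctrl+l" and canonicalize each key name."""
    if not combo:
        return []
    keys = []
    for part in combo.split("+"):
        name = part.strip()
        upper = name.upper()
        if upper in SYNONYMS:
            keys.append(SYNONYMS[upper])
        elif upper.startswith("F") and upper[1:].isdigit():
            keys.append(upper)  # function keys: f5 -> F5
        else:
            keys.append(name)  # ordinary characters pass through unchanged
    return keys


print(normalize_combo("ctrl+l"))   # ['Control', 'l']
print(normalize_combo("Return"))   # ['Enter']
print(normalize_combo("f5"))       # ['F5']
```

Splitting on "+" means a literal plus key cannot be expressed this way; that limitation is shared with the `_split_keys` helper above.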
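Step 3: Create the Main Script
Create main.py to check the API keys, start a Steel session, run the task, and release the session: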
import sys
import time

from helpers import STEEL_API_KEY, ANTHROPIC_API_KEY, TASK
from agent import Agent


def main():
    print("🚀 Steel + Claude Computer Use Assistant")
    print("=" * 60)

    if STEEL_API_KEY == "your-steel-api-key-here":
        print(
            "⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
        )
        print(" Get your API key at: https://app.steel.dev/settings/api-keys")
        sys.exit(1)

    if ANTHROPIC_API_KEY == "your-anthropic-api-key-here":
        print(
            "⚠️ WARNING: Please replace 'your-anthropic-api-key-here' with your actual Anthropic API key"
        )
        print(" Get your API key at: https://console.anthropic.com/")
        sys.exit(1)

    print("\nStarting Steel session...")
    agent = Agent()

    try:
        agent.initialize()
        print("✅ Steel session started!")

        start_time = time.time()

        try:
            result = agent.execute_task(TASK, True, False, 50)
            duration = f"{(time.time() - start_time):.1f}"

            print("\n" + "=" * 60)
            print("🎉 TASK EXECUTION COMPLETED")
            print("=" * 60)
            print(f"⏱️ Duration: {duration} seconds")
            print(f"🎯 Task: {TASK}")
            print(f"📋 Result:\n{result}")
            print("=" * 60)

        except Exception as e:
            print(f"❌ Task execution failed: {e}")
            raise RuntimeError("Task execution failed")

    except Exception as e:
        print(f"❌ Failed to start Steel session: {e}")
        print("Please check your STEEL_API_KEY and internet connection.")
        raise RuntimeError("Failed to start Steel session")

    finally:
        agent.cleanup()


if __name__ == "__main__":
    main()
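main.py caps the run at 50 iterations, but execute_task can also stop earlier when the assistant keeps repeating itself. The sketch below isolates that word-overlap check so it can be tried on its own. The function names are illustrative, and the guard for empty messages is an addition that the original inline version does not have.

```python
def word_overlap(new: str, prev: str) -> float:
    """Fraction of words in `new` that also occur in `prev`."""
    new_words = new.lower().split()
    prev_words = prev.lower().split()
    if not new_words or not prev_words:
        return 0.0  # added guard: avoids dividing by zero on empty text
    shared = sum(1 for w in new_words if w in prev_words)
    return shared / max(len(new_words), len(prev_words))


def is_repetition(new: str, history: list[str], threshold: float = 0.8) -> bool:
    """True when `new` overlaps heavily with any recent message."""
    if len(history) < 2:  # too little history to judge
        return False
    return any(word_overlap(new, prev) > threshold for prev in history)


history = ["I clicked the search button", "The page is loading now"]
print(is_repetition("I clicked the search button", history))
print(is_repetition("Here are the latest news items", history))
```

A plain word-overlap ratio is cheap but crude: two different summaries that share most of their vocabulary would also trip it, which is why the threshold is set high.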
Running Your Agent
Execute your script:
python main.py
You'll see the session URL printed in the console. Open this URL to view the live browser session.
The agent will execute the task defined in the TASK environment variable or the default task.
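You can modify the task by setting the environment variable. Because helpers.py calls load_dotenv(override=True), a TASK entry in .env takes precedence over an exported value, so remove or edit that entry first: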
export TASK="Search for the latest developments in artificial intelligence"
python main.py
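Another option is to accept the task on the command line. The sketch below is a hypothetical variant, not part of the guide's main.py: `resolve_task` prefers command-line words, then the TASK variable, then the default task from helpers.py.

```python
import os
import sys

DEFAULT_TASK = "Go to Steel.dev and find the latest news"


def resolve_task(argv: list[str], env: dict[str, str]) -> str:
    """Prefer command-line words, then the TASK variable, then the default."""
    if len(argv) > 1:
        return " ".join(argv[1:])
    return env.get("TASK") or DEFAULT_TASK


if __name__ == "__main__":
    # Example: python resolve_task.py Find the latest AI news
    print(resolve_task(sys.argv, dict(os.environ)))
```

Passing the environment in as a plain dict keeps the function easy to test without touching the real process environment.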
Customizing your agent's task
Try modifying the task to make your agent perform different actions:
# Research specific topics
TASK=Go to https://arxiv.org, search for 'computer vision', and summarize the latest papers.

# E-commerce tasks
TASK=Go to https://www.amazon.com, search for 'mechanical keyboards', and compare the top 3 results.

# Information gathering
TASK=Go to https://docs.anthropic.com, find information about Claude's capabilities, and provide a summary.
Next Steps
- Explore the Steel API documentation for more advanced features
- Check out the Anthropic documentation for more information about Claude's computer use capabilities
- Add additional features like session recording or multi-session management