Quickstart (Python)
How to use OpenAI Computer Use with Steel
This guide will walk you through how to use OpenAI's computer-use-preview model with Steel's Computer API to create AI agents that can navigate the web.
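We'll implement a simple CUA (computer-using agent) loop. The model receives the task and responds with computer actions such as clicks, typing, and scrolling. The agent executes each action in a Steel browser session and sends the resulting screenshot back to the model. The loop repeats until the model stops requesting actions or an iteration limit is reached.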
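To make the control flow concrete before we touch any real APIs, here is a toy, dependency-free sketch of that loop. `run_loop`, `fake_model`, and `fake_computer` are hypothetical stand-ins for the OpenAI Responses API and Steel's Computer API. The item shapes are simplified, and the sketch stops at the first text-only turn. The real `Agent` class later in this guide uses different stop conditions.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


def run_loop(
    model: Callable[[List[Item]], List[Item]],
    computer: Callable[[Item], str],
    task: str,
    max_iterations: int = 10,
) -> str:
    """Toy CUA loop: ask the model, execute its actions, feed screenshots back."""
    history: List[Item] = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        output = model(history)
        history.extend(output)
        calls = [item for item in output if item.get("type") == "computer_call"]
        if not calls:
            # Simplification: treat the first text-only turn as the final answer.
            texts = [item["text"] for item in output if item.get("type") == "message"]
            return texts[-1] if texts else ""
        for call in calls:
            screenshot = computer(call["action"])
            history.append(
                {"type": "computer_call_output", "call_id": call["call_id"], "screenshot": screenshot}
            )
    return "stopped: max_iterations reached"


# Hypothetical stand-ins for the OpenAI model and the Steel browser.
def fake_model(history: List[Item]) -> List[Item]:
    if any(item.get("type") == "computer_call_output" for item in history):
        return [{"type": "message", "text": "Done: found the news page"}]
    return [{"type": "computer_call", "call_id": "call_1",
             "action": {"type": "click", "x": 100, "y": 200}}]


executed: List[Item] = []


def fake_computer(action: Item) -> str:
    executed.append(action)  # a real computer would perform the click here
    return "<base64 screenshot>"


print(run_loop(fake_model, fake_computer, "Find the latest news"))  # Done: found the news page
```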

Prerequisites
- Python 3.8+
- A Steel API key (create one at https://app.steel.dev/settings/api-keys)
- An OpenAI API key with access to the computer-use-preview model
Step 1: Setup and Helper Functions
First, create and activate a virtual environment, then install the required packages:
```bash
uv venv
source .venv/bin/activate
uv pip install steel-sdk requests python-dotenv
```
Create a .env file with your API keys:
```text
STEEL_API_KEY=your_steel_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
TASK=Go to Steel.dev and find the latest news
```
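Before moving on, you may want to sanity-check the configuration. The sketch below is a hypothetical helper, not part of the Steel or OpenAI SDKs. It checks the Python version and flags keys that are missing or still set to a placeholder value. It accepts any mapping, so you could pass it `os.environ` after loading `.env`, or the dict returned by python-dotenv's `dotenv_values(".env")`.

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple

REQUIRED_KEYS = ("STEEL_API_KEY", "OPENAI_API_KEY")


def check_prerequisites(
    env: Mapping[str, Optional[str]],
    version: Tuple[int, int] = (sys.version_info.major, sys.version_info.minor),
) -> List[str]:
    """Return a list of problems; an empty list means the setup looks ready."""
    problems: List[str] = []
    if version < (3, 8):
        problems.append(f"Python 3.8+ required, found {version[0]}.{version[1]}")
    for key in REQUIRED_KEYS:
        value = (env.get(key) or "").strip()
        # Treat template values such as your_..._here or your-...-here as unset.
        if not value or value.startswith(("your_", "your-")):
            problems.append(f"{key} is missing or still a placeholder")
    return problems


for problem in check_prerequisites(os.environ):
    print(f"- {problem}")
```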
Create a file named helpers.py with helper functions and constants:
```python
# helpers.py
import os
from datetime import datetime

import requests
from dotenv import load_dotenv

# Load variables from .env. override=True lets .env values replace
# variables that are already set in the shell.
load_dotenv(override=True)

STEEL_API_KEY = os.getenv("STEEL_API_KEY") or "your-steel-api-key-here"
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or "your-openai-api-key-here"
TASK = os.getenv("TASK") or "Go to Steel.dev and find the latest news"


def format_today() -> str:
    return datetime.now().strftime("%A, %B %d, %Y")


BROWSER_SYSTEM_PROMPT = f"""<BROWSER_ENV>
- You control a headful Chromium browser running in a VM with internet access.
- Interact only through the computer tool (mouse/keyboard/scroll/screenshots). Do not call navigation functions.
- Today's date is {format_today()}.
</BROWSER_ENV>

<BROWSER_CONTROL>
- Before acting, take a screenshot to observe state.
- When typing into any input:
  * Clear with Ctrl/⌘+A, then Delete.
  * After submitting (Enter or clicking a button), call wait(1–2s) once, then take a single screenshot and move the mouse aside.
  * Do not press Enter repeatedly. If the page state doesn't change after submit+wait+screenshot, change strategy (e.g., focus address bar with Ctrl/⌘+L, type the full URL, press Enter once).
- Computer calls are slow; batch related actions together.
- Zoom out or scroll so all relevant content is visible before reading.
- If the first screenshot is black, click near center and screenshot again.
</BROWSER_CONTROL>

<TASK_EXECUTION>
- You receive exactly one natural-language task and no further user feedback.
- Do not ask clarifying questions; make reasonable assumptions and proceed.
- Prefer minimal, high-signal actions that move directly toward the goal.
- Every assistant turn must include at least one computer action; avoid text-only turns.
- Avoid repetition: never repeat the same action sequence in consecutive turns (e.g., pressing Enter multiple times). If an action has no visible effect, pivot to a different approach.
- If two iterations produce no meaningful progress, try a different tactic (e.g., Ctrl/⌘+L → type URL → Enter) rather than repeating the prior keys, then proceed.
- Keep the final response concise and focused on fulfilling the task.
</TASK_EXECUTION>"""


def create_response(**kwargs):
    """POST kwargs as JSON to the OpenAI Responses API and return the parsed reply."""
    url = "https://api.openai.com/v1/responses"
    headers = {
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "Content-Type": "application/json",
    }
    openai_org = os.getenv("OPENAI_ORG")
    if openai_org:
        headers["Openai-Organization"] = openai_org

    # A timeout keeps a stalled request from hanging the agent indefinitely.
    response = requests.post(url, headers=headers, json=kwargs, timeout=120)
    if response.status_code != 200:
        raise RuntimeError(f"OpenAI API Error: {response.status_code} {response.text}")
    return response.json()
```
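`create_response` forwards its keyword arguments as the JSON body of a POST to `/v1/responses`. To inspect that request without a network call, here is a dry-run sketch. `build_responses_request` is a hypothetical helper that repeats the header logic of `create_response`. The payload uses the model name, tool definition, viewport size, and `truncation="auto"` setting from the `Agent` class, and the system prompt is abbreviated.

```python
import json
from typing import Any, Dict, Optional, Tuple

RESPONSES_URL = "https://api.openai.com/v1/responses"


def build_responses_request(
    api_key: str, org: Optional[str] = None, **kwargs: Any
) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, json_body) without sending anything."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if org:  # same optional header create_response adds when OPENAI_ORG is set
        headers["Openai-Organization"] = org
    return RESPONSES_URL, headers, json.dumps(kwargs)


# First-turn request: system prompt (abbreviated) plus the user's task.
url, headers, body = build_responses_request(
    "sk-test",
    model="computer-use-preview",
    input=[
        {"role": "system", "content": "<BROWSER_ENV>...</BROWSER_ENV>"},
        {"role": "user", "content": "Go to Steel.dev and find the latest news"},
    ],
    tools=[
        {
            "type": "computer-preview",
            "display_width": 1280,
            "display_height": 768,
            "environment": "browser",
        }
    ],
    truncation="auto",
)
print(json.dumps(json.loads(body), indent=2))
```

On later turns, the `input` list also contains the model's previous output items and the `computer_call_output` screenshots.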
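Step 2: Create the Agent Class
Create agent.py with the Agent class. It manages the Steel session, translates model actions into Steel Computer API calls, and runs the task loop: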
```python
# agent.py
import json
from typing import Any, Dict, List, Optional, Tuple

from steel import Steel

from helpers import (
    STEEL_API_KEY,
    BROWSER_SYSTEM_PROMPT,
    create_response,
)


class Agent:
    def __init__(self):
        self.steel = Steel(steel_api_key=STEEL_API_KEY)
        self.session = None
        self.model = "computer-use-preview"
        self.viewport_width = 1280
        self.viewport_height = 768
        self.system_prompt = BROWSER_SYSTEM_PROMPT
        self.tools = [
            {
                "type": "computer-preview",
                "display_width": self.viewport_width,
                "display_height": self.viewport_height,
                "environment": "browser",
            }
        ]
        self.print_steps = True
        self.auto_acknowledge_safety = True

    def center(self) -> Tuple[int, int]:
        return (self.viewport_width // 2, self.viewport_height // 2)

    def to_number(self, v: Any, default: float = 0.0) -> float:
        if isinstance(v, (int, float)):
            return float(v)
        if isinstance(v, str):
            try:
                return float(v)
            except ValueError:
                return default
        return default

    def to_coords(self, x: Any = None, y: Any = None) -> Tuple[int, int]:
        # Missing coordinates default to the center of the viewport.
        if x is None or y is None:
            return self.center()
        return (
            int(self.to_number(x, self.center()[0])),
            int(self.to_number(y, self.center()[1])),
        )

    def split_keys(self, k: Optional[Any]) -> List[str]:
        # Accept either a list (["CTRL", "A"]) or a "CTRL+A" string.
        if isinstance(k, list):
            return [str(s) for s in k if s]
        if isinstance(k, str) and k.strip():
            return [s.strip() for s in k.split("+") if s.strip()]
        return []

    def normalize_key(self, key: str) -> str:
        if not isinstance(key, str) or not key:
            return key
        k = key.strip()
        upper = k.upper()
        synonyms = {
            "ENTER": "Enter",
            "RETURN": "Enter",
            "ESC": "Escape",
            "ESCAPE": "Escape",
            "TAB": "Tab",
            "BACKSPACE": "Backspace",
            "DELETE": "Delete",
            "SPACE": "Space",
            "CTRL": "Control",
            "CONTROL": "Control",
            "ALT": "Alt",
            "SHIFT": "Shift",
            "META": "Meta",
            "CMD": "Meta",
            "UP": "ArrowUp",
            "DOWN": "ArrowDown",
            "LEFT": "ArrowLeft",
            "RIGHT": "ArrowRight",
            "HOME": "Home",
            "END": "End",
            "PAGEUP": "PageUp",
            "PAGEDOWN": "PageDown",
        }
        if upper in synonyms:
            return synonyms[upper]
        if upper.startswith("F") and upper[1:].isdigit():
            return "F" + upper[1:]
        return k

    def normalize_keys(self, keys: List[str]) -> List[str]:
        return [self.normalize_key(k) for k in keys]

    def initialize(self) -> None:
        width = self.viewport_width
        height = self.viewport_height
        self.session = self.steel.sessions.create(
            dimensions={"width": width, "height": height},
            block_ads=True,
            api_timeout=900000,
        )
        print("Steel Session created successfully!")
        print(f"View live session at: {self.session.session_viewer_url}")

    def cleanup(self) -> None:
        if self.session:
            print("Releasing Steel session...")
            self.steel.sessions.release(self.session.id)
            print(
                f"Session completed. View replay at {self.session.session_viewer_url}"
            )
            self.session = None

    def take_screenshot(self) -> str:
        resp = self.steel.sessions.computer(self.session.id, action="take_screenshot")
        img = getattr(resp, "base64_image", None)
        if not img:
            raise RuntimeError("No screenshot returned from Steel")
        return img

    def map_button(self, btn: Optional[str]) -> str:
        b = (btn or "left").lower()
        if b in ("left", "right", "middle", "back", "forward"):
            return b
        return "left"

    def execute_computer_action(
        self, action_type: str, action_args: Dict[str, Any]
    ) -> str:
        """Run one OpenAI computer action via Steel; return a base64 screenshot."""
        body: Dict[str, Any]

        if action_type == "move":
            coords = self.to_coords(action_args.get("x"), action_args.get("y"))
            body = {
                "action": "move_mouse",
                "coordinates": [coords[0], coords[1]],
                "screenshot": True,
            }

        elif action_type == "click":
            coords = self.to_coords(action_args.get("x"), action_args.get("y"))
            button = self.map_button(action_args.get("button"))
            num_clicks = int(self.to_number(action_args.get("num_clicks"), 1))
            payload = {
                "action": "click_mouse",
                "button": button,
                "coordinates": [coords[0], coords[1]],
                "screenshot": True,
            }
            if num_clicks > 1:
                payload["num_clicks"] = num_clicks
            body = payload

        elif action_type in ("doubleClick", "double_click"):
            coords = self.to_coords(action_args.get("x"), action_args.get("y"))
            body = {
                "action": "click_mouse",
                "button": "left",
                "coordinates": [coords[0], coords[1]],
                "num_clicks": 2,
                "screenshot": True,
            }

        elif action_type == "drag":
            path = action_args.get("path") or []
            steel_path: List[List[int]] = []
            for p in path:
                steel_path.append(list(self.to_coords(p.get("x"), p.get("y"))))
            # A drag needs at least a start and an end point.
            if len(steel_path) < 2:
                cx, cy = self.center()
                tx, ty = self.to_coords(action_args.get("x"), action_args.get("y"))
                steel_path = [[cx, cy], [tx, ty]]
            body = {"action": "drag_mouse", "path": steel_path, "screenshot": True}

        elif action_type == "scroll":
            scroll_coords: Optional[Tuple[int, int]] = None
            if action_args.get("x") is not None or action_args.get("y") is not None:
                scroll_coords = self.to_coords(action_args.get("x"), action_args.get("y"))
            delta_x = int(self.to_number(action_args.get("scroll_x"), 0))
            delta_y = int(self.to_number(action_args.get("scroll_y"), 0))
            body = {
                "action": "scroll",
                "screenshot": True,
            }
            if scroll_coords:
                body["coordinates"] = [scroll_coords[0], scroll_coords[1]]
            if delta_x:
                body["delta_x"] = delta_x
            if delta_y:
                body["delta_y"] = delta_y

        elif action_type == "type":
            text = action_args.get("text") or ""
            body = {"action": "type_text", "text": text, "screenshot": True}

        elif action_type == "keypress":
            keys = action_args.get("keys")
            keys_list = self.split_keys(keys)
            normalized = self.normalize_keys(keys_list)
            body = {"action": "press_key", "keys": normalized, "screenshot": True}

        elif action_type == "wait":
            # Default to 1 second when no duration is given.
            ms = self.to_number(action_args.get("ms"), 1000)
            seconds = max(0.001, ms / 1000.0)
            body = {"action": "wait", "duration": seconds, "screenshot": True}

        else:
            # "screenshot" and any unrecognized action return a fresh screenshot.
            return self.take_screenshot()

        resp = self.steel.sessions.computer(
            self.session.id, **{k: v for k, v in body.items() if v is not None}
        )
        img = getattr(resp, "base64_image", None)
        return img if img else self.take_screenshot()

    def handle_item(self, item: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Handle one model output item; return any items to send back to the model."""
        if item["type"] == "message":
            if self.print_steps and item.get("content") and len(item["content"]) > 0:
                print(item["content"][0].get("text", ""))
            return []

        if item["type"] == "function_call":
            if self.print_steps:
                print(f"{item['name']}({item['arguments']})")
            return [
                {
                    "type": "function_call_output",
                    "call_id": item["call_id"],
                    "output": "success",
                }
            ]

        if item["type"] == "computer_call":
            action = item["action"]
            action_type = action["type"]
            action_args = {k: v for k, v in action.items() if k != "type"}

            if self.print_steps:
                print(f"{action_type}({json.dumps(action_args)})")

            screenshot_base64 = self.execute_computer_action(action_type, action_args)

            pending_checks = item.get("pending_safety_checks", []) or []
            for check in pending_checks:
                if self.auto_acknowledge_safety:
                    print(f"⚠️ Auto-acknowledging safety check: {check.get('message')}")
                else:
                    raise RuntimeError(f"Safety check failed: {check.get('message')}")

            call_output = {
                "type": "computer_call_output",
                "call_id": item["call_id"],
                "acknowledged_safety_checks": pending_checks,
                "output": {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{screenshot_base64}",
                },
            }
            return [call_output]

        return []

    def execute_task(
        self,
        task: str,
        print_steps: bool = True,
        debug: bool = False,
        max_iterations: int = 50,
    ) -> str:
        self.print_steps = print_steps

        input_items: List[Dict[str, Any]] = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": task},
        ]

        new_items: List[Dict[str, Any]] = []
        iterations = 0
        consecutive_no_actions = 0
        last_assistant_texts: List[str] = []

        print(f"🎯 Executing task: {task}")
        print("=" * 60)

        def detect_repetition(text: str) -> bool:
            """Flag text whose word overlap with a recent message exceeds 80%."""
            if len(last_assistant_texts) < 2:
                return False
            words1 = text.lower().split()
            for prev in last_assistant_texts:
                words2 = prev.lower().split()
                common = [w for w in words1 if w in words2]
                # max(..., 1) guards against dividing by zero on blank messages.
                if len(common) / max(len(words1), len(words2), 1) > 0.8:
                    return True
            return False

        while iterations < max_iterations:
            iterations += 1
            has_actions = False

            # Stop if the assistant keeps saying essentially the same thing.
            if new_items and new_items[-1].get("role") == "assistant":
                content = new_items[-1].get("content", [])
                last_text = content[0].get("text") if content else None
                if isinstance(last_text, str) and last_text:
                    if detect_repetition(last_text):
                        print("🔄 Repetition detected - stopping execution")
                        last_assistant_texts.append(last_text)
                        break
                    last_assistant_texts.append(last_text)
                    if len(last_assistant_texts) > 3:
                        last_assistant_texts.pop(0)

            try:
                response = create_response(
                    model=self.model,
                    input=[*input_items, *new_items],
                    tools=self.tools,
                    truncation="auto",
                )

                if "output" not in response:
                    raise RuntimeError("No output from model")

                for item in response["output"]:
                    new_items.append(item)
                    if item.get("type") in ("computer_call", "function_call"):
                        has_actions = True
                    new_items.extend(self.handle_item(item))

                # Stop after three consecutive turns without any action.
                if not has_actions:
                    consecutive_no_actions += 1
                    if consecutive_no_actions >= 3:
                        print("⚠️ No actions for 3 consecutive iterations - stopping")
                        break
                else:
                    consecutive_no_actions = 0

            except Exception as error:
                print(f"❌ Error during task execution: {error}")
                raise

        if iterations >= max_iterations:
            print(f"⚠️ Task execution stopped after {max_iterations} iterations")

        # Return the text of the most recent assistant message, if any.
        assistant_messages = [i for i in new_items if i.get("role") == "assistant"]
        if assistant_messages:
            content = assistant_messages[-1].get("content") or []
            if content and content[0].get("text"):
                return content[0]["text"]

        return "Task execution completed (no final message)"
```
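The core of the class is the translation from an OpenAI `computer_call` action into arguments for Steel's Computer API. The sketch below is a condensed, dependency-free dry run of part of `execute_computer_action`. It returns the argument dict instead of calling Steel, so you can see what each action becomes. It covers only a subset of action types and key synonyms and is simplified: for example, it does not fill in missing coordinates with the viewport center. The Steel field names come from the `Agent` code above. `normalize_keys` and `to_steel_body` are hypothetical standalone names.

```python
from typing import Any, Dict, List

# A small subset of the key synonyms used by Agent.normalize_key.
KEY_SYNONYMS = {
    "CTRL": "Control",
    "CMD": "Meta",
    "ENTER": "Enter",
    "RETURN": "Enter",
    "ESC": "Escape",
}


def normalize_keys(keys: Any) -> List[str]:
    """Accept a list such as ["CTRL", "A"] or a string such as "CTRL+A"."""
    parts = keys if isinstance(keys, list) else str(keys or "").split("+")
    normalized: List[str] = []
    for part in parts:
        key = str(part).strip()
        if key:
            normalized.append(KEY_SYNONYMS.get(key.upper(), key))
    return normalized


def to_steel_body(action: Dict[str, Any]) -> Dict[str, Any]:
    """Dry run: return the Steel Computer API arguments for one OpenAI action."""
    kind = action["type"]
    if kind == "click":
        return {
            "action": "click_mouse",
            "button": str(action.get("button") or "left").lower(),
            "coordinates": [int(action["x"]), int(action["y"])],
            "screenshot": True,
        }
    if kind == "type":
        return {"action": "type_text", "text": action.get("text") or "", "screenshot": True}
    if kind == "keypress":
        return {"action": "press_key", "keys": normalize_keys(action.get("keys")), "screenshot": True}
    if kind == "scroll":
        body: Dict[str, Any] = {"action": "scroll", "screenshot": True}
        if action.get("x") is not None and action.get("y") is not None:
            body["coordinates"] = [int(action["x"]), int(action["y"])]
        # Zero deltas are omitted, as in the Agent class.
        for src, dst in (("scroll_x", "delta_x"), ("scroll_y", "delta_y")):
            delta = int(action.get(src) or 0)
            if delta:
                body[dst] = delta
        return body
    if kind == "wait":
        return {"action": "wait", "duration": max(0.001, action.get("ms", 1000) / 1000.0), "screenshot": True}
    # The Agent falls back to a plain screenshot for anything else.
    return {"action": "take_screenshot"}


print(to_steel_body({"type": "keypress", "keys": ["CTRL", "L"]}))
# {'action': 'press_key', 'keys': ['Control', 'L'], 'screenshot': True}
```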
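Step 3: Create the Main Script
Create main.py. It checks your configuration, starts a Steel session, runs the task, and always releases the session when it finishes: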
```python
# main.py
import sys
import time

from agent import Agent
from helpers import OPENAI_API_KEY, STEEL_API_KEY, TASK

# Placeholder values from helpers.py (hyphens) and the .env template (underscores).
STEEL_PLACEHOLDERS = ("your-steel-api-key-here", "your_steel_api_key_here")
OPENAI_PLACEHOLDERS = ("your-openai-api-key-here", "your_openai_api_key_here")


def main():
    print("🚀 Steel + OpenAI Computer Use Assistant")
    print("=" * 60)

    if STEEL_API_KEY in STEEL_PLACEHOLDERS:
        print("⚠️  WARNING: Please set STEEL_API_KEY to your actual Steel API key")
        print("   Get your API key at: https://app.steel.dev/settings/api-keys")
        sys.exit(1)

    if OPENAI_API_KEY in OPENAI_PLACEHOLDERS:
        print("⚠️  WARNING: Please set OPENAI_API_KEY to your actual OpenAI API key")
        print("   Get your API key at: https://platform.openai.com/")
        sys.exit(1)

    print("\nStarting Steel session...")
    agent = Agent()

    try:
        try:
            agent.initialize()
        except Exception as e:
            print(f"❌ Failed to start Steel session: {e}")
            print("Please check your STEEL_API_KEY and internet connection.")
            raise
        print("✅ Steel session started!")

        start_time = time.time()
        try:
            result = agent.execute_task(
                TASK, print_steps=True, debug=False, max_iterations=50
            )
        except Exception as e:
            print(f"❌ Task execution failed: {e}")
            raise
        duration = f"{(time.time() - start_time):.1f}"

        print("\n" + "=" * 60)
        print("🎉 TASK EXECUTION COMPLETED")
        print("=" * 60)
        print(f"⏱️  Duration: {duration} seconds")
        print(f"🎯 Task: {TASK}")
        print(f"📋 Result:\n{result}")
        print("=" * 60)

    finally:
        # Always release the Steel session, even if something failed.
        agent.cleanup()


if __name__ == "__main__":
    main()
```
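main.py caps the run at 50 iterations. Within that budget, `execute_task` also stops early in two cases: three consecutive turns contain no actions, or an assistant message is too similar to recent ones. The similarity check is a simple word-overlap heuristic. The standalone sketch below mirrors `detect_repetition` so you can test the 0.8 threshold on your own strings. `is_repetitive` is a hypothetical name, and the empty-string guard is an addition.

```python
from typing import List


def is_repetitive(text: str, previous: List[str], threshold: float = 0.8) -> bool:
    """Word-overlap check modeled on detect_repetition in Agent.execute_task."""
    # Like the Agent, only start checking once two earlier messages exist.
    if len(previous) < 2:
        return False
    words = text.lower().split()
    for prev in previous:
        prev_words = prev.lower().split()
        common = [w for w in words if w in prev_words]
        # max(..., 1) avoids dividing by zero for empty strings.
        if len(common) / max(len(words), len(prev_words), 1) > threshold:
            return True
    return False


recent = ["I will click the search box", "Clicking the search box now"]
print(is_repetitive("I will click the search box", recent))  # True
print(is_repetitive("Opening the news page", recent))        # False
```

Because the heuristic ignores word order and counts duplicates in `text` individually, it is coarse. If it stops runs too early or too late for your tasks, the threshold is the obvious knob.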
Running Your Agent
Run the script to start the agent in a new Steel browser session:
python main.py
You will see the session URL printed in the console. You can view the live browser session by opening this URL in your web browser.
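The agent executes the task from the TASK environment variable, falling back to the default task in helpers.py. Note that helpers.py calls load_dotenv(override=True), so a TASK line in .env takes precedence over a value exported in your shell. To run a different task, either edit TASK in .env, or remove that line and export the variable:
```bash
export TASK="Search for the latest news on artificial intelligence"
python main.py
```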
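The precedence rule is easy to trip over, so here is a stdlib-only model of it. This is not python-dotenv's implementation. `resolve_task` and `DEFAULT_TASK` are hypothetical names, and the sketch assumes only the documented `override` semantics: with `override=True`, `.env` values replace existing environment variables, and with `override=False`, existing variables are kept.

```python
from typing import Dict, Mapping

DEFAULT_TASK = "Go to Steel.dev and find the latest news"


def resolve_task(shell_env: Mapping[str, str], dotenv: Mapping[str, str], override: bool) -> str:
    """Simplified model of load_dotenv(override=...) followed by os.getenv("TASK")."""
    merged: Dict[str, str] = dict(shell_env)
    for key, value in dotenv.items():
        # override=True: .env replaces shell values; override=False: shell values win.
        if override or key not in merged:
            merged[key] = value
    return merged.get("TASK") or DEFAULT_TASK


shell = {"TASK": "Search for the latest news on artificial intelligence"}
dotenv_file = {"TASK": "Go to Steel.dev and find the latest news"}
print(resolve_task(shell, dotenv_file, override=True))   # the .env task
print(resolve_task(shell, dotenv_file, override=False))  # the exported task
```

If you prefer exported variables to win, changing helpers.py to call `load_dotenv()` (whose default is `override=False`) is one option.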
Next Steps
- Explore the Steel API documentation for more advanced features
- Check out the OpenAI documentation for more information about the computer-use-preview model
- Add features such as session recording or multi-session management