Quickstart (TypeScript)
How to use OpenAI Computer Use with Steel
This guide walks you through using OpenAI's computer-use-preview model with Steel's managed remote browsers to create AI agents that can navigate the web.
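We'll implement a simple CUA loop: send the task to the model, execute each action it returns in the remote browser, send back a screenshot of the result, and repeat until the model reports that the task is finished.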
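The loop can be sketched in a few lines before looking at the full implementation. This is a simplified illustration, not the code used later in this guide: the item shapes, `ModelStep`, `ComputerLike`, and `runCuaLoop` are hypothetical names, and the sketch stops at the first turn without actions, whereas the full agent in Step 3 also checks completion markers and repetition.

```typescript
// Hypothetical, simplified item shapes; real Responses API items carry more fields.
type ModelItem =
  | { type: "computer_call"; callId: string; action: { type: string } }
  | { type: "message"; text: string };

// Stand-ins for the model call and the remote browser (both are assumptions).
type ModelStep = (history: unknown[]) => Promise<ModelItem[]>;
interface ComputerLike {
  perform(action: { type: string }): Promise<void>;
  screenshot(): Promise<string>; // base64-encoded image
}

export async function runCuaLoop(
  task: string,
  model: ModelStep,
  computer: ComputerLike,
  maxIterations = 50
): Promise<string> {
  const history: unknown[] = [{ role: "user", content: task }];

  for (let i = 0; i < maxIterations; i++) {
    // 1. Ask the model what to do next, given everything so far.
    const items = await model(history);
    history.push(...items);

    let lastMessage = "";
    let acted = false;
    for (const item of items) {
      if (item.type === "computer_call") {
        // 2. Execute the action, then 3. report back with a screenshot.
        await computer.perform(item.action);
        const image = await computer.screenshot();
        history.push({
          type: "computer_call_output",
          callId: item.callId,
          image,
        });
        acted = true;
      } else {
        lastMessage = item.text;
      }
    }

    // 4. A turn without actions is treated as the final answer.
    if (!acted) return lastMessage;
  }
  return "Stopped: iteration limit reached";
}
```

Passing the model and the browser in as parameters keeps the loop testable with fakes, which is useful while you tune prompts and stop conditions.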
Prerequisites
- Node.js 20+
- A Steel API key (sign up here)
- An OpenAI API key with access to the computer-use-preview model
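The code in this guide reads both keys from environment variables (loaded from a `.env` file via `dotenv`). A small pre-flight check like the following can fail fast when one is missing. It is a sketch only: `missingEnv` is a hypothetical helper, not part of the Steel or OpenAI SDKs.

```typescript
// Hypothetical helper: returns the names of required variables that are unset or blank.
export function missingEnv(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// The two variable names below are the ones this guide reads.
export const REQUIRED_KEYS = ["STEEL_API_KEY", "OPENAI_API_KEY"];
```

In a real script you would call `missingEnv(process.env, REQUIRED_KEYS)` right after `dotenv.config()` and exit with a clear message if the returned list is not empty.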
Step 1: Setup and Helper Functions
import { chromium } from "playwright";
import type { Browser, Page } from "playwright";
import { Steel } from "steel-sdk";
import * as dotenv from "dotenv";

dotenv.config();

// Replace with your own API keys
export const STEEL_API_KEY =
  process.env.STEEL_API_KEY || "your-steel-api-key-here";
export const OPENAI_API_KEY =
  process.env.OPENAI_API_KEY || "your-openai-api-key-here";

// Replace with your own task
export const TASK =
  process.env.TASK || "Go to Wikipedia and search for machine learning";

export const SYSTEM_PROMPT = `You are an expert browser automation assistant operating in an iterative execution loop. Your goal is to efficiently complete tasks using a Chrome browser with full internet access.

<CAPABILITIES>
* You control a Chrome browser tab and can navigate to any website
* You can click, type, scroll, take screenshots, and interact with web elements
* You have full internet access and can visit any public website
* You can read content, fill forms, search for information, and perform complex multi-step tasks
* After each action, you receive a screenshot showing the current state
* Use the goto(url) function to navigate directly to URLs - DO NOT try to click address bars or browser UI
* Use the back() function to go back to the previous page

<COORDINATE_SYSTEM>
* The browser viewport has specific dimensions that you must respect
* All coordinates (x, y) must be within the viewport bounds
* X coordinates must be between 0 and the display width (inclusive)
* Y coordinates must be between 0 and the display height (inclusive)
* Always ensure your click, move, scroll, and drag coordinates are within these bounds
* If you're unsure about element locations, take a screenshot first to see the current state

<AUTONOMOUS_EXECUTION>
* Work completely independently - make decisions and act immediately without asking questions
* Never request clarification, present options, or ask for permission
* Make intelligent assumptions based on task context
* If something is ambiguous, choose the most logical interpretation and proceed
* Take immediate action rather than explaining what you might do
* When the task objective is achieved, immediately declare "TASK_COMPLETED:" - do not provide commentary or ask questions

<REASONING_STRUCTURE>
For each step, you must reason systematically:
* Analyze your previous action's success/failure and current state
* Identify what specific progress has been made toward the goal
* Determine the next immediate objective and how to achieve it
* Choose the most efficient action sequence to make progress

<EFFICIENCY_PRINCIPLES>
* Combine related actions when possible rather than single-step execution
* Navigate directly to relevant websites without unnecessary exploration
* Use screenshots strategically to understand page state before acting
* Be persistent with alternative approaches if initial attempts fail
* Focus on the specific information or outcome requested

<COMPLETION_CRITERIA>
* MANDATORY: When you complete the task, your final message MUST start with "TASK_COMPLETED: [brief summary]"
* MANDATORY: If technical issues prevent completion, your final message MUST start with "TASK_FAILED: [reason]"
* MANDATORY: If you abandon the task, your final message MUST start with "TASK_ABANDONED: [explanation]"
* Do not write anything after completing the task except the required completion message
* Do not ask questions, provide commentary, or offer additional help after task completion
* The completion message is the end of the interaction - nothing else should follow

<CRITICAL_REQUIREMENTS>
* This is fully automated execution - work completely independently
* Start by taking a screenshot to understand the current state
* Use goto(url) function for navigation - never click on browser UI elements
* Always respect coordinate boundaries - invalid coordinates will fail
* Recognize when the stated objective has been achieved and declare completion immediately
* Focus on the explicit task given, not implied or potential follow-up tasks

Remember: Be thorough but focused. Complete the specific task requested efficiently and provide clear results.`;

export const BLOCKED_DOMAINS = [
  "maliciousbook.com",
  "evilvideos.com",
  "darkwebforum.com",
  "shadytok.com",
  "suspiciouspins.com",
  "ilanbigio.com",
];

export const CUA_KEY_TO_PLAYWRIGHT_KEY: Record<string, string> = {
  "/": "Divide",
  "\\": "Backslash",
  alt: "Alt",
  arrowdown: "ArrowDown",
  arrowleft: "ArrowLeft",
  arrowright: "ArrowRight",
  arrowup: "ArrowUp",
  backspace: "Backspace",
  capslock: "CapsLock",
  cmd: "Meta",
  ctrl: "Control",
  delete: "Delete",
  end: "End",
  enter: "Enter",
  esc: "Escape",
  home: "Home",
  insert: "Insert",
  option: "Alt",
  pagedown: "PageDown",
  pageup: "PageUp",
  shift: "Shift",
  space: " ",
  super: "Meta",
  tab: "Tab",
  win: "Meta",
};

export interface MessageItem {
  type: "message";
  content: Array<{ text: string }>;
}

export interface FunctionCallItem {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string;
}

export interface ComputerCallItem {
  type: "computer_call";
  call_id: string;
  action: {
    type: string;
    [key: string]: any;
  };
  pending_safety_checks?: Array<{
    id: string;
    message: string;
  }>;
}

export interface OutputItem {
  type: "computer_call_output" | "function_call_output";
  call_id: string;
  acknowledged_safety_checks?: Array<{
    id: string;
    message: string;
  }>;
  output?:
    | {
        type: string;
        image_url?: string;
        current_url?: string;
      }
    | string;
}

export interface ResponseItem {
  id: string;
  output: (MessageItem | FunctionCallItem | ComputerCallItem)[];
}

export function pp(obj: any): void {
  console.log(JSON.stringify(obj, null, 2));
}

export function sanitizeMessage(msg: any): any {
  if (msg?.type === "computer_call_output") {
    const output = msg.output || {};
    if (typeof output === "object") {
      return {
        ...msg,
        output: { ...output, image_url: "[omitted]" },
      };
    }
  }
  return msg;
}

export async function createResponse(params: any): Promise<ResponseItem> {
  const url = "https://api.openai.com/v1/responses";
  const headers: Record<string, string> = {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  };

  const openaiOrg = process.env.OPENAI_ORG;
  if (openaiOrg) {
    headers["Openai-Organization"] = openaiOrg;
  }

  const response = await fetch(url, {
    method: "POST",
    headers,
    body: JSON.stringify(params),
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`OpenAI API Error: ${response.status} ${errorText}`);
  }

  return (await response.json()) as ResponseItem;
}

export function checkBlocklistedUrl(url: string): void {
  try {
    const hostname = new URL(url).hostname || "";
    const isBlocked = BLOCKED_DOMAINS.some(
      (blocked) => hostname === blocked || hostname.endsWith(`.${blocked}`)
    );
    if (isBlocked) {
      throw new Error(`Blocked URL: ${url}`);
    }
  } catch (error) {
    if (error instanceof Error && error.message.startsWith("Blocked URL:")) {
      throw error;
    }
  }
}
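The blocklist check above matches a hostname either exactly or as a subdomain of a blocked entry, and it silently lets through strings that do not parse as URLs. The sketch below restates that matching rule as a pure boolean function so the edge cases are easy to see; `isBlockedHost` and the sample list are illustrative names, not part of the guide's code.

```typescript
// Illustrative sample list; the guide's BLOCKED_DOMAINS works the same way.
const SAMPLE_BLOCKLIST = ["evilvideos.com", "shadytok.com"];

// Hypothetical pure variant of the check: true when the URL's hostname is a
// blocked domain or one of its subdomains, false otherwise.
export function isBlockedHost(url: string, blocklist: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch {
    // Unparsable input is not blocked, mirroring the helper above.
    return false;
  }
  return blocklist.some(
    (blocked) => hostname === blocked || hostname.endsWith(`.${blocked}`)
  );
}
```

The leading dot in the `endsWith` test is what keeps an unrelated domain such as `notevilvideos.com` from matching `evilvideos.com`.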
Step 2: Create Steel Browser Integration
// steelBrowser.ts
import { chromium } from "playwright";
import type { Browser, Page } from "playwright";
import { Steel } from "steel-sdk";
// Shared constants and helpers from Step 1
import { CUA_KEY_TO_PLAYWRIGHT_KEY, checkBlocklistedUrl } from "./helpers";

export class SteelBrowser {
  private client: Steel;
  private session: any;
  private browser: Browser | null = null;
  private page: Page | null = null;
  private dimensions: [number, number];
  private proxy: boolean;
  private solveCaptcha: boolean;
  private virtualMouse: boolean;
  private sessionTimeout: number;
  private adBlocker: boolean;
  private startUrl: string;

  constructor(
    width: number = 1024,
    height: number = 768,
    proxy: boolean = false,
    solveCaptcha: boolean = false,
    virtualMouse: boolean = true,
    sessionTimeout: number = 900000, // 15 minutes
    adBlocker: boolean = true,
    startUrl: string = "https://www.google.com"
  ) {
    this.client = new Steel({
      steelAPIKey: process.env.STEEL_API_KEY!,
    });
    this.dimensions = [width, height];
    this.proxy = proxy;
    this.solveCaptcha = solveCaptcha;
    this.virtualMouse = virtualMouse;
    this.sessionTimeout = sessionTimeout;
    this.adBlocker = adBlocker;
    this.startUrl = startUrl;
  }

  getEnvironment(): string {
    return "browser";
  }

  getDimensions(): [number, number] {
    return this.dimensions;
  }

  getCurrentUrl(): string {
    return this.page?.url() || "";
  }

  async initialize(): Promise<void> {
    const [width, height] = this.dimensions;
    const sessionParams = {
      useProxy: this.proxy,
      solveCaptcha: this.solveCaptcha,
      apiTimeout: this.sessionTimeout,
      blockAds: this.adBlocker,
      dimensions: { width, height },
    };

    this.session = await this.client.sessions.create(sessionParams);
    console.log("Steel Session created successfully!");
    console.log(`View live session at: ${this.session.sessionViewerUrl}`);

    const cdpUrl = `${this.session.websocketUrl}&apiKey=${process.env.STEEL_API_KEY}`;

    this.browser = await chromium.connectOverCDP(cdpUrl, {
      timeout: 60000,
    });

    const context = this.browser.contexts()[0];

    // Abort any request whose URL is on the blocklist
    await context.route("**/*", async (route, request) => {
      const url = request.url();
      try {
        checkBlocklistedUrl(url);
        await route.continue();
      } catch (error) {
        console.log(`Blocking URL: ${url}`);
        await route.abort();
      }
    });

    // Draw a visible cursor in the top-level frame so mouse moves show up in screenshots
    if (this.virtualMouse) {
      await context.addInitScript(`
        if (window.self === window.top) {
          function initCursor() {
            const CURSOR_ID = '__cursor__';
            if (document.getElementById(CURSOR_ID)) return;

            const cursor = document.createElement('div');
            cursor.id = CURSOR_ID;
            Object.assign(cursor.style, {
              position: 'fixed',
              top: '0px',
              left: '0px',
              width: '20px',
              height: '20px',
              backgroundImage: 'url("data:image/svg+xml;utf8,<svg width=\\'16\\' height=\\'16\\' viewBox=\\'0 0 20 20\\' fill=\\'black\\' outline=\\'white\\' xmlns=\\'http://www.w3.org/2000/svg\\'><path d=\\'M15.8089 7.22221C15.9333 7.00888 15.9911 6.78221 15.9822 6.54221C15.9733 6.29333 15.8978 6.06667 15.7555 5.86221C15.6133 5.66667 15.4311 5.52445 15.2089 5.43555L1.70222 0.0888888C1.47111 0 1.23555 -0.0222222 0.995555 0.0222222C0.746667 0.0755555 0.537779 0.186667 0.368888 0.355555C0.191111 0.533333 0.0755555 0.746667 0.0222222 0.995555C-0.0222222 1.23555 0 1.47111 0.0888888 1.70222L5.43555 15.2222C5.52445 15.4445 5.66667 15.6267 5.86221 15.7689C6.06667 15.9111 6.28888 15.9867 6.52888 15.9955H6.58221C6.82221 15.9955 7.04445 15.9333 7.24888 15.8089C7.44445 15.6845 7.59555 15.52 7.70221 15.3155L10.2089 10.2222L15.3022 7.70221C15.5155 7.59555 15.6845 7.43555 15.8089 7.22221Z\\' ></path></svg>")',
              backgroundSize: 'cover',
              pointerEvents: 'none',
              zIndex: '99999',
              transform: 'translate(-2px, -2px)',
            });

            document.body.appendChild(cursor);

            document.addEventListener("mousemove", (e) => {
              cursor.style.top = e.clientY + "px";
              cursor.style.left = e.clientX + "px";
            });
          }

          function checkBody() {
            if (document.body) {
              initCursor();
            } else {
              requestAnimationFrame(checkBody);
            }
          }
          requestAnimationFrame(checkBody);
        }
      `);
    }

    this.page = context.pages()[0];

    // Explicitly set viewport size to ensure it matches our expected dimensions
    await this.page.setViewportSize({
      width: width,
      height: height,
    });

    await this.page.goto(this.startUrl);
  }

  async cleanup(): Promise<void> {
    if (this.page) {
      await this.page.close();
    }
    if (this.browser) {
      await this.browser.close();
    }
    if (this.session) {
      console.log("Releasing Steel session...");
      await this.client.sessions.release(this.session.id);
      console.log(
        `Session completed. View replay at ${this.session.sessionViewerUrl}`
      );
    }
  }

  async screenshot(): Promise<string> {
    if (!this.page) throw new Error("Page not initialized");

    try {
      // Use regular Playwright screenshot for consistent viewport sizing
      const buffer = await this.page.screenshot({
        fullPage: false,
        clip: {
          x: 0,
          y: 0,
          width: this.dimensions[0],
          height: this.dimensions[1],
        },
      });
      return buffer.toString("base64");
    } catch (error) {
      console.log(`Screenshot failed: ${error}`);
      // Fallback to CDP screenshot without fromSurface
      try {
        const cdpSession = await this.page.context().newCDPSession(this.page);
        const result = await cdpSession.send("Page.captureScreenshot", {
          format: "png",
          fromSurface: false,
        });
        return result.data;
      } catch (cdpError) {
        console.log(`CDP screenshot also failed: ${cdpError}`);
        throw error;
      }
    }
  }

  async click(x: number, y: number, button: string = "left"): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");

    if (button === "back") {
      await this.back();
    } else if (button === "forward") {
      await this.forward();
    } else if (button === "wheel") {
      await this.page.mouse.wheel(x, y);
    } else {
      const buttonType = { left: "left", right: "right" }[button] || "left";
      await this.page.mouse.click(x, y, {
        button: buttonType as any,
      });
    }
  }

  async doubleClick(x: number, y: number): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.mouse.dblclick(x, y);
  }

  async scroll(
    x: number,
    y: number,
    scroll_x: number,
    scroll_y: number
  ): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.mouse.move(x, y);
    await this.page.evaluate(
      ({ scrollX, scrollY }) => {
        window.scrollBy(scrollX, scrollY);
      },
      { scrollX: scroll_x, scrollY: scroll_y }
    );
  }

  async type(text: string): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.keyboard.type(text);
  }

  async wait(ms: number = 1000): Promise<void> {
    await new Promise((resolve) => setTimeout(resolve, ms));
  }

  async move(x: number, y: number): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.mouse.move(x, y);
  }

  async keypress(keys: string[]): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");

    const mappedKeys = keys.map(
      (key) => CUA_KEY_TO_PLAYWRIGHT_KEY[key.toLowerCase()] || key
    );

    // Hold every key down in order, then release in reverse order
    for (const key of mappedKeys) {
      await this.page.keyboard.down(key);
    }

    for (const key of mappedKeys.reverse()) {
      await this.page.keyboard.up(key);
    }
  }

  async drag(path: Array<{ x: number; y: number }>): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    if (path.length === 0) return;

    await this.page.mouse.move(path[0].x, path[0].y);
    await this.page.mouse.down();

    for (const point of path.slice(1)) {
      await this.page.mouse.move(point.x, point.y);
    }

    await this.page.mouse.up();
  }

  async goto(url: string): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    try {
      await this.page.goto(url);
    } catch (error) {
      console.log(`Error navigating to ${url}: ${error}`);
    }
  }

  async back(): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.goBack();
  }

  async forward(): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.goForward();
  }

  /** Get detailed viewport information for debugging. */
  async getViewportInfo(): Promise<any> {
    if (!this.page) {
      return {};
    }

    try {
      return await this.page.evaluate(() => ({
        innerWidth: window.innerWidth,
        innerHeight: window.innerHeight,
        devicePixelRatio: window.devicePixelRatio,
        screenWidth: window.screen.width,
        screenHeight: window.screen.height,
        scrollX: window.scrollX,
        scrollY: window.scrollY,
      }));
    } catch {
      return {};
    }
  }
}
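The `keypress` method above handles key combinations by translating the model's key names, pressing every key down in order, and releasing them in reverse. The following sketch computes that event sequence as plain data so the ordering is easy to inspect. `chordEvents` and the trimmed `SAMPLE_KEY_MAP` are hypothetical names used only for illustration.

```typescript
// A trimmed copy of the key map, for illustration only.
const SAMPLE_KEY_MAP: Record<string, string> = {
  ctrl: "Control",
  shift: "Shift",
  enter: "Enter",
};

type KeyEvent = { kind: "down" | "up"; key: string };

// Hypothetical helper: the down/up events a chord such as ["CTRL", "a"] produces.
export function chordEvents(keys: string[]): KeyEvent[] {
  // Unknown names fall through unchanged, as in keypress above.
  const mapped = keys.map((key) => SAMPLE_KEY_MAP[key.toLowerCase()] || key);
  const downs = mapped.map((key): KeyEvent => ({ kind: "down", key }));
  // Copy before reversing so the mapped list itself is left untouched.
  const ups = [...mapped].reverse().map((key): KeyEvent => ({ kind: "up", key }));
  return [...downs, ...ups];
}
```

For `["CTRL", "a"]` this yields Control down, a down, a up, Control up, which is the order a person would use to type the shortcut.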
Step 3: Create the Agent Class
// agent.ts
import { SteelBrowser } from "./steelBrowser";
// Shared constants, types, and helpers from Step 1
import {
  SYSTEM_PROMPT,
  checkBlocklistedUrl,
  createResponse,
  pp,
  sanitizeMessage,
} from "./helpers";
import type {
  ComputerCallItem,
  FunctionCallItem,
  MessageItem,
  OutputItem,
} from "./helpers";

export class Agent {
  private model: string;
  private computer: SteelBrowser;
  private tools: any[];
  private autoAcknowledgeSafety: boolean;
  private printSteps: boolean = true;
  private debug: boolean = false;
  private showImages: boolean = false;
  private viewportWidth: number;
  private viewportHeight: number;
  private systemPrompt: string;

  constructor(
    model: string = "computer-use-preview",
    computer: SteelBrowser,
    tools: any[] = [],
    autoAcknowledgeSafety: boolean = true
  ) {
    this.model = model;
    this.computer = computer;
    this.tools = tools;
    this.autoAcknowledgeSafety = autoAcknowledgeSafety;

    const [width, height] = computer.getDimensions();
    this.viewportWidth = width;
    this.viewportHeight = height;

    // Create dynamic system prompt with viewport dimensions
    this.systemPrompt = SYSTEM_PROMPT.replace(
      "<COORDINATE_SYSTEM>",
      `<COORDINATE_SYSTEM>
* The browser viewport dimensions are ${width}x${height} pixels
* The browser viewport has specific dimensions that you must respect`
    );

    this.tools.push({
      type: "computer-preview",
      display_width: width,
      display_height: height,
      environment: computer.getEnvironment(),
    });

    // Add goto function tool for direct URL navigation
    this.tools.push({
      type: "function",
      name: "goto",
      description: "Navigate directly to a specific URL.",
      parameters: {
        type: "object",
        properties: {
          url: {
            type: "string",
            description:
              "Fully qualified URL to navigate to (e.g., https://example.com).",
          },
        },
        additionalProperties: false,
        required: ["url"],
      },
    });

    // Add back function tool for browser navigation
    this.tools.push({
      type: "function",
      name: "back",
      description: "Go back to the previous page.",
      parameters: {},
    });
  }

  debugPrint(...args: any[]): void {
    if (this.debug) {
      pp(args);
    }
  }

  /** Get detailed viewport information for debugging. */
  private async getViewportInfo(): Promise<any> {
    return await this.computer.getViewportInfo();
  }

  /** Validate screenshot dimensions against viewport. */
  private async validateScreenshotDimensions(
    screenshotBase64: string
  ): Promise<any> {
    try {
      // Decode base64 and get image dimensions
      const buffer = Buffer.from(screenshotBase64, "base64");

      // Simple way to get dimensions from PNG buffer
      // PNG width is at bytes 16-19, height at bytes 20-23
      const width = buffer.readUInt32BE(16);
      const height = buffer.readUInt32BE(20);

      const viewportInfo = await this.getViewportInfo();

      const scalingInfo = {
        screenshot_size: [width, height],
        viewport_size: [this.viewportWidth, this.viewportHeight],
        actual_viewport: [
          viewportInfo.innerWidth || 0,
          viewportInfo.innerHeight || 0,
        ],
        device_pixel_ratio: viewportInfo.devicePixelRatio || 1.0,
        width_scale: this.viewportWidth > 0 ? width / this.viewportWidth : 1.0,
        height_scale:
          this.viewportHeight > 0 ? height / this.viewportHeight : 1.0,
      };

      // Warn about scaling mismatches
      if (scalingInfo.width_scale !== 1.0 || scalingInfo.height_scale !== 1.0) {
        console.log(`⚠️ Screenshot scaling detected:`);
        console.log(`   Screenshot: ${width}x${height}`);
        console.log(
          `   Expected viewport: ${this.viewportWidth}x${this.viewportHeight}`
        );
        console.log(
          `   Actual viewport: ${viewportInfo.innerWidth || "unknown"}x${
            viewportInfo.innerHeight || "unknown"
          }`
        );
        console.log(
          `   Scale factors: ${scalingInfo.width_scale.toFixed(
            3
          )}x${scalingInfo.height_scale.toFixed(3)}`
        );
      }

      return scalingInfo;
    } catch (error) {
      console.log(`⚠️ Error validating screenshot dimensions: ${error}`);
      return {};
    }
  }

  private validateCoordinates(actionArgs: any): any {
    const validatedArgs = { ...actionArgs };

    // Handle single coordinates (click, move, etc.)
    if ("x" in actionArgs && "y" in actionArgs) {
      validatedArgs.x = this.toNumber(actionArgs.x);
      validatedArgs.y = this.toNumber(actionArgs.y);
    }

    // Handle path arrays (drag)
    if ("path" in actionArgs && Array.isArray(actionArgs.path)) {
      validatedArgs.path = actionArgs.path.map((point: any) => ({
        x: this.toNumber(point.x),
        y: this.toNumber(point.y),
      }));
    }

    return validatedArgs;
  }

  private toNumber(value: any): number {
    if (typeof value === "string") {
      const num = parseFloat(value);
      return isNaN(num) ? 0 : num;
    }
    return typeof value === "number" ? value : 0;
  }

  async executeAction(actionType: string, actionArgs: any): Promise<void> {
    const validatedArgs = this.validateCoordinates(actionArgs);

    switch (actionType) {
      case "click":
        await this.computer.click(
          validatedArgs.x,
          validatedArgs.y,
          validatedArgs.button || "left"
        );
        break;
      case "doubleClick":
      case "double_click":
        await this.computer.doubleClick(validatedArgs.x, validatedArgs.y);
        break;
      case "move":
        await this.computer.move(validatedArgs.x, validatedArgs.y);
        break;
      case "scroll":
        await this.computer.scroll(
          validatedArgs.x,
          validatedArgs.y,
          this.toNumber(validatedArgs.scroll_x),
          this.toNumber(validatedArgs.scroll_y)
        );
        break;
      case "drag":
        const path = validatedArgs.path || [];
        await this.computer.drag(path);
        break;
      case "type":
        await this.computer.type(validatedArgs.text || "");
        break;
      case "keypress":
        await this.computer.keypress(validatedArgs.keys || []);
        break;
      case "wait":
        await this.computer.wait(this.toNumber(validatedArgs.ms) || 1000);
        break;
      case "goto":
        await this.computer.goto(validatedArgs.url || "");
        break;
      case "back":
        await this.computer.back();
        break;
      case "forward":
        await this.computer.forward();
        break;
      case "screenshot":
        // Nothing to do: a screenshot is taken after every action anyway
        break;
      default:
        const method = (this.computer as any)[actionType];
        if (typeof method === "function") {
          await method.call(this.computer, ...Object.values(validatedArgs));
        }
        break;
    }
  }

  async handleItem(
    item: MessageItem | FunctionCallItem | ComputerCallItem
  ): Promise<OutputItem[]> {
    if (item.type === "message") {
      if (this.printSteps) {
        console.log(item.content[0].text);
      }
    } else if (item.type === "function_call") {
      const { name, arguments: argsStr } = item;
      const args = JSON.parse(argsStr);

      if (this.printSteps) {
        console.log(`${name}(${JSON.stringify(args)})`);
      }

      if (typeof (this.computer as any)[name] === "function") {
        const method = (this.computer as any)[name];
        await method.call(this.computer, ...Object.values(args));
      }

      return [
        {
          type: "function_call_output",
          call_id: item.call_id,
          output: "success",
        },
      ];
    } else if (item.type === "computer_call") {
      const { action } = item;
      const actionType = action.type;
      const { type, ...actionArgs } = action;

      if (this.printSteps) {
        console.log(`${actionType}(${JSON.stringify(actionArgs)})`);
      }

      await this.executeAction(actionType, actionArgs);
      const screenshotBase64 = await this.computer.screenshot();

      // Validate screenshot dimensions for debugging
      await this.validateScreenshotDimensions(screenshotBase64);

      const pendingChecks = item.pending_safety_checks || [];
      for (const check of pendingChecks) {
        if (this.autoAcknowledgeSafety) {
          console.log(`⚠️ Auto-acknowledging safety check: ${check.message}`);
        } else {
          throw new Error(`Safety check failed: ${check.message}`);
        }
      }

      const callOutput: OutputItem = {
        type: "computer_call_output",
        call_id: item.call_id,
        acknowledged_safety_checks: pendingChecks,
        output: {
          type: "input_image",
          image_url: `data:image/png;base64,${screenshotBase64}`,
        },
      };

      if (this.computer.getEnvironment() === "browser") {
        const currentUrl = this.computer.getCurrentUrl();
        checkBlocklistedUrl(currentUrl);
        (callOutput.output as any).current_url = currentUrl;
      }

      return [callOutput];
    }

    return [];
  }

  async executeTask(
    task: string,
    printSteps: boolean = true,
    debug: boolean = false,
    maxIterations: number = 50
  ): Promise<string> {
    this.printSteps = printSteps;
    this.debug = debug;
    this.showImages = false;

    const inputItems = [
      {
        role: "system",
        content: this.systemPrompt,
      },
      {
        role: "user",
        content: task,
      },
    ];

    let newItems: any[] = [];
    let iterations = 0;
    let consecutiveNoActions = 0;
    let lastAssistantMessages: string[] = [];

    console.log(`🎯 Executing task: ${task}`);
    console.log("=".repeat(60));

    const isTaskComplete = (
      content: string
    ): { completed: boolean; reason?: string } => {
      const lowerContent = content.toLowerCase();

      if (content.includes("TASK_COMPLETED:")) {
        return { completed: true, reason: "explicit_completion" };
      }
      if (
        content.includes("TASK_FAILED:") ||
        content.includes("TASK_ABANDONED:")
      ) {
        return { completed: true, reason: "explicit_failure" };
      }

      const completionPatterns = [
        /task\s+(completed|finished|done|accomplished)/i,
        /successfully\s+(completed|finished|found|gathered)/i,
        /here\s+(is|are)\s+the\s+(results?|information|summary)/i,
        /to\s+summarize/i,
        /in\s+conclusion/i,
        /final\s+(answer|result|summary)/i,
      ];

      const failurePatterns = [
        /cannot\s+(complete|proceed|access|continue)/i,
        /unable\s+to\s+(complete|access|find|proceed)/i,
        /blocked\s+by\s+(captcha|security|authentication)/i,
        /giving\s+up/i,
        /no\s+longer\s+able/i,
        /have\s+tried\s+multiple\s+approaches/i,
      ];

      if (completionPatterns.some((pattern) => pattern.test(content))) {
        return { completed: true, reason: "natural_completion" };
      }

      if (failurePatterns.some((pattern) => pattern.test(content))) {
        return { completed: true, reason: "natural_failure" };
      }

      return { completed: false };
    };

    const detectRepetition = (newMessage: string): boolean => {
      if (lastAssistantMessages.length < 2) return false;

      const similarity = (str1: string, str2: string): number => {
        const words1 = str1.toLowerCase().split(/\s+/);
        const words2 = str2.toLowerCase().split(/\s+/);
        const commonWords = words1.filter((word) => words2.includes(word));
        return commonWords.length / Math.max(words1.length, words2.length);
      };

      return lastAssistantMessages.some(
        (prevMessage) => similarity(newMessage, prevMessage) > 0.8
      );
    };

    while (iterations < maxIterations) {
      iterations++;
      let hasActions = false;

      if (
        newItems.length > 0 &&
        newItems[newItems.length - 1]?.role === "assistant"
      ) {
        const lastMessage = newItems[newItems.length - 1];
        if (lastMessage.content?.[0]?.text) {
          const content = lastMessage.content[0].text;

          const completion = isTaskComplete(content);
          if (completion.completed) {
            console.log(`✅ Task completed (${completion.reason})`);
            break;
          }

          if (detectRepetition(content)) {
            console.log("🔄 Repetition detected - stopping execution");
            lastAssistantMessages.push(content);
            break;
          }

          lastAssistantMessages.push(content);
          if (lastAssistantMessages.length > 3) {
            lastAssistantMessages.shift(); // Keep only last 3
          }
        }
      }

      this.debugPrint([...inputItems, ...newItems].map(sanitizeMessage));

      try {
        const response = await createResponse({
          model: this.model,
          input: [...inputItems, ...newItems],
          tools: this.tools,
          truncation: "auto",
        });

        this.debugPrint(response);

        if (!response.output) {
          if (this.debug) {
            console.log(response);
          }
          throw new Error("No output from model");
        }

        newItems.push(...response.output);

        for (const item of response.output) {
          if (item.type === "computer_call" || item.type === "function_call") {
            hasActions = true;
          }
          const handleResult = await this.handleItem(item);
          newItems.push(...handleResult);
        }

        if (!hasActions) {
          consecutiveNoActions++;
          if (consecutiveNoActions >= 3) {
            console.log(
              "⚠️ No actions for 3 consecutive iterations - stopping"
            );
            break;
          }
        } else {
          consecutiveNoActions = 0;
        }
      } catch (error) {
        console.error(`❌ Error during task execution: ${error}`);
        throw error;
      }
    }

    if (iterations >= maxIterations) {
      console.warn(
        `⚠️ Task execution stopped after ${maxIterations} iterations`
      );
    }

    const assistantMessages = newItems.filter(
      (item) => item.role === "assistant"
    );
    const finalMessage = assistantMessages[assistantMessages.length - 1];

    return (
      finalMessage?.content?.[0]?.text ||
      "Task execution completed (no final message)"
    );
  }
}
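Two of the stop conditions in `executeTask` are easy to reason about in isolation: the explicit completion markers requested by the system prompt, and the word-overlap test used to detect a model that keeps repeating itself. The sketch below restates both as standalone functions. The names `hasCompletionMarker`, `wordOverlap`, and `isRepetition` are hypothetical and exist only for this illustration; the logic follows the closures shown above.

```typescript
// Explicit markers the system prompt asks the model to emit.
const COMPLETION_MARKERS = ["TASK_COMPLETED:", "TASK_FAILED:", "TASK_ABANDONED:"];

export function hasCompletionMarker(content: string): boolean {
  return COMPLETION_MARKERS.some((marker) => content.includes(marker));
}

// Share of words in `a` that also occur in `b`, relative to the longer message.
export function wordOverlap(a: string, b: string): number {
  const wordsA = a.toLowerCase().split(/\s+/);
  const wordsB = b.toLowerCase().split(/\s+/);
  const common = wordsA.filter((word) => wordsB.includes(word));
  return common.length / Math.max(wordsA.length, wordsB.length);
}

// Mirrors detectRepetition: needs at least two earlier messages, and flags
// the new message when it overlaps any of them by more than the threshold.
export function isRepetition(
  message: string,
  recent: string[],
  threshold = 0.8
): boolean {
  if (recent.length < 2) return false;
  return recent.some((previous) => wordOverlap(message, previous) > threshold);
}
```

Because the comparison is strictly greater than 0.8, two four-word messages that differ in one word (overlap 0.75) are not treated as repetition, while identical messages (overlap 1) are.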
Step 4: Create the Main Script
import { SteelBrowser } from "./steelBrowser";
import { Agent } from "./agent";
import { STEEL_API_KEY, OPENAI_API_KEY, TASK } from "./helpers";

async function main(): Promise<void> {
  console.log("🚀 Steel + OpenAI Computer Use Assistant");
  console.log("=".repeat(60));

  if (STEEL_API_KEY === "your-steel-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
    );
    console.warn(
      "   Get your API key at: https://app.steel.dev/settings/api-keys"
    );
    return;
  }

  if (OPENAI_API_KEY === "your-openai-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-openai-api-key-here' with your actual OpenAI API key"
    );
    console.warn("   Get your API key at: https://platform.openai.com/");
    return;
  }

  console.log("\nStarting Steel browser session...");

  const computer = new SteelBrowser();

  try {
    await computer.initialize();
    console.log("✅ Steel browser session started!");

    const agent = new Agent("computer-use-preview", computer, [], true);

    const startTime = Date.now();

    try {
      const result = await agent.executeTask(TASK, true, false, 50);

      const duration = ((Date.now() - startTime) / 1000).toFixed(1);

      console.log("\n" + "=".repeat(60));
      console.log("🎉 TASK EXECUTION COMPLETED");
      console.log("=".repeat(60));
      console.log(`⏱️ Duration: ${duration} seconds`);
      console.log(`🎯 Task: ${TASK}`);
      console.log(`📋 Result:\n${result}`);
      console.log("=".repeat(60));
    } catch (error) {
      console.error(`❌ Task execution failed: ${error}`);
      // Set the exit code instead of calling process.exit() so the
      // finally block below still runs and releases the Steel session.
      process.exitCode = 1;
    }
  } catch (error) {
    console.log(`❌ Failed to start Steel browser: ${error}`);
    console.log("Please check your STEEL_API_KEY and internet connection.");
    process.exitCode = 1;
  } finally {
    await computer.cleanup();
  }
}

main().catch(console.error);
Running Your Agent
Run your script with npm start to launch an AI browser session.
The agent executes the task defined in the TASK environment variable, or the default task if it is not set. You can change the task by setting the environment variable:
export TASK="Research the top 5 electric vehicles with the longest range"
npm start
Each action the agent takes is printed to the console, and you can watch the live browser session by opening the session viewer URL that is logged at startup.
Next Steps
- Explore the Steel API documentation for more advanced features
- Check out the OpenAI documentation for more information about the computer-use-preview model
- Add additional features like session recording or multi-session management