Quickstart (TypeScript)

This guide walks you through using OpenAI's computer-use-preview model with Steel's managed remote browsers to create AI agents that can navigate the web.

We'll implement a simple CUA (computer-using agent) loop that works as described below:

Computer use - OpenAI API
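
In outline, the loop sends the task to the model, executes whatever action the model requests, returns a screenshot, and repeats until the model replies with a plain message. The sketch below simulates that cycle with in-memory stand-ins. The names `runCuaLoop`, `fakeModel`, and the simplified item shapes are assumptions for illustration only; the real code in the following steps calls the OpenAI Responses API and a Steel browser instead.

```typescript
// Simplified, hypothetical item shapes (the real API items carry more fields).
type ModelOutput =
  | { type: "computer_call"; call_id: string; action: { type: string } }
  | { type: "message"; text: string };

type LoopInput =
  | { role: "user"; content: string }
  | { type: "computer_call_output"; call_id: string; screenshot: string }
  | ModelOutput;

type Model = (history: LoopInput[]) => ModelOutput[];
type Executor = (action: { type: string }) => string; // returns a screenshot

function runCuaLoop(
  task: string,
  model: Model,
  execute: Executor,
  maxIterations = 10
): string {
  const history: LoopInput[] = [{ role: "user", content: task }];

  for (let i = 0; i < maxIterations; i++) {
    // 1. Ask the model what to do next, given everything so far.
    const outputs = model(history);
    history.push(...outputs);

    // 2. Execute each requested action and send back a screenshot.
    let acted = false;
    for (const item of outputs) {
      if (item.type === "computer_call") {
        acted = true;
        const screenshot = execute(item.action);
        history.push({
          type: "computer_call_output",
          call_id: item.call_id,
          screenshot,
        });
      }
    }

    // 3. No action requested: treat the last message as the final answer.
    if (!acted) {
      const last = outputs[outputs.length - 1];
      return last && last.type === "message" ? last.text : "";
    }
  }
  return "stopped: iteration limit reached";
}

// Stand-in model: clicks twice, then declares completion.
const fakeModel: Model = (history) => {
  const steps = history.filter(
    (h) => "type" in h && h.type === "computer_call_output"
  ).length;
  return steps < 2
    ? [{ type: "computer_call", call_id: `call_${steps}`, action: { type: "click" } }]
    : [{ type: "message", text: "TASK_COMPLETED: clicked twice" }];
};

const executedActions: string[] = [];
const loopResult = runCuaLoop("demo task", fakeModel, (action) => {
  executedActions.push(action.type);
  return "fake-screenshot";
});
console.log(loopResult, executedActions);
```

The iteration cap plays the same role as the `maxIterations` guard in the agent built later in this guide: it keeps a confused model from looping forever.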

Prerequisites

Node.js 20+
A Steel API key (sign up here)
An OpenAI API key with access to the computer-use-preview model
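
Before moving on, it can help to confirm that both keys are actually set. The following is a minimal sketch, assuming the keys are stored under the environment variable names `STEEL_API_KEY` and `OPENAI_API_KEY` that the helper code in Step 1 reads. `missingKeys` is a hypothetical helper, not part of either SDK.

```typescript
// Hypothetical helper: list required keys that are absent or blank.
const REQUIRED_KEYS = ["STEEL_API_KEY", "OPENAI_API_KEY"] as const;

function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => (env[key] ?? "").trim() === "");
}

// Example with a literal object; a blank value counts as missing.
const exampleMissing = missingKeys({ STEEL_API_KEY: "abc", OPENAI_API_KEY: " " });
console.log(exampleMissing);
```

In a Node.js script you would pass `process.env` and exit early if the returned list is not empty.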

Step 1: Setup and Helper Functions

TypeScript

helpers.ts

import { chromium } from "playwright";
import type { Browser, Page } from "playwright";
import { Steel } from "steel-sdk";
import * as dotenv from "dotenv";

dotenv.config();

// Replace with your own API keys
export const STEEL_API_KEY =
  process.env.STEEL_API_KEY || "your-steel-api-key-here";
export const OPENAI_API_KEY =
  process.env.OPENAI_API_KEY || "your-openai-api-key-here";

// Replace with your own task
export const TASK =
  process.env.TASK || "Go to Wikipedia and search for machine learning";

export const SYSTEM_PROMPT = `You are an expert browser automation assistant operating in an iterative execution loop. Your goal is to efficiently complete tasks using a Chrome browser with full internet access.

<CAPABILITIES>
* You control a Chrome browser tab and can navigate to any website
* You can click, type, scroll, take screenshots, and interact with web elements
* You have full internet access and can visit any public website
* You can read content, fill forms, search for information, and perform complex multi-step tasks
* After each action, you receive a screenshot showing the current state
* Use the goto(url) function to navigate directly to URLs - DO NOT try to click address bars or browser UI
* Use the back() function to go back to the previous page

<COORDINATE_SYSTEM>
* The browser viewport has specific dimensions that you must respect
* All coordinates (x, y) must be within the viewport bounds
* X coordinates must be between 0 and the display width (inclusive)
* Y coordinates must be between 0 and the display height (inclusive)
* Always ensure your click, move, scroll, and drag coordinates are within these bounds
* If you're unsure about element locations, take a screenshot first to see the current state

<AUTONOMOUS_EXECUTION>
* Work completely independently - make decisions and act immediately without asking questions
* Never request clarification, present options, or ask for permission
* Make intelligent assumptions based on task context
* If something is ambiguous, choose the most logical interpretation and proceed
* Take immediate action rather than explaining what you might do
* When the task objective is achieved, immediately declare "TASK_COMPLETED:" - do not provide commentary or ask questions

<REASONING_STRUCTURE>
For each step, you must reason systematically:
* Analyze your previous action's success/failure and current state
* Identify what specific progress has been made toward the goal
* Determine the next immediate objective and how to achieve it
* Choose the most efficient action sequence to make progress

<EFFICIENCY_PRINCIPLES>
* Combine related actions when possible rather than single-step execution
* Navigate directly to relevant websites without unnecessary exploration
* Use screenshots strategically to understand page state before acting
* Be persistent with alternative approaches if initial attempts fail
* Focus on the specific information or outcome requested

<COMPLETION_CRITERIA>
* MANDATORY: When you complete the task, your final message MUST start with "TASK_COMPLETED: [brief summary]"
* MANDATORY: If technical issues prevent completion, your final message MUST start with "TASK_FAILED: [reason]"
* MANDATORY: If you abandon the task, your final message MUST start with "TASK_ABANDONED: [explanation]"
* Do not write anything after completing the task except the required completion message
* Do not ask questions, provide commentary, or offer additional help after task completion
* The completion message is the end of the interaction - nothing else should follow

<CRITICAL_REQUIREMENTS>
* This is fully automated execution - work completely independently
* Start by taking a screenshot to understand the current state
* Use goto(url) function for navigation - never click on browser UI elements
* Always respect coordinate boundaries - invalid coordinates will fail
* Recognize when the stated objective has been achieved and declare completion immediately
* Focus on the explicit task given, not implied or potential follow-up tasks

Remember: Be thorough but focused. Complete the specific task requested efficiently and provide clear results.`;

// Requests to these domains (and their subdomains) are aborted
export const BLOCKED_DOMAINS = [
  "maliciousbook.com",
  "evilvideos.com",
  "darkwebforum.com",
  "shadytok.com",
  "suspiciouspins.com",
  "ilanbigio.com",
];

// Maps lowercase key names emitted by the model to Playwright key names
export const CUA_KEY_TO_PLAYWRIGHT_KEY: Record<string, string> = {
  "/": "Divide",
  "\\": "Backslash",
  alt: "Alt",
  arrowdown: "ArrowDown",
  arrowleft: "ArrowLeft",
  arrowright: "ArrowRight",
  arrowup: "ArrowUp",
  backspace: "Backspace",
  capslock: "CapsLock",
  cmd: "Meta",
  ctrl: "Control",
  delete: "Delete",
  end: "End",
  enter: "Enter",
  esc: "Escape",
  home: "Home",
  insert: "Insert",
  option: "Alt",
  pagedown: "PageDown",
  pageup: "PageUp",
  shift: "Shift",
  space: " ",
  super: "Meta",
  tab: "Tab",
  win: "Meta",
};

export interface MessageItem {
  type: "message";
  content: Array<{ text: string }>;
}

export interface FunctionCallItem {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string;
}

export interface ComputerCallItem {
  type: "computer_call";
  call_id: string;
  action: {
    type: string;
    [key: string]: any;
  };
  pending_safety_checks?: Array<{
    id: string;
    message: string;
  }>;
}

export interface OutputItem {
  type: "computer_call_output" | "function_call_output";
  call_id: string;
  acknowledged_safety_checks?: Array<{
    id: string;
    message: string;
  }>;
  output?:
    | {
        type: string;
        image_url?: string;
        current_url?: string;
      }
    | string;
}

export interface ResponseItem {
  id: string;
  output: (MessageItem | FunctionCallItem | ComputerCallItem)[];
}

// Pretty-print any value as indented JSON
export function pp(obj: any): void {
  console.log(JSON.stringify(obj, null, 2));
}

// Replace base64 screenshots with a placeholder so debug logs stay readable
export function sanitizeMessage(msg: any): any {
  if (msg?.type === "computer_call_output") {
    const output = msg.output || {};
    if (typeof output === "object") {
      return {
        ...msg,
        output: { ...output, image_url: "[omitted]" },
      };
    }
  }
  return msg;
}

// POST a request to the OpenAI Responses API and return the parsed JSON body
export async function createResponse(params: any): Promise<ResponseItem> {
  const url = "https://api.openai.com/v1/responses";
  const headers: Record<string, string> = {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  };

  const openaiOrg = process.env.OPENAI_ORG;
  if (openaiOrg) {
    headers["Openai-Organization"] = openaiOrg;
  }

  const response = await fetch(url, {
    method: "POST",
    headers,
    body: JSON.stringify(params),
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`OpenAI API Error: ${response.status} ${errorText}`);
  }

  return (await response.json()) as ResponseItem;
}

// Throw if the URL's hostname is a blocked domain or one of its subdomains.
// URLs that cannot be parsed are ignored rather than treated as blocked.
export function checkBlocklistedUrl(url: string): void {
  try {
    const hostname = new URL(url).hostname || "";
    const isBlocked = BLOCKED_DOMAINS.some(
      (blocked) => hostname === blocked || hostname.endsWith(`.${blocked}`)
    );
    if (isBlocked) {
      throw new Error(`Blocked URL: ${url}`);
    }
  } catch (error) {
    if (error instanceof Error && error.message.startsWith("Blocked URL:")) {
      throw error;
    }
  }
}
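
The matching rule in checkBlocklistedUrl blocks a domain and any of its subdomains, but not other domains that merely end with the same characters. The sketch below applies the same comparison to bare hostnames so the three cases are easy to see. `isBlockedHostname` is a hypothetical helper written for this illustration; lowercasing is added here as an assumption, since hostnames produced by the URL parser are already lowercase.

```typescript
// Same comparison as checkBlocklistedUrl, applied to a bare hostname.
function isBlockedHostname(hostname: string, blocked: string[]): boolean {
  const host = hostname.toLowerCase();
  return blocked.some(
    (domain) => host === domain || host.endsWith(`.${domain}`)
  );
}

const blocklist = ["evilvideos.com"];
const blockResults = {
  exact: isBlockedHostname("evilvideos.com", blocklist), // true
  subdomain: isBlockedHostname("cdn.evilvideos.com", blocklist), // true
  lookalike: isBlockedHostname("notevilvideos.com", blocklist), // false
};
console.log(blockResults);
```

Requiring the leading dot in the suffix check is what keeps `notevilvideos.com` from matching.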

Step 2: Create Steel Browser Integration

TypeScript

steelBrowser.ts

import { chromium } from "playwright";
import type { Browser, Page } from "playwright";
import { Steel } from "steel-sdk";
import { CUA_KEY_TO_PLAYWRIGHT_KEY, checkBlocklistedUrl } from "./helpers";

export class SteelBrowser {
  private client: Steel;
  private session: any;
  private browser: Browser | null = null;
  private page: Page | null = null;
  private dimensions: [number, number];
  private proxy: boolean;
  private solveCaptcha: boolean;
  private virtualMouse: boolean;
  private sessionTimeout: number;
  private adBlocker: boolean;
  private startUrl: string;

  constructor(
    width: number = 1024,
    height: number = 768,
    proxy: boolean = false,
    solveCaptcha: boolean = false,
    virtualMouse: boolean = true,
    sessionTimeout: number = 900000, // 15 minutes
    adBlocker: boolean = true,
    startUrl: string = "https://www.google.com"
  ) {
    this.client = new Steel({
      steelAPIKey: process.env.STEEL_API_KEY!,
    });
    this.dimensions = [width, height];
    this.proxy = proxy;
    this.solveCaptcha = solveCaptcha;
    this.virtualMouse = virtualMouse;
    this.sessionTimeout = sessionTimeout;
    this.adBlocker = adBlocker;
    this.startUrl = startUrl;
  }

  getEnvironment(): string {
    return "browser";
  }

  getDimensions(): [number, number] {
    return this.dimensions;
  }

  getCurrentUrl(): string {
    return this.page?.url() || "";
  }

  async initialize(): Promise<void> {
    const [width, height] = this.dimensions;
    const sessionParams = {
      useProxy: this.proxy,
      solveCaptcha: this.solveCaptcha,
      apiTimeout: this.sessionTimeout,
      blockAds: this.adBlocker,
      dimensions: { width, height },
    };

    this.session = await this.client.sessions.create(sessionParams);
    console.log("Steel Session created successfully!");
    console.log(`View live session at: ${this.session.sessionViewerUrl}`);

    const cdpUrl = `${this.session.websocketUrl}&apiKey=${process.env.STEEL_API_KEY}`;

    this.browser = await chromium.connectOverCDP(cdpUrl, {
      timeout: 60000,
    });

    const context = this.browser.contexts()[0];

    // Abort any request whose URL is on the blocklist
    await context.route("**/*", async (route, request) => {
      const url = request.url();
      try {
        checkBlocklistedUrl(url);
        await route.continue();
      } catch (error) {
        console.log(`Blocking URL: ${url}`);
        await route.abort();
      }
    });

    // Draw a cursor overlay in the top-level frame so mouse moves are visible
    if (this.virtualMouse) {
      await context.addInitScript(`
        if (window.self === window.top) {
          function initCursor() {
            const CURSOR_ID = '__cursor__';
            if (document.getElementById(CURSOR_ID)) return;

            const cursor = document.createElement('div');
            cursor.id = CURSOR_ID;
            Object.assign(cursor.style, {
              position: 'fixed',
              top: '0px',
              left: '0px',
              width: '20px',
              height: '20px',
              backgroundImage: 'url("data:image/svg+xml;utf8,<svg width=\\'16\\' height=\\'16\\' viewBox=\\'0 0 20 20\\' fill=\\'black\\' outline=\\'white\\' xmlns=\\'http://www.w3.org/2000/svg\\'><path d=\\'M15.8089 7.22221C15.9333 7.00888 15.9911 6.78221 15.9822 6.54221C15.9733 6.29333 15.8978 6.06667 15.7555 5.86221C15.6133 5.66667 15.4311 5.52445 15.2089 5.43555L1.70222 0.0888888C1.47111 0 1.23555 -0.0222222 0.995555 0.0222222C0.746667 0.0755555 0.537779 0.186667 0.368888 0.355555C0.191111 0.533333 0.0755555 0.746667 0.0222222 0.995555C-0.0222222 1.23555 0 1.47111 0.0888888 1.70222L5.43555 15.2222C5.52445 15.4445 5.66667 15.6267 5.86221 15.7689C6.06667 15.9111 6.28888 15.9867 6.52888 15.9955H6.58221C6.82221 15.9955 7.04445 15.9333 7.24888 15.8089C7.44445 15.6845 7.59555 15.52 7.70221 15.3155L10.2089 10.2222L15.3022 7.70221C15.5155 7.59555 15.6845 7.43555 15.8089 7.22221Z\\' ></path></svg>")',
              backgroundSize: 'cover',
              pointerEvents: 'none',
              zIndex: '99999',
              transform: 'translate(-2px, -2px)',
            });

            document.body.appendChild(cursor);

            document.addEventListener("mousemove", (e) => {
              cursor.style.top = e.clientY + "px";
              cursor.style.left = e.clientX + "px";
            });
          }

          function checkBody() {
            if (document.body) {
              initCursor();
            } else {
              requestAnimationFrame(checkBody);
            }
          }
          requestAnimationFrame(checkBody);
        }
      `);
    }

    this.page = context.pages()[0];

    // Explicitly set viewport size to ensure it matches our expected dimensions
    await this.page.setViewportSize({
      width: width,
      height: height,
    });

    await this.page.goto(this.startUrl);
  }

  async cleanup(): Promise<void> {
    if (this.page) {
      await this.page.close();
    }
    if (this.browser) {
      await this.browser.close();
    }
    if (this.session) {
      console.log("Releasing Steel session...");
      await this.client.sessions.release(this.session.id);
      console.log(
        `Session completed. View replay at ${this.session.sessionViewerUrl}`
      );
    }
  }

  async screenshot(): Promise<string> {
    if (!this.page) throw new Error("Page not initialized");

    try {
      // Use regular Playwright screenshot for consistent viewport sizing
      const buffer = await this.page.screenshot({
        fullPage: false,
        clip: {
          x: 0,
          y: 0,
          width: this.dimensions[0],
          height: this.dimensions[1],
        },
      });
      return buffer.toString("base64");
    } catch (error) {
      console.log(`Screenshot failed: ${error}`);
      // Fall back to a CDP screenshot with fromSurface disabled
      try {
        const cdpSession = await this.page.context().newCDPSession(this.page);
        const result = await cdpSession.send("Page.captureScreenshot", {
          format: "png",
          fromSurface: false,
        });
        return result.data;
      } catch (cdpError) {
        console.log(`CDP screenshot also failed: ${cdpError}`);
        // Surface the original Playwright error, not the fallback's
        throw error;
      }
    }
  }

  async click(x: number, y: number, button: string = "left"): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");

    if (button === "back") {
      await this.back();
    } else if (button === "forward") {
      await this.forward();
    } else if (button === "wheel") {
      await this.page.mouse.wheel(x, y);
    } else {
      // Anything other than "right" is treated as a left click
      const buttonType: "left" | "right" = button === "right" ? "right" : "left";
      await this.page.mouse.click(x, y, {
        button: buttonType,
      });
    }
  }

  async doubleClick(x: number, y: number): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.mouse.dblclick(x, y);
  }

  async scroll(
    x: number,
    y: number,
    scroll_x: number,
    scroll_y: number
  ): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.mouse.move(x, y);
    await this.page.evaluate(
      ({ scrollX, scrollY }) => {
        window.scrollBy(scrollX, scrollY);
      },
      { scrollX: scroll_x, scrollY: scroll_y }
    );
  }

  async type(text: string): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.keyboard.type(text);
  }

  async wait(ms: number = 1000): Promise<void> {
    await new Promise((resolve) => setTimeout(resolve, ms));
  }

  async move(x: number, y: number): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.mouse.move(x, y);
  }

  async keypress(keys: string[]): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");

    const mappedKeys = keys.map(
      (key) => CUA_KEY_TO_PLAYWRIGHT_KEY[key.toLowerCase()] || key
    );

    // Press all keys in order, then release them in reverse order
    for (const key of mappedKeys) {
      await this.page.keyboard.down(key);
    }

    for (const key of mappedKeys.reverse()) {
      await this.page.keyboard.up(key);
    }
  }

  async drag(path: Array<{ x: number; y: number }>): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    if (path.length === 0) return;

    await this.page.mouse.move(path[0].x, path[0].y);
    await this.page.mouse.down();

    for (const point of path.slice(1)) {
      await this.page.mouse.move(point.x, point.y);
    }

    await this.page.mouse.up();
  }

  async goto(url: string): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    try {
      await this.page.goto(url);
    } catch (error) {
      console.log(`Error navigating to ${url}: ${error}`);
    }
  }

  async back(): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.goBack();
  }

  async forward(): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.goForward();
  }

  /** Get detailed viewport information for debugging. */
  async getViewportInfo(): Promise<any> {
    if (!this.page) {
      return {};
    }

    try {
      return await this.page.evaluate(() => ({
        innerWidth: window.innerWidth,
        innerHeight: window.innerHeight,
        devicePixelRatio: window.devicePixelRatio,
        screenWidth: window.screen.width,
        screenHeight: window.screen.height,
        scrollX: window.scrollX,
        scrollY: window.scrollY,
      }));
    } catch {
      return {};
    }
  }
}
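
The keypress method above handles key chords by pressing every key down in order and then releasing them in reverse, after translating the model's key names to Playwright's. The sketch below reproduces just that ordering as a pure function so it can be inspected without a browser. `chordEvents` and the trimmed `KEY_MAP` are hypothetical names used only for this illustration.

```typescript
// A trimmed copy of the key map, for illustration only.
const KEY_MAP: Record<string, string> = {
  ctrl: "Control",
  cmd: "Meta",
  shift: "Shift",
  enter: "Enter",
};

// Return the sequence of key events a chord would produce.
function chordEvents(keys: string[]): string[] {
  const mapped = keys.map((key) => KEY_MAP[key.toLowerCase()] ?? key);
  const downs = mapped.map((key) => `down:${key}`);
  // Copy before reversing so the "down" order is not disturbed.
  const ups = [...mapped].reverse().map((key) => `up:${key}`);
  return [...downs, ...ups];
}

const chord = chordEvents(["CTRL", "SHIFT", "t"]);
console.log(chord);
// ["down:Control", "down:Shift", "down:t", "up:t", "up:Shift", "up:Control"]
```

Releasing in reverse order mirrors how a person lets go of a shortcut: the modifier keys stay held until the final key has been released.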

Step 3: Create the Agent Class

TypeScript

agent.ts

import {
  SYSTEM_PROMPT,
  pp,
  sanitizeMessage,
  createResponse,
  checkBlocklistedUrl,
} from "./helpers";
import type {
  MessageItem,
  FunctionCallItem,
  ComputerCallItem,
  OutputItem,
} from "./helpers";
import { SteelBrowser } from "./steelBrowser";

export class Agent {
  private model: string;
  private computer: SteelBrowser;
  private tools: any[];
  private autoAcknowledgeSafety: boolean;
  private printSteps: boolean = true;
  private debug: boolean = false;
  private showImages: boolean = false;
  private viewportWidth: number;
  private viewportHeight: number;
  private systemPrompt: string;

  constructor(
    model: string = "computer-use-preview",
    computer: SteelBrowser,
    tools: any[] = [],
    autoAcknowledgeSafety: boolean = true
  ) {
    this.model = model;
    this.computer = computer;
    this.tools = tools;
    this.autoAcknowledgeSafety = autoAcknowledgeSafety;

    const [width, height] = computer.getDimensions();
    this.viewportWidth = width;
    this.viewportHeight = height;

    // Create dynamic system prompt with viewport dimensions.
    // Only the tag is replaced, so the prompt's existing bullets follow it.
    this.systemPrompt = SYSTEM_PROMPT.replace(
      "<COORDINATE_SYSTEM>",
      `<COORDINATE_SYSTEM>
* The browser viewport dimensions are ${width}x${height} pixels`
    );

    this.tools.push({
      type: "computer-preview",
      display_width: width,
      display_height: height,
      environment: computer.getEnvironment(),
    });

    // Add goto function tool for direct URL navigation
    this.tools.push({
      type: "function",
      name: "goto",
      description: "Navigate directly to a specific URL.",
      parameters: {
        type: "object",
        properties: {
          url: {
            type: "string",
            description:
              "Fully qualified URL to navigate to (e.g., https://example.com).",
          },
        },
        additionalProperties: false,
        required: ["url"],
      },
    });

    // Add back function tool for browser navigation
    this.tools.push({
      type: "function",
      name: "back",
      description: "Go back to the previous page.",
      parameters: {},
    });
  }

  debugPrint(...args: any[]): void {
    if (this.debug) {
      pp(args);
    }
  }

  /** Get detailed viewport information for debugging. */
  private async getViewportInfo(): Promise<any> {
    return await this.computer.getViewportInfo();
  }

  /** Validate screenshot dimensions against viewport. */
  private async validateScreenshotDimensions(
    screenshotBase64: string
  ): Promise<any> {
    try {
      // Decode base64 and get image dimensions
      const buffer = Buffer.from(screenshotBase64, "base64");

      // Simple way to get dimensions from PNG buffer
      // PNG width is at bytes 16-19, height at bytes 20-23
      const width = buffer.readUInt32BE(16);
      const height = buffer.readUInt32BE(20);

      const viewportInfo = await this.getViewportInfo();

      const scalingInfo = {
        screenshot_size: [width, height],
        viewport_size: [this.viewportWidth, this.viewportHeight],
        actual_viewport: [
          viewportInfo.innerWidth || 0,
          viewportInfo.innerHeight || 0,
        ],
        device_pixel_ratio: viewportInfo.devicePixelRatio || 1.0,
        width_scale: this.viewportWidth > 0 ? width / this.viewportWidth : 1.0,
        height_scale:
          this.viewportHeight > 0 ? height / this.viewportHeight : 1.0,
      };

      // Warn about scaling mismatches
      if (scalingInfo.width_scale !== 1.0 || scalingInfo.height_scale !== 1.0) {
        console.log(`⚠️  Screenshot scaling detected:`);
        console.log(`   Screenshot: ${width}x${height}`);
        console.log(
          `   Expected viewport: ${this.viewportWidth}x${this.viewportHeight}`
        );
        console.log(
          `   Actual viewport: ${viewportInfo.innerWidth || "unknown"}x${
            viewportInfo.innerHeight || "unknown"
          }`
        );
        console.log(
          `   Scale factors: ${scalingInfo.width_scale.toFixed(
            3
          )}x${scalingInfo.height_scale.toFixed(3)}`
        );
      }

      return scalingInfo;
    } catch (error) {
      console.log(`⚠️  Error validating screenshot dimensions: ${error}`);
      return {};
    }
  }

  // Coerce coordinate arguments to numbers (the model may send strings)
  private validateCoordinates(actionArgs: any): any {
    const validatedArgs = { ...actionArgs };

    // Handle single coordinates (click, move, etc.)
    if ("x" in actionArgs && "y" in actionArgs) {
      validatedArgs.x = this.toNumber(actionArgs.x);
      validatedArgs.y = this.toNumber(actionArgs.y);
    }

    // Handle path arrays (drag)
    if ("path" in actionArgs && Array.isArray(actionArgs.path)) {
      validatedArgs.path = actionArgs.path.map((point: any) => ({
        x: this.toNumber(point.x),
        y: this.toNumber(point.y),
      }));
    }

    return validatedArgs;
  }

  private toNumber(value: any): number {
    if (typeof value === "string") {
      const num = parseFloat(value);
      return isNaN(num) ? 0 : num;
    }
    return typeof value === "number" ? value : 0;
  }

  async executeAction(actionType: string, actionArgs: any): Promise<void> {
    const validatedArgs = this.validateCoordinates(actionArgs);

    switch (actionType) {
      case "click":
        await this.computer.click(
          validatedArgs.x,
          validatedArgs.y,
          validatedArgs.button || "left"
        );
        break;
      case "doubleClick":
      case "double_click":
        await this.computer.doubleClick(validatedArgs.x, validatedArgs.y);
        break;
      case "move":
        await this.computer.move(validatedArgs.x, validatedArgs.y);
        break;
      case "scroll":
        await this.computer.scroll(
          validatedArgs.x,
          validatedArgs.y,
          this.toNumber(validatedArgs.scroll_x),
          this.toNumber(validatedArgs.scroll_y)
        );
        break;
      case "drag": {
        const path = validatedArgs.path || [];
        await this.computer.drag(path);
        break;
      }
      case "type":
        await this.computer.type(validatedArgs.text || "");
        break;
      case "keypress":
        await this.computer.keypress(validatedArgs.keys || []);
        break;
      case "wait":
        await this.computer.wait(this.toNumber(validatedArgs.ms) || 1000);
        break;
      case "goto":
        await this.computer.goto(validatedArgs.url || "");
        break;
      case "back":
        await this.computer.back();
        break;
      case "forward":
        await this.computer.forward();
        break;
      case "screenshot":
        // Nothing to do: a screenshot is taken after every action anyway
        break;
      default: {
        // Fall back to any method on the browser wrapper with a matching name
        const method = (this.computer as any)[actionType];
        if (typeof method === "function") {
          await method.call(this.computer, ...Object.values(validatedArgs));
        }
        break;
      }
    }
  }

  async handleItem(
    item: MessageItem | FunctionCallItem | ComputerCallItem
  ): Promise<OutputItem[]> {
    if (item.type === "message") {
      if (this.printSteps) {
        console.log(item.content[0].text);
      }
    } else if (item.type === "function_call") {
      const { name, arguments: argsStr } = item;
      const args = JSON.parse(argsStr);

      if (this.printSteps) {
        console.log(`${name}(${JSON.stringify(args)})`);
      }

      if (typeof (this.computer as any)[name] === "function") {
        const method = (this.computer as any)[name];
        await method.call(this.computer, ...Object.values(args));
      }

      return [
        {
          type: "function_call_output",
          call_id: item.call_id,
          output: "success",
        },
      ];
    } else if (item.type === "computer_call") {
      const { action } = item;
      const actionType = action.type;
      const { type, ...actionArgs } = action;

      if (this.printSteps) {
        console.log(`${actionType}(${JSON.stringify(actionArgs)})`);
      }

      await this.executeAction(actionType, actionArgs);
      const screenshotBase64 = await this.computer.screenshot();

      // Validate screenshot dimensions for debugging
      await this.validateScreenshotDimensions(screenshotBase64);

      const pendingChecks = item.pending_safety_checks || [];
      for (const check of pendingChecks) {
        if (this.autoAcknowledgeSafety) {
          console.log(`⚠️  Auto-acknowledging safety check: ${check.message}`);
        } else {
          throw new Error(`Safety check failed: ${check.message}`);
        }
      }

      const callOutput: OutputItem = {
        type: "computer_call_output",
        call_id: item.call_id,
        acknowledged_safety_checks: pendingChecks,
        output: {
          type: "input_image",
          image_url: `data:image/png;base64,${screenshotBase64}`,
        },
      };

      if (this.computer.getEnvironment() === "browser") {
        const currentUrl = this.computer.getCurrentUrl();
        checkBlocklistedUrl(currentUrl);
        (callOutput.output as any).current_url = currentUrl;
      }

      return [callOutput];
    }

    return [];
  }

  async executeTask(
    task: string,
    printSteps: boolean = true,
    debug: boolean = false,
    maxIterations: number = 50
  ): Promise<string> {
    this.printSteps = printSteps;
    this.debug = debug;
    this.showImages = false;

    const inputItems = [
      {
        role: "system",
        content: this.systemPrompt,
      },
      {
        role: "user",
        content: task,
      },
    ];

    let newItems: any[] = [];
    let iterations = 0;
    let consecutiveNoActions = 0;
    let lastAssistantMessages: string[] = [];

    console.log(`🎯 Executing task: ${task}`);
    console.log("=".repeat(60));

    // Explicit markers are checked first, then looser natural-language patterns
    const isTaskComplete = (
      content: string
    ): { completed: boolean; reason?: string } => {
      if (content.includes("TASK_COMPLETED:")) {
        return { completed: true, reason: "explicit_completion" };
      }
      if (
        content.includes("TASK_FAILED:") ||
        content.includes("TASK_ABANDONED:")
      ) {
        return { completed: true, reason: "explicit_failure" };
      }

      const completionPatterns = [
        /task\s+(completed|finished|done|accomplished)/i,
        /successfully\s+(completed|finished|found|gathered)/i,
        /here\s+(is|are)\s+the\s+(results?|information|summary)/i,
        /to\s+summarize/i,
        /in\s+conclusion/i,
        /final\s+(answer|result|summary)/i,
      ];

      const failurePatterns = [
        /cannot\s+(complete|proceed|access|continue)/i,
        /unable\s+to\s+(complete|access|find|proceed)/i,
        /blocked\s+by\s+(captcha|security|authentication)/i,
        /giving\s+up/i,
        /no\s+longer\s+able/i,
        /have\s+tried\s+multiple\s+approaches/i,
      ];

      if (completionPatterns.some((pattern) => pattern.test(content))) {
        return { completed: true, reason: "natural_completion" };
      }

      if (failurePatterns.some((pattern) => pattern.test(content))) {
        return { completed: true, reason: "natural_failure" };
      }

      return { completed: false };
    };

    // Flag a message that shares more than 80% of its words with a recent one
    const detectRepetition = (newMessage: string): boolean => {
      if (lastAssistantMessages.length < 2) return false;

      const similarity = (str1: string, str2: string): number => {
        const words1 = str1.toLowerCase().split(/\s+/);
        const words2 = str2.toLowerCase().split(/\s+/);
        const commonWords = words1.filter((word) => words2.includes(word));
        return commonWords.length / Math.max(words1.length, words2.length);
      };

      return lastAssistantMessages.some(
        (prevMessage) => similarity(newMessage, prevMessage) > 0.8
      );
    };

387    while (iterations < maxIterations) {
388      iterations++;
389      let hasActions = false;
390
391      if (
392        newItems.length > 0 &&
393        newItems[newItems.length - 1]?.role === "assistant"
394      ) {
395        const lastMessage = newItems[newItems.length - 1];
396        if (lastMessage.content?.[0]?.text) {
397          const content = lastMessage.content[0].text;
398
399          const completion = isTaskComplete(content);
400          if (completion.completed) {
401            console.log(`✅ Task completed (${completion.reason})`);
402            break;
403          }
404
405          if (detectRepetition(content)) {
406            console.log("🔄 Repetition detected - stopping execution");
407            lastAssistantMessages.push(content);
408            break;
409          }
410
411          lastAssistantMessages.push(content);
412          if (lastAssistantMessages.length > 3) {
413            lastAssistantMessages.shift(); // Keep only last 3
414          }
415        }
416      }
417
418      this.debugPrint([...inputItems, ...newItems].map(sanitizeMessage));
419
420      try {
421        const response = await createResponse({
422          model: this.model,
423          input: [...inputItems, ...newItems],
424          tools: this.tools,
425          truncation: "auto",
426        });
427
428        this.debugPrint(response);
429
430        if (!response.output) {
431          if (this.debug) {
432            console.log(response);
433          }
434          throw new Error("No output from model");
435        }
436
437        newItems.push(...response.output);
438
439        for (const item of response.output) {
440          if (item.type === "computer_call" || item.type === "function_call") {
441            hasActions = true;
442          }
443          const handleResult = await this.handleItem(item);
444          newItems.push(...handleResult);
445        }
446
447        if (!hasActions) {
448          consecutiveNoActions++;
449          if (consecutiveNoActions >= 3) {
450            console.log(
451              "⚠️  No actions for 3 consecutive iterations - stopping"
452            );
453            break;
454          }
455        } else {
456          consecutiveNoActions = 0;
457        }
458      } catch (error) {
459        console.error(`❌ Error during task execution: ${error}`);
460        throw error;
461      }
462    }
463
464    if (iterations >= maxIterations) {
465      console.warn(
466        `⚠️  Task execution stopped after ${maxIterations} iterations`
467      );
468    }
469
470    const assistantMessages = newItems.filter(
471      (item) => item.role === "assistant"
472    );
473    const finalMessage = assistantMessages[assistantMessages.length - 1];
474
475    return (
476      finalMessage?.content?.[0]?.text ||
477      "Task execution completed (no final message)"
478    );
479  }
480}
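
The repetition check in the loop above depends on a simple word-overlap ratio compared against a 0.8 threshold. The sketch below copies that ratio into a standalone function so you can try it outside the agent. The helper name isRepetition, its threshold parameter, and the sample messages are assumptions made for illustration and are not part of the guide's code.

```typescript
// Standalone copy of the word-overlap ratio used by detectRepetition.
function similarity(str1: string, str2: string): number {
  const words1 = str1.toLowerCase().split(/\s+/);
  const words2 = str2.toLowerCase().split(/\s+/);
  // Count the words of the first message that also occur in the second.
  const commonWords = words1.filter((word) => words2.includes(word));
  return commonWords.length / Math.max(words1.length, words2.length);
}

// Hypothetical helper: reports a repetition when the new message overlaps
// any recent message by more than the threshold (0.8 in the agent loop).
function isRepetition(
  newMessage: string,
  recent: string[],
  threshold = 0.8
): boolean {
  // Like the agent, do not judge until at least two messages are recorded.
  if (recent.length < 2) return false;
  return recent.some((prev) => similarity(newMessage, prev) > threshold);
}

// Illustrative messages only.
const recent = [
  "I have opened the Wikipedia home page",
  "I will now search for machine learning",
];

console.log(isRepetition("I will now search for machine learning", recent)); // true
console.log(isRepetition("The article is open and the task is done", recent)); // false
```

Note that the ratio counts repeated words in the first message each time they appear and is not symmetric, so it is a cheap heuristic for catching a stuck agent rather than a precise text-similarity measure.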

Step 4: Create the Main Script

Typescript

index.ts

import { SteelBrowser } from "./steelBrowser";
import { Agent } from "./agent";
import { STEEL_API_KEY, OPENAI_API_KEY, TASK } from "./helpers";

async function main(): Promise<void> {
  console.log("🚀 Steel + OpenAI Computer Use Assistant");
  console.log("=".repeat(60));

  // Refuse to run with the placeholder keys from helpers.ts
  if (STEEL_API_KEY === "your-steel-api-key-here") {
    console.warn(
      "⚠️  WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
    );
    console.warn(
      "   Get your API key at: https://app.steel.dev/settings/api-keys"
    );
    return;
  }

  if (OPENAI_API_KEY === "your-openai-api-key-here") {
    console.warn(
      "⚠️  WARNING: Please replace 'your-openai-api-key-here' with your actual OpenAI API key"
    );
    console.warn("   Get your API key at: https://platform.openai.com/");
    return;
  }

  console.log("\nStarting Steel browser session...");

  const computer = new SteelBrowser();

  try {
    await computer.initialize();
    console.log("✅ Steel browser session started!");

    const agent = new Agent("computer-use-preview", computer, [], true);

    const startTime = Date.now();

    try {
      const result = await agent.executeTask(TASK, true, false, 50);

      const duration = ((Date.now() - startTime) / 1000).toFixed(1);

      console.log("\n" + "=".repeat(60));
      console.log("🎉 TASK EXECUTION COMPLETED");
      console.log("=".repeat(60));
      console.log(`⏱️  Duration: ${duration} seconds`);
      console.log(`🎯 Task: ${TASK}`);
      console.log(`📋 Result:\n${result}`);
      console.log("=".repeat(60));
    } catch (error) {
      console.error(`❌ Task execution failed: ${error}`);
      // Set the exit code instead of calling process.exit(1), which would
      // terminate immediately and skip the cleanup in the finally block.
      process.exitCode = 1;
    }
  } catch (error) {
    console.log(`❌ Failed to start Steel browser: ${error}`);
    console.log("Please check your STEEL_API_KEY and internet connection.");
    process.exitCode = 1;
  } finally {
    // Always release the Steel session, even when the task failed
    await computer.cleanup();
  }
}

main().catch(console.error);

Running Your Agent

Execute your script to start an AI browser session. The agent runs the task defined in the TASK environment variable, or the default task from helpers.ts if the variable is not set. To change the task, set the environment variable before starting the script:

Terminal

export TASK="Research the top 5 electric vehicles with the longest range"
npm start
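
The fallback between the TASK environment variable and the default task can be sketched as a small function. This is a minimal illustration, assuming the same default string as helpers.ts; the function name resolveTask is hypothetical, and unlike the `process.env.TASK || "..."` expression in helpers.ts it also treats a whitespace-only value as unset.

```typescript
// Hypothetical helper mirroring the TASK fallback in helpers.ts.
// It takes the environment as a plain object so it can be tried without Node.js globals.
function resolveTask(
  env: Record<string, string | undefined>,
  fallback: string
): string {
  // Assumption: a blank or whitespace-only TASK counts as "not set".
  const task = env.TASK?.trim();
  return task ? task : fallback;
}

const DEFAULT_TASK = "Go to Wikipedia and search for machine learning";

// With TASK exported, the exported value is used.
console.log(
  resolveTask(
    { TASK: "Research the top 5 electric vehicles with the longest range" },
    DEFAULT_TASK
  )
);

// Without TASK, the default task is used.
console.log(resolveTask({}, DEFAULT_TASK));
```

In a real script you would pass process.env as the first argument.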

You'll see each action the agent takes displayed in the console, and you can view the live browser session by opening the session URL in your web browser.

Next Steps

Explore the Steel API documentation for more advanced features
Check out the OpenAI documentation for more information about the computer-use-preview model
Add additional features like session recording or multi-session management