Quickstart (TypeScript)

How to use OpenAI Computer Use with Steel

This guide walks you through using OpenAI's computer-use-preview model with Steel's managed remote browsers to create AI agents that can navigate the web.

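We'll implement a simple CUA (Computer-Using Agent) loop. The model looks at a screenshot and proposes an action, your code performs that action in the browser and captures a new screenshot, and the new screenshot goes back to the model. This repeats until the task is complete.

Figure: the computer use loop (from "Computer use - OpenAI API")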
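Stripped of Steel and OpenAI specifics, the loop has a simple control flow. The sketch below is a simplified illustration, not the guide's implementation: the types `Action`, `ModelTurn`, and `LoopDeps` and the function `runCuaLoop` are hypothetical, and the real Responses API items used in Step 3 carry more fields (safety checks, reasoning items, function calls).

```typescript
// Hypothetical, simplified shapes; real Responses API items are richer.
type Action = { type: string; [key: string]: unknown };
type ModelTurn =
  | { kind: "action"; callId: string; action: Action }
  | { kind: "message"; text: string };

interface LoopDeps {
  // Sends the conversation so far and returns the model's next turn.
  callModel: (history: unknown[]) => Promise<ModelTurn>;
  // Performs one action in the browser.
  execute: (action: Action) => Promise<void>;
  // Returns a base64-encoded PNG of the current viewport.
  screenshot: () => Promise<string>;
}

async function runCuaLoop(
  task: string,
  deps: LoopDeps,
  maxSteps = 10
): Promise<string> {
  const history: unknown[] = [{ role: "user", content: task }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await deps.callModel(history);
    history.push(turn);
    if (turn.kind === "message") {
      // No action requested: the model is reporting its result.
      return turn.text;
    }
    await deps.execute(turn.action);
    const image = await deps.screenshot();
    // Send the new screenshot back, tied to the call that produced it.
    history.push({ type: "computer_call_output", call_id: turn.callId, image });
  }
  return "stopped: step limit reached";
}
```

The step limit plays the same role as `maxIterations` in the agent you'll build: it bounds cost if the model never reports completion.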

Prerequisites

  • Node.js 20+

  • A Steel API key (sign up here)

  • An OpenAI API key with access to the computer-use-preview model

Step 1: Setup and Helper Functions

helpers.ts
```typescript
import * as dotenv from "dotenv";

dotenv.config();

// Replace with your own API keys
export const STEEL_API_KEY =
  process.env.STEEL_API_KEY || "your-steel-api-key-here";
export const OPENAI_API_KEY =
  process.env.OPENAI_API_KEY || "your-openai-api-key-here";

// Replace with your own task
export const TASK =
  process.env.TASK || "Go to Wikipedia and search for machine learning";

export const SYSTEM_PROMPT = `You are an expert browser automation assistant operating in an iterative execution loop. Your goal is to efficiently complete tasks using a Chrome browser with full internet access.

<CAPABILITIES>
* You control a Chrome browser tab and can navigate to any website
* You can click, type, scroll, take screenshots, and interact with web elements
* You have full internet access and can visit any public website
* You can read content, fill forms, search for information, and perform complex multi-step tasks
* After each action, you receive a screenshot showing the current state
* Use the goto(url) function to navigate directly to URLs - DO NOT try to click address bars or browser UI
* Use the back() function to go back to the previous page

<COORDINATE_SYSTEM>
* The browser viewport has specific dimensions that you must respect
* All coordinates (x, y) must be within the viewport bounds
* X coordinates must be between 0 and the display width (inclusive)
* Y coordinates must be between 0 and the display height (inclusive)
* Always ensure your click, move, scroll, and drag coordinates are within these bounds
* If you're unsure about element locations, take a screenshot first to see the current state

<AUTONOMOUS_EXECUTION>
* Work completely independently - make decisions and act immediately without asking questions
* Never request clarification, present options, or ask for permission
* Make intelligent assumptions based on task context
* If something is ambiguous, choose the most logical interpretation and proceed
* Take immediate action rather than explaining what you might do
* When the task objective is achieved, immediately declare "TASK_COMPLETED:" - do not provide commentary or ask questions

<REASONING_STRUCTURE>
For each step, you must reason systematically:
* Analyze your previous action's success/failure and current state
* Identify what specific progress has been made toward the goal
* Determine the next immediate objective and how to achieve it
* Choose the most efficient action sequence to make progress

<EFFICIENCY_PRINCIPLES>
* Combine related actions when possible rather than single-step execution
* Navigate directly to relevant websites without unnecessary exploration
* Use screenshots strategically to understand page state before acting
* Be persistent with alternative approaches if initial attempts fail
* Focus on the specific information or outcome requested

<COMPLETION_CRITERIA>
* MANDATORY: When you complete the task, your final message MUST start with "TASK_COMPLETED: [brief summary]"
* MANDATORY: If technical issues prevent completion, your final message MUST start with "TASK_FAILED: [reason]"
* MANDATORY: If you abandon the task, your final message MUST start with "TASK_ABANDONED: [explanation]"
* Do not write anything after completing the task except the required completion message
* Do not ask questions, provide commentary, or offer additional help after task completion
* The completion message is the end of the interaction - nothing else should follow

<CRITICAL_REQUIREMENTS>
* This is fully automated execution - work completely independently
* Start by taking a screenshot to understand the current state
* Use goto(url) function for navigation - never click on browser UI elements
* Always respect coordinate boundaries - invalid coordinates will fail
* Recognize when the stated objective has been achieved and declare completion immediately
* Focus on the explicit task given, not implied or potential follow-up tasks

Remember: Be thorough but focused. Complete the specific task requested efficiently and provide clear results.`;

// Requests to these domains (and their subdomains) are aborted
export const BLOCKED_DOMAINS = [
  "maliciousbook.com",
  "evilvideos.com",
  "darkwebforum.com",
  "shadytok.com",
  "suspiciouspins.com",
  "ilanbigio.com",
];

// Maps CUA key names to Playwright key names
export const CUA_KEY_TO_PLAYWRIGHT_KEY: Record<string, string> = {
  "/": "Divide",
  "\\": "Backslash",
  alt: "Alt",
  arrowdown: "ArrowDown",
  arrowleft: "ArrowLeft",
  arrowright: "ArrowRight",
  arrowup: "ArrowUp",
  backspace: "Backspace",
  capslock: "CapsLock",
  cmd: "Meta",
  ctrl: "Control",
  delete: "Delete",
  end: "End",
  enter: "Enter",
  esc: "Escape",
  home: "Home",
  insert: "Insert",
  option: "Alt",
  pagedown: "PageDown",
  pageup: "PageUp",
  shift: "Shift",
  space: " ",
  super: "Meta",
  tab: "Tab",
  win: "Meta",
};

// Shapes of the Responses API items this guide reads and writes
export interface MessageItem {
  type: "message";
  role: "assistant";
  content: Array<{ text: string }>;
}

export interface FunctionCallItem {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string;
}

export interface ComputerCallItem {
  type: "computer_call";
  call_id: string;
  action: {
    type: string;
    [key: string]: any;
  };
  pending_safety_checks?: Array<{
    id: string;
    message: string;
  }>;
}

export interface OutputItem {
  type: "computer_call_output" | "function_call_output";
  call_id: string;
  acknowledged_safety_checks?: Array<{
    id: string;
    message: string;
  }>;
  output?:
    | {
        type: string;
        image_url?: string;
        current_url?: string;
      }
    | string;
}

export interface ResponseItem {
  id: string;
  output: (MessageItem | FunctionCallItem | ComputerCallItem)[];
}

// Pretty-prints any value as indented JSON
export function pp(obj: any): void {
  console.log(JSON.stringify(obj, null, 2));
}

// Replaces base64 screenshots with a placeholder so debug logs stay readable
export function sanitizeMessage(msg: any): any {
  if (msg?.type === "computer_call_output") {
    const output = msg.output || {};
    if (typeof output === "object") {
      return {
        ...msg,
        output: { ...output, image_url: "[omitted]" },
      };
    }
  }
  return msg;
}

// Calls the OpenAI Responses API directly with fetch
export async function createResponse(params: any): Promise<ResponseItem> {
  const url = "https://api.openai.com/v1/responses";
  const headers: Record<string, string> = {
    Authorization: `Bearer ${OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  };

  const openaiOrg = process.env.OPENAI_ORG;
  if (openaiOrg) {
    headers["OpenAI-Organization"] = openaiOrg;
  }

  const response = await fetch(url, {
    method: "POST",
    headers,
    body: JSON.stringify(params),
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`OpenAI API Error: ${response.status} ${errorText}`);
  }

  return (await response.json()) as ResponseItem;
}

// Throws if the URL's hostname is a blocked domain or one of its subdomains.
// URLs that cannot be parsed are allowed through.
export function checkBlocklistedUrl(url: string): void {
  try {
    const hostname = new URL(url).hostname || "";
    const isBlocked = BLOCKED_DOMAINS.some(
      (blocked) => hostname === blocked || hostname.endsWith(`.${blocked}`)
    );
    if (isBlocked) {
      throw new Error(`Blocked URL: ${url}`);
    }
  } catch (error) {
    if (error instanceof Error && error.message.startsWith("Blocked URL:")) {
      throw error;
    }
  }
}
```
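`checkBlocklistedUrl` blocks a hostname that equals a blocked domain or ends with `.` plus that domain, so `www.evilvideos.com` is blocked while `notevilvideos.com` is not. The same rule, pulled out into pure functions that return a boolean instead of throwing, is easy to unit-test. The names `isBlockedHostname` and `isBlockedUrl` are illustrative, not part of the guide's code.

```typescript
// A hostname is blocked if it equals a blocked domain or is a subdomain of one.
function isBlockedHostname(hostname: string, blocked: readonly string[]): boolean {
  const host = hostname.toLowerCase();
  return blocked.some((domain) => host === domain || host.endsWith(`.${domain}`));
}

function isBlockedUrl(url: string, blocked: readonly string[]): boolean {
  try {
    return isBlockedHostname(new URL(url).hostname, blocked);
  } catch {
    // Unparseable URLs are allowed, mirroring checkBlocklistedUrl,
    // which only rethrows its own "Blocked URL" error.
    return false;
  }
}
```

Requiring the leading dot in the suffix check is what keeps look-alike domains from being caught by accident.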

Step 2: Create Steel Browser Integration

steelBrowser.ts
```typescript
import { chromium } from "playwright";
import type { Browser, Page } from "playwright";
import { Steel } from "steel-sdk";
import {
  STEEL_API_KEY,
  CUA_KEY_TO_PLAYWRIGHT_KEY,
  checkBlocklistedUrl,
} from "./helpers";

export class SteelBrowser {
  private client: Steel;
  private session: any;
  private browser: Browser | null = null;
  private page: Page | null = null;
  private dimensions: [number, number];
  private proxy: boolean;
  private solveCaptcha: boolean;
  private virtualMouse: boolean;
  private sessionTimeout: number;
  private adBlocker: boolean;
  private startUrl: string;

  constructor(
    width: number = 1024,
    height: number = 768,
    proxy: boolean = false,
    solveCaptcha: boolean = false,
    virtualMouse: boolean = true,
    sessionTimeout: number = 900000, // 15 minutes
    adBlocker: boolean = true,
    startUrl: string = "https://www.google.com"
  ) {
    this.client = new Steel({
      steelAPIKey: STEEL_API_KEY,
    });
    this.dimensions = [width, height];
    this.proxy = proxy;
    this.solveCaptcha = solveCaptcha;
    this.virtualMouse = virtualMouse;
    this.sessionTimeout = sessionTimeout;
    this.adBlocker = adBlocker;
    this.startUrl = startUrl;
  }

  getEnvironment(): string {
    return "browser";
  }

  getDimensions(): [number, number] {
    return this.dimensions;
  }

  getCurrentUrl(): string {
    return this.page?.url() || "";
  }

  async initialize(): Promise<void> {
    const [width, height] = this.dimensions;
    const sessionParams = {
      useProxy: this.proxy,
      solveCaptcha: this.solveCaptcha,
      apiTimeout: this.sessionTimeout,
      blockAds: this.adBlocker,
      dimensions: { width, height },
    };

    this.session = await this.client.sessions.create(sessionParams);
    console.log("Steel Session created successfully!");
    console.log(`View live session at: ${this.session.sessionViewerUrl}`);

    // Connect Playwright to the remote Steel browser over CDP
    const cdpUrl = `${this.session.websocketUrl}&apiKey=${STEEL_API_KEY}`;

    this.browser = await chromium.connectOverCDP(cdpUrl, {
      timeout: 60000,
    });

    const context = this.browser.contexts()[0];

    // Abort any request whose URL is on the blocklist
    await context.route("**/*", async (route, request) => {
      const url = request.url();
      try {
        checkBlocklistedUrl(url);
        await route.continue();
      } catch (error) {
        console.log(`Blocking URL: ${url}`);
        await route.abort();
      }
    });

    // Draw a visible cursor that follows mouse events, so the live view
    // and replays show where the agent is pointing
    if (this.virtualMouse) {
      await context.addInitScript(`
        if (window.self === window.top) {
          function initCursor() {
            const CURSOR_ID = '__cursor__';
            if (document.getElementById(CURSOR_ID)) return;

            const cursor = document.createElement('div');
            cursor.id = CURSOR_ID;
            Object.assign(cursor.style, {
              position: 'fixed',
              top: '0px',
              left: '0px',
              width: '20px',
              height: '20px',
              backgroundImage: 'url("data:image/svg+xml;utf8,<svg width=\\'16\\' height=\\'16\\' viewBox=\\'0 0 20 20\\' fill=\\'black\\' outline=\\'white\\' xmlns=\\'http://www.w3.org/2000/svg\\'><path d=\\'M15.8089 7.22221C15.9333 7.00888 15.9911 6.78221 15.9822 6.54221C15.9733 6.29333 15.8978 6.06667 15.7555 5.86221C15.6133 5.66667 15.4311 5.52445 15.2089 5.43555L1.70222 0.0888888C1.47111 0 1.23555 -0.0222222 0.995555 0.0222222C0.746667 0.0755555 0.537779 0.186667 0.368888 0.355555C0.191111 0.533333 0.0755555 0.746667 0.0222222 0.995555C-0.0222222 1.23555 0 1.47111 0.0888888 1.70222L5.43555 15.2222C5.52445 15.4445 5.66667 15.6267 5.86221 15.7689C6.06667 15.9111 6.28888 15.9867 6.52888 15.9955H6.58221C6.82221 15.9955 7.04445 15.9333 7.24888 15.8089C7.44445 15.6845 7.59555 15.52 7.70221 15.3155L10.2089 10.2222L15.3022 7.70221C15.5155 7.59555 15.6845 7.43555 15.8089 7.22221Z\\' ></path></svg>")',
              backgroundSize: 'cover',
              pointerEvents: 'none',
              zIndex: '99999',
              transform: 'translate(-2px, -2px)',
            });

            document.body.appendChild(cursor);

            document.addEventListener("mousemove", (e) => {
              cursor.style.top = e.clientY + "px";
              cursor.style.left = e.clientX + "px";
            });
          }

          function checkBody() {
            if (document.body) {
              initCursor();
            } else {
              requestAnimationFrame(checkBody);
            }
          }
          requestAnimationFrame(checkBody);
        }
      `);
    }

    this.page = context.pages()[0];

    // Explicitly set the viewport size so screenshots match the expected dimensions
    await this.page.setViewportSize({
      width: width,
      height: height,
    });

    await this.page.goto(this.startUrl);
  }

  async cleanup(): Promise<void> {
    if (this.page) {
      await this.page.close();
    }
    if (this.browser) {
      await this.browser.close();
    }
    if (this.session) {
      console.log("Releasing Steel session...");
      await this.client.sessions.release(this.session.id);
      console.log(
        `Session completed. View replay at ${this.session.sessionViewerUrl}`
      );
    }
  }

  async screenshot(): Promise<string> {
    if (!this.page) throw new Error("Page not initialized");

    try {
      // Use a regular Playwright screenshot clipped to the viewport for consistent sizing
      const buffer = await this.page.screenshot({
        fullPage: false,
        clip: {
          x: 0,
          y: 0,
          width: this.dimensions[0],
          height: this.dimensions[1],
        },
      });
      return buffer.toString("base64");
    } catch (error) {
      console.log(`Screenshot failed: ${error}`);
      // Fall back to a raw CDP screenshot with fromSurface disabled
      try {
        const cdpSession = await this.page.context().newCDPSession(this.page);
        const result = await cdpSession.send("Page.captureScreenshot", {
          format: "png",
          fromSurface: false,
        });
        return result.data;
      } catch (cdpError) {
        console.log(`CDP screenshot also failed: ${cdpError}`);
        throw error;
      }
    }
  }

  async click(x: number, y: number, button: string = "left"): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");

    if (button === "back") {
      await this.back();
    } else if (button === "forward") {
      await this.forward();
    } else if (button === "wheel") {
      // CUA's "wheel" button is the mouse wheel button, i.e. a middle click
      await this.page.mouse.click(x, y, { button: "middle" });
    } else {
      const buttonMap: Record<string, "left" | "right"> = {
        left: "left",
        right: "right",
      };
      await this.page.mouse.click(x, y, { button: buttonMap[button] ?? "left" });
    }
  }

  async doubleClick(x: number, y: number): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.mouse.dblclick(x, y);
  }

  async scroll(
    x: number,
    y: number,
    scroll_x: number,
    scroll_y: number
  ): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.mouse.move(x, y);
    await this.page.evaluate(
      ({ scrollX, scrollY }) => {
        window.scrollBy(scrollX, scrollY);
      },
      { scrollX: scroll_x, scrollY: scroll_y }
    );
  }

  async type(text: string): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.keyboard.type(text);
  }

  async wait(ms: number = 1000): Promise<void> {
    await new Promise((resolve) => setTimeout(resolve, ms));
  }

  async move(x: number, y: number): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.mouse.move(x, y);
  }

  async keypress(keys: string[]): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");

    const mappedKeys = keys.map(
      (key) => CUA_KEY_TO_PLAYWRIGHT_KEY[key.toLowerCase()] || key
    );

    // Press all keys in order, then release them in reverse order (e.g. Ctrl+A)
    for (const key of mappedKeys) {
      await this.page.keyboard.down(key);
    }

    for (const key of [...mappedKeys].reverse()) {
      await this.page.keyboard.up(key);
    }
  }

  async drag(path: Array<{ x: number; y: number }>): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    if (path.length === 0) return;

    await this.page.mouse.move(path[0].x, path[0].y);
    await this.page.mouse.down();

    for (const point of path.slice(1)) {
      await this.page.mouse.move(point.x, point.y);
    }

    await this.page.mouse.up();
  }

  async goto(url: string): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    try {
      await this.page.goto(url);
    } catch (error) {
      console.log(`Error navigating to ${url}: ${error}`);
    }
  }

  async back(): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.goBack();
  }

  async forward(): Promise<void> {
    if (!this.page) throw new Error("Page not initialized");
    await this.page.goForward();
  }

  /** Get detailed viewport information for debugging. */
  async getViewportInfo(): Promise<any> {
    if (!this.page) {
      return {};
    }

    try {
      return await this.page.evaluate(() => ({
        innerWidth: window.innerWidth,
        innerHeight: window.innerHeight,
        devicePixelRatio: window.devicePixelRatio,
        screenWidth: window.screen.width,
        screenHeight: window.screen.height,
        scrollX: window.scrollX,
        scrollY: window.scrollY,
      }));
    } catch {
      return {};
    }
  }
}
```
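`keypress` presses every key in the list in order and then releases them in reverse order, which is how a chord such as Ctrl+A reaches the page. The sketch below models that ordering as a list of events instead of calling Playwright; `chordEvents` and its small `KEY_MAP` are illustrative stand-ins for the real method and `CUA_KEY_TO_PLAYWRIGHT_KEY`.

```typescript
// Small subset of the CUA-to-Playwright key map, for illustration only.
const KEY_MAP: Record<string, string> = {
  ctrl: "Control",
  shift: "Shift",
  enter: "Enter",
  cmd: "Meta",
};

// Returns the key events a chord produces: all downs in order, then ups in reverse.
function chordEvents(keys: string[]): string[] {
  const mapped = keys.map((k) => KEY_MAP[k.toLowerCase()] ?? k);
  const downs = mapped.map((k) => `down:${k}`);
  // Copy before reversing so `mapped` itself is not mutated.
  const ups = [...mapped].reverse().map((k) => `up:${k}`);
  return [...downs, ...ups];
}
```

For example, `chordEvents(["CTRL", "a"])` yields `down:Control`, `down:a`, `up:a`, `up:Control`. Keys missing from the map (like `a`) pass through unchanged, just as in `keypress`.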

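The system prompt asks the model to keep coordinates inside the viewport, but the browser methods above pass whatever they receive straight to Playwright, and the agent's `validateCoordinates` (next step) only converts values to numbers. If you want a safety net, a hypothetical `clampPoint` helper, which is not part of this guide's code, could pin points to the viewport before calling `click` or `move`. This sketch assumes pixel coordinates run from 0 to width - 1 and 0 to height - 1, and it expects finite numbers as input.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical helper: rounds to whole pixels and clamps into the viewport.
function clampPoint(p: Point, width: number, height: number): Point {
  const clamp = (v: number, max: number) =>
    Math.min(Math.max(Math.round(v), 0), max);
  return { x: clamp(p.x, width - 1), y: clamp(p.y, height - 1) };
}
```

With the default 1024x768 viewport, `clampPoint({ x: 1200, y: -5 }, 1024, 768)` returns `{ x: 1023, y: 0 }`. Clamping silently moves the click, so you may prefer to log a warning alongside it.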
Step 3: Create the Agent Class

agent.ts
```typescript
import { SteelBrowser } from "./steelBrowser";
import {
  SYSTEM_PROMPT,
  pp,
  sanitizeMessage,
  createResponse,
  checkBlocklistedUrl,
} from "./helpers";
import type {
  MessageItem,
  FunctionCallItem,
  ComputerCallItem,
  OutputItem,
} from "./helpers";

export class Agent {
  private model: string;
  private computer: SteelBrowser;
  private tools: any[];
  private autoAcknowledgeSafety: boolean;
  private printSteps: boolean = true;
  private debug: boolean = false;
  private showImages: boolean = false;
  private viewportWidth: number;
  private viewportHeight: number;
  private systemPrompt: string;

  constructor(
    model: string = "computer-use-preview",
    computer: SteelBrowser,
    tools: any[] = [],
    autoAcknowledgeSafety: boolean = true
  ) {
    this.model = model;
    this.computer = computer;
    this.tools = tools;
    this.autoAcknowledgeSafety = autoAcknowledgeSafety;

    const [width, height] = computer.getDimensions();
    this.viewportWidth = width;
    this.viewportHeight = height;

    // Inject the actual viewport dimensions into the system prompt
    this.systemPrompt = SYSTEM_PROMPT.replace(
      "<COORDINATE_SYSTEM>",
      `<COORDINATE_SYSTEM>
* The browser viewport dimensions are ${width}x${height} pixels`
    );

    // The computer-use tool; the display size must match the screenshots we send
    this.tools.push({
      type: "computer-preview",
      display_width: width,
      display_height: height,
      environment: computer.getEnvironment(),
    });

    // Function tool for direct URL navigation
    this.tools.push({
      type: "function",
      name: "goto",
      description: "Navigate directly to a specific URL.",
      parameters: {
        type: "object",
        properties: {
          url: {
            type: "string",
            description:
              "Fully qualified URL to navigate to (e.g., https://example.com).",
          },
        },
        additionalProperties: false,
        required: ["url"],
      },
    });

    // Function tool for browser back navigation (takes no arguments)
    this.tools.push({
      type: "function",
      name: "back",
      description: "Go back to the previous page.",
      parameters: {
        type: "object",
        properties: {},
        additionalProperties: false,
      },
    });
  }

  debugPrint(...args: any[]): void {
    if (this.debug) {
      pp(args);
    }
  }

  /** Get detailed viewport information for debugging. */
  private async getViewportInfo(): Promise<any> {
    return await this.computer.getViewportInfo();
  }

  /** Validate screenshot dimensions against the expected viewport. */
  private async validateScreenshotDimensions(
    screenshotBase64: string
  ): Promise<any> {
    try {
      const buffer = Buffer.from(screenshotBase64, "base64");

      // In a PNG, the IHDR chunk stores width at bytes 16-19 and height at bytes 20-23
      const width = buffer.readUInt32BE(16);
      const height = buffer.readUInt32BE(20);

      const viewportInfo = await this.getViewportInfo();

      const scalingInfo = {
        screenshot_size: [width, height],
        viewport_size: [this.viewportWidth, this.viewportHeight],
        actual_viewport: [
          viewportInfo.innerWidth || 0,
          viewportInfo.innerHeight || 0,
        ],
        device_pixel_ratio: viewportInfo.devicePixelRatio || 1.0,
        width_scale: this.viewportWidth > 0 ? width / this.viewportWidth : 1.0,
        height_scale:
          this.viewportHeight > 0 ? height / this.viewportHeight : 1.0,
      };

      // Warn about scaling mismatches, which would make model coordinates land in the wrong place
      if (scalingInfo.width_scale !== 1.0 || scalingInfo.height_scale !== 1.0) {
        console.log(`⚠️ Screenshot scaling detected:`);
        console.log(`  Screenshot: ${width}x${height}`);
        console.log(
          `  Expected viewport: ${this.viewportWidth}x${this.viewportHeight}`
        );
        console.log(
          `  Actual viewport: ${viewportInfo.innerWidth || "unknown"}x${
            viewportInfo.innerHeight || "unknown"
          }`
        );
        console.log(
          `  Scale factors: ${scalingInfo.width_scale.toFixed(
            3
          )}x${scalingInfo.height_scale.toFixed(3)}`
        );
      }

      return scalingInfo;
    } catch (error) {
      console.log(`⚠️ Error validating screenshot dimensions: ${error}`);
      return {};
    }
  }

  // Coerces coordinate values (which may arrive as strings) to numbers
  private validateCoordinates(actionArgs: any): any {
    const validatedArgs = { ...actionArgs };

    // Single coordinates (click, move, etc.)
    if ("x" in actionArgs && "y" in actionArgs) {
      validatedArgs.x = this.toNumber(actionArgs.x);
      validatedArgs.y = this.toNumber(actionArgs.y);
    }

    // Path arrays (drag)
    if ("path" in actionArgs && Array.isArray(actionArgs.path)) {
      validatedArgs.path = actionArgs.path.map((point: any) => ({
        x: this.toNumber(point.x),
        y: this.toNumber(point.y),
      }));
    }

    return validatedArgs;
  }

  private toNumber(value: any): number {
    if (typeof value === "string") {
      const num = parseFloat(value);
      return isNaN(num) ? 0 : num;
    }
    return typeof value === "number" ? value : 0;
  }

  async executeAction(actionType: string, actionArgs: any): Promise<void> {
    const validatedArgs = this.validateCoordinates(actionArgs);

    switch (actionType) {
      case "click":
        await this.computer.click(
          validatedArgs.x,
          validatedArgs.y,
          validatedArgs.button || "left"
        );
        break;
      case "doubleClick":
      case "double_click":
        await this.computer.doubleClick(validatedArgs.x, validatedArgs.y);
        break;
      case "move":
        await this.computer.move(validatedArgs.x, validatedArgs.y);
        break;
      case "scroll":
        await this.computer.scroll(
          validatedArgs.x,
          validatedArgs.y,
          this.toNumber(validatedArgs.scroll_x),
          this.toNumber(validatedArgs.scroll_y)
        );
        break;
      case "drag": {
        const path = validatedArgs.path || [];
        await this.computer.drag(path);
        break;
      }
      case "type":
        await this.computer.type(validatedArgs.text || "");
        break;
      case "keypress":
        await this.computer.keypress(validatedArgs.keys || []);
        break;
      case "wait":
        await this.computer.wait(this.toNumber(validatedArgs.ms) || 1000);
        break;
      case "goto":
        await this.computer.goto(validatedArgs.url || "");
        break;
      case "back":
        await this.computer.back();
        break;
      case "forward":
        await this.computer.forward();
        break;
      case "screenshot":
        // A screenshot is taken after every action anyway
        break;
      default: {
        // Fall back to a SteelBrowser method with the same name, if any
        const method = (this.computer as any)[actionType];
        if (typeof method === "function") {
          await method.call(this.computer, ...Object.values(validatedArgs));
        }
        break;
      }
    }
  }

  async handleItem(
    item: MessageItem | FunctionCallItem | ComputerCallItem
  ): Promise<OutputItem[]> {
    if (item.type === "message") {
      if (this.printSteps) {
        console.log(item.content[0]?.text ?? "");
      }
    } else if (item.type === "function_call") {
      const { name, arguments: argsStr } = item;
      // Tools without parameters may send an empty argument string
      const args = JSON.parse(argsStr || "{}");

      if (this.printSteps) {
        console.log(`${name}(${JSON.stringify(args)})`);
      }

      const method = (this.computer as any)[name];
      if (typeof method === "function") {
        await method.call(this.computer, ...Object.values(args));
      }

      return [
        {
          type: "function_call_output",
          call_id: item.call_id,
          output: "success",
        },
      ];
    } else if (item.type === "computer_call") {
      const { type: actionType, ...actionArgs } = item.action;

      if (this.printSteps) {
        console.log(`${actionType}(${JSON.stringify(actionArgs)})`);
      }

      await this.executeAction(actionType, actionArgs);
      const screenshotBase64 = await this.computer.screenshot();

      // Validate screenshot dimensions for debugging
      await this.validateScreenshotDimensions(screenshotBase64);

      const pendingChecks = item.pending_safety_checks || [];
      for (const check of pendingChecks) {
        if (this.autoAcknowledgeSafety) {
          console.log(`⚠️ Auto-acknowledging safety check: ${check.message}`);
        } else {
          throw new Error(`Safety check failed: ${check.message}`);
        }
      }

      const callOutput: OutputItem = {
        type: "computer_call_output",
        call_id: item.call_id,
        acknowledged_safety_checks: pendingChecks,
        output: {
          type: "input_image",
          image_url: `data:image/png;base64,${screenshotBase64}`,
        },
      };

      if (this.computer.getEnvironment() === "browser") {
        const currentUrl = this.computer.getCurrentUrl();
        checkBlocklistedUrl(currentUrl);
        (callOutput.output as any).current_url = currentUrl;
      }

      return [callOutput];
    }

    // Other item types (e.g. reasoning) need no response
    return [];
  }

  async executeTask(
    task: string,
    printSteps: boolean = true,
    debug: boolean = false,
    maxIterations: number = 50
  ): Promise<string> {
    this.printSteps = printSteps;
    this.debug = debug;
    this.showImages = false;

    const inputItems = [
      {
        role: "system",
        content: this.systemPrompt,
      },
      {
        role: "user",
        content: task,
      },
    ];

    const newItems: any[] = [];
    let iterations = 0;
    let consecutiveNoActions = 0;
    const lastAssistantMessages: string[] = [];

    console.log(`🎯 Executing task: ${task}`);
    console.log("=".repeat(60));

    // Checks explicit completion markers first, then common completion/failure phrasing
    const isTaskComplete = (
      content: string
    ): { completed: boolean; reason?: string } => {
      if (content.includes("TASK_COMPLETED:")) {
        return { completed: true, reason: "explicit_completion" };
      }
      if (
        content.includes("TASK_FAILED:") ||
        content.includes("TASK_ABANDONED:")
      ) {
        return { completed: true, reason: "explicit_failure" };
      }

      const completionPatterns = [
        /task\s+(completed|finished|done|accomplished)/i,
        /successfully\s+(completed|finished|found|gathered)/i,
        /here\s+(is|are)\s+the\s+(results?|information|summary)/i,
        /to\s+summarize/i,
        /in\s+conclusion/i,
        /final\s+(answer|result|summary)/i,
      ];

      const failurePatterns = [
        /cannot\s+(complete|proceed|access|continue)/i,
        /unable\s+to\s+(complete|access|find|proceed)/i,
        /blocked\s+by\s+(captcha|security|authentication)/i,
        /giving\s+up/i,
        /no\s+longer\s+able/i,
        /have\s+tried\s+multiple\s+approaches/i,
      ];

      if (completionPatterns.some((pattern) => pattern.test(content))) {
        return { completed: true, reason: "natural_completion" };
      }

      if (failurePatterns.some((pattern) => pattern.test(content))) {
        return { completed: true, reason: "natural_failure" };
      }

      return { completed: false };
    };

    // Flags a message whose words overlap more than 80% with a recent message
    const detectRepetition = (newMessage: string): boolean => {
      if (lastAssistantMessages.length < 2) return false;

      const similarity = (str1: string, str2: string): number => {
        const words1 = str1.toLowerCase().split(/\s+/);
        const words2 = str2.toLowerCase().split(/\s+/);
        const commonWords = words1.filter((word) => words2.includes(word));
        return commonWords.length / Math.max(words1.length, words2.length);
      };

      return lastAssistantMessages.some(
        (prevMessage) => similarity(newMessage, prevMessage) > 0.8
      );
    };

    while (iterations < maxIterations) {
      iterations++;
      let hasActions = false;

      // Inspect the model's most recent message for completion or repetition
      if (
        newItems.length > 0 &&
        newItems[newItems.length - 1]?.role === "assistant"
      ) {
        const lastMessage = newItems[newItems.length - 1];
        if (lastMessage.content?.[0]?.text) {
          const content = lastMessage.content[0].text;

          const completion = isTaskComplete(content);
          if (completion.completed) {
            console.log(`✅ Task completed (${completion.reason})`);
            break;
          }

          if (detectRepetition(content)) {
            console.log("🔄 Repetition detected - stopping execution");
            lastAssistantMessages.push(content);
            break;
          }

          lastAssistantMessages.push(content);
          if (lastAssistantMessages.length > 3) {
            lastAssistantMessages.shift(); // Keep only the last 3
          }
        }
      }

      this.debugPrint([...inputItems, ...newItems].map(sanitizeMessage));

      try {
        const response = await createResponse({
          model: this.model,
          input: [...inputItems, ...newItems],
          tools: this.tools,
          truncation: "auto",
        });

        this.debugPrint(response);

        if (!response.output) {
          if (this.debug) {
            console.log(response);
          }
          throw new Error("No output from model");
        }

        newItems.push(...response.output);

        for (const item of response.output) {
          if (item.type === "computer_call" || item.type === "function_call") {
            hasActions = true;
          }
          const handleResult = await this.handleItem(item);
          newItems.push(...handleResult);
        }

        if (!hasActions) {
          consecutiveNoActions++;
          if (consecutiveNoActions >= 3) {
            console.log(
              "⚠️ No actions for 3 consecutive iterations - stopping"
            );
            break;
          }
        } else {
          consecutiveNoActions = 0;
        }
      } catch (error) {
        console.error(`❌ Error during task execution: ${error}`);
        throw error;
      }
    }

    if (iterations >= maxIterations) {
      console.warn(
        `⚠️ Task execution stopped after ${maxIterations} iterations`
      );
    }

    const assistantMessages = newItems.filter(
      (item) => item.role === "assistant"
    );
    const finalMessage = assistantMessages[assistantMessages.length - 1];

    return (
      finalMessage?.content?.[0]?.text ||
      "Task execution completed (no final message)"
    );
  }
}
```
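`validateScreenshotDimensions` reads the image size straight from the PNG header instead of decoding the image. A PNG begins with an 8-byte signature followed by the IHDR chunk (a 4-byte length, the ASCII type `IHDR`, then width and height as big-endian 32-bit integers), which puts the width at byte offset 16 and the height at offset 20. A slightly more defensive variant that checks the signature and chunk type first might look like this; `pngDimensions` is an illustrative name, not part of the guide's code.

```typescript
// The fixed 8-byte signature every PNG file starts with.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Reads width and height from the IHDR chunk without decoding the image.
function pngDimensions(png: Buffer): { width: number; height: number } {
  if (png.length < 24 || !png.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error("Not a PNG");
  }
  // Bytes 8-11 hold the chunk length, bytes 12-15 the chunk type.
  if (png.toString("ascii", 12, 16) !== "IHDR") {
    throw new Error("Missing IHDR chunk");
  }
  return { width: png.readUInt32BE(16), height: png.readUInt32BE(20) };
}
```

To use it on a screenshot from `SteelBrowser.screenshot()`, decode first: `pngDimensions(Buffer.from(screenshotBase64, "base64"))`.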

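The system prompt tells the model to begin its final message with `TASK_COMPLETED:`, `TASK_FAILED:`, or `TASK_ABANDONED:`. `isTaskComplete` accepts these markers anywhere in the text and discards the summary. If you also want the summary, and want to accept markers only at the start as the prompt specifies, a hypothetical `parseCompletion` helper could look like this:

```typescript
type Outcome = "completed" | "failed" | "abandoned";

// Parses a leading marker such as "TASK_COMPLETED: Opened the article".
// Returns null when the text does not start with one of the three markers.
function parseCompletion(
  text: string
): { outcome: Outcome; detail: string } | null {
  const match = /^\s*TASK_(COMPLETED|FAILED|ABANDONED):\s*([\s\S]*)$/.exec(text);
  if (!match) return null;
  return { outcome: match[1].toLowerCase() as Outcome, detail: match[2].trim() };
}
```

Being stricter than `isTaskComplete` means a message that merely quotes a marker mid-sentence is not treated as the end of the task; whether that trade-off suits you depends on how reliably your model follows the prompt.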
Step 4: Create the Main Script

index.ts
```typescript
import { SteelBrowser } from "./steelBrowser";
import { Agent } from "./agent";
import { STEEL_API_KEY, OPENAI_API_KEY, TASK } from "./helpers";

async function main(): Promise<void> {
  console.log("🚀 Steel + OpenAI Computer Use Assistant");
  console.log("=".repeat(60));

  if (STEEL_API_KEY === "your-steel-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
    );
    console.warn(
      "   Get your API key at: https://app.steel.dev/settings/api-keys"
    );
    return;
  }

  if (OPENAI_API_KEY === "your-openai-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-openai-api-key-here' with your actual OpenAI API key"
    );
    console.warn("   Get your API key at: https://platform.openai.com/");
    return;
  }

  console.log("\nStarting Steel browser session...");

  const computer = new SteelBrowser();

  try {
    await computer.initialize();
    console.log("✅ Steel browser session started!");

    const agent = new Agent("computer-use-preview", computer, [], true);

    const startTime = Date.now();

    try {
      const result = await agent.executeTask(TASK, true, false, 50);

      const duration = ((Date.now() - startTime) / 1000).toFixed(1);

      console.log("\n" + "=".repeat(60));
      console.log("🎉 TASK EXECUTION COMPLETED");
      console.log("=".repeat(60));
      console.log(`⏱️ Duration: ${duration} seconds`);
      console.log(`🎯 Task: ${TASK}`);
      console.log(`📋 Result:\n${result}`);
      console.log("=".repeat(60));
    } catch (error) {
      console.error(`❌ Task execution failed: ${error}`);
      // Set the exit code instead of calling process.exit(),
      // so the finally block below still releases the Steel session
      process.exitCode = 1;
    }
  } catch (error) {
    console.log(`❌ Failed to start Steel browser: ${error}`);
    console.log("Please check your STEEL_API_KEY and internet connection.");
    process.exitCode = 1;
  } finally {
    await computer.cleanup();
  }
}

main().catch(console.error);
```
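A Node.js detail worth knowing when adapting `main`: `process.exit()` ends the process immediately, so any `finally` block that has not run yet (such as the one calling `computer.cleanup()`) is skipped and the Steel session is not released until it times out. A minimal sketch of one way to structure this, using a hypothetical `runWithCleanup` helper that reports failure through a return code instead of exiting:

```typescript
// Runs `work`, always runs `cleanup`, and returns an exit code (0 or 1).
async function runWithCleanup(
  work: () => Promise<unknown>,
  cleanup: () => Promise<void>
): Promise<number> {
  try {
    await work();
    return 0;
  } catch (error) {
    console.error(`Task failed: ${error}`);
    return 1;
  } finally {
    // Runs on success and on failure, before the return value is delivered.
    await cleanup();
  }
}
```

In `main`, this could be used as `process.exitCode = await runWithCleanup(async () => { await agent.executeTask(TASK); }, () => computer.cleanup());`. Setting `process.exitCode` lets Node exit with that status once pending work finishes.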

Running Your Agent

Run your script to start an automated AI browser session. The agent executes the task defined in the TASK environment variable, or the default task if TASK is unset. To change the task, set the environment variable before starting the script:

```bash
export TASK="Research the top 5 electric vehicles with the longest range"
npm start
```
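If you would rather pass the task as a command-line argument, a small resolver can check the arguments first and fall back to TASK and then to a default. This is a hypothetical extension: `resolveTask` and `DEFAULT_TASK` are not part of the guide's code, and whether extra arguments reach `process.argv` depends on how your start script launches Node (npm forwards arguments placed after `--`, as in `npm start -- "Find the cheapest flight"`).

```typescript
// Hypothetical default, matching the guide's fallback task.
const DEFAULT_TASK = "Go to Wikipedia and search for machine learning";

// Priority: command-line arguments, then the TASK variable, then the default.
function resolveTask(
  argv: string[],
  env: Record<string, string | undefined>
): string {
  // argv[0] is the Node binary and argv[1] the script path.
  const fromArgs = argv.slice(2).join(" ").trim();
  return fromArgs || env.TASK || DEFAULT_TASK;
}
```

In `index.ts`, you would then call `resolveTask(process.argv, process.env)` in place of the imported `TASK` constant.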

The console prints each action the agent takes. To watch the browser as it works, open the live session URL that is logged when the Steel session starts.

Next Steps

  • Explore the Steel API documentation for more advanced features

  • Check out the OpenAI documentation for more information about the computer-use-preview model

  • Add additional features like session recording or multi-session management
