Quickstart (TypeScript)

How to use Claude Computer Use with Steel

This guide shows you how to create AI agents with Claude's computer use capabilities and Steel browsers for autonomous web task execution.

Prerequisites

  • Node.js 20+

  • A Steel API key (sign up here)

  • An Anthropic API key with access to Claude models

Step 1: Setup and Dependencies

First, create a project directory and install the required packages:

Terminal
# Create a project directory
mkdir steel-claude-computer-use
cd steel-claude-computer-use
# Initialize package.json
npm init -y
# Install required packages
npm install steel-sdk @anthropic-ai/sdk playwright dotenv
npm install -D @types/node typescript ts-node

Create a .env file with your API keys:

ENV
.env
STEEL_API_KEY=your_steel_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
TASK=Go to Wikipedia and search for machine learning
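A missing or placeholder key is easier to diagnose if the script fails before any session is created. The following is a minimal sketch of such a check, not part of the Steel or Anthropic SDKs: `requireEnv` is a hypothetical helper name, and treating any value that starts with `your_` as a placeholder is an assumption based on the sample `.env` above.

```typescript
// Hypothetical helper (an assumption for illustration, not an SDK function).
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, names: string[]): Record<string, string> {
  // A variable counts as missing if it is unset, blank, or still a placeholder.
  const missing = names.filter((name) => {
    const value = env[name];
    return (
      value === undefined || value.trim() === "" || value.startsWith("your_")
    );
  });

  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const result: Record<string, string> = {};
  for (const name of names) {
    result[name] = env[name] as string;
  }
  return result;
}

// Demonstration with an in-memory object instead of the real environment.
const sampleEnv: Env = {
  STEEL_API_KEY: "abc123",
  ANTHROPIC_API_KEY: "your_anthropic_api_key_here",
};

try {
  requireEnv(sampleEnv, ["STEEL_API_KEY", "ANTHROPIC_API_KEY"]);
} catch (error) {
  console.log(String(error));
}
```

In the project itself you would pass `process.env` after `dotenv.config()` has run, for example `requireEnv(process.env, ["STEEL_API_KEY", "ANTHROPIC_API_KEY"])`.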

Step 2: Create Helper Functions

Typescript
utils.ts
import { chromium } from "playwright";
import type { Browser, Page } from "playwright";
import { Steel } from "steel-sdk";
import * as dotenv from "dotenv";
import Anthropic from "@anthropic-ai/sdk";
import type {
  MessageParam,
  ToolResultBlockParam,
  Message,
} from "@anthropic-ai/sdk/resources/messages";

dotenv.config();

// Replace with your own API keys
export const STEEL_API_KEY =
  process.env.STEEL_API_KEY || "your-steel-api-key-here";
export const ANTHROPIC_API_KEY =
  process.env.ANTHROPIC_API_KEY || "your-anthropic-api-key-here";

// Replace with your own task
export const TASK =
  process.env.TASK || "Go to Wikipedia and search for machine learning";

export const SYSTEM_PROMPT = `You are an expert browser automation assistant operating in an iterative execution loop. Your goal is to efficiently complete tasks using a Chrome browser with full internet access.

<CAPABILITIES>
* You control a Chrome browser tab and can navigate to any website
* You can click, type, scroll, take screenshots, and interact with web elements
* You have full internet access and can visit any public website
* You can read content, fill forms, search for information, and perform complex multi-step tasks
* After each action, you receive a screenshot showing the current state

<COORDINATE_SYSTEM>
* The browser viewport has specific dimensions that you must respect
* All coordinates (x, y) must be within the viewport bounds
* X coordinates must be between 0 and the display width (inclusive)
* Y coordinates must be between 0 and the display height (inclusive)
* Always ensure your click, move, scroll, and drag coordinates are within these bounds
* If you're unsure about element locations, take a screenshot first to see the current state

<AUTONOMOUS_EXECUTION>
* Work completely independently - make decisions and act immediately without asking questions
* Never request clarification, present options, or ask for permission
* Make intelligent assumptions based on task context
* If something is ambiguous, choose the most logical interpretation and proceed
* Take immediate action rather than explaining what you might do
* When the task objective is achieved, immediately declare "TASK_COMPLETED:" - do not provide commentary or ask questions

<REASONING_STRUCTURE>
For each step, you must reason systematically:
* Analyze your previous action's success/failure and current state
* Identify what specific progress has been made toward the goal
* Determine the next immediate objective and how to achieve it
* Choose the most efficient action sequence to make progress

<EFFICIENCY_PRINCIPLES>
* Combine related actions when possible rather than single-step execution
* Navigate directly to relevant websites without unnecessary exploration
* Use screenshots strategically to understand page state before acting
* Be persistent with alternative approaches if initial attempts fail
* Focus on the specific information or outcome requested

<COMPLETION_CRITERIA>
* MANDATORY: When you complete the task, your final message MUST start with "TASK_COMPLETED: [brief summary]"
* MANDATORY: If technical issues prevent completion, your final message MUST start with "TASK_FAILED: [reason]"
* MANDATORY: If you abandon the task, your final message MUST start with "TASK_ABANDONED: [explanation]"
* Do not write anything after completing the task except the required completion message
* Do not ask questions, provide commentary, or offer additional help after task completion
* The completion message is the end of the interaction - nothing else should follow

<CRITICAL_REQUIREMENTS>
* This is fully automated execution - work completely independently
* Start by taking a screenshot to understand the current state
* Never click on browser UI elements
* Always respect coordinate boundaries - invalid coordinates will fail
* Recognize when the stated objective has been achieved and declare completion immediately
* Focus on the explicit task given, not implied or potential follow-up tasks

Remember: Be thorough but focused. Complete the specific task requested efficiently and provide clear results.`;

export const BLOCKED_DOMAINS = [
  "maliciousbook.com",
  "evilvideos.com",
  "darkwebforum.com",
  "shadytok.com",
  "suspiciouspins.com",
  "ilanbigio.com",
];

export const MODEL_CONFIGS = {
  "claude-3-5-sonnet-20241022": {
    toolType: "computer_20241022",
    betaFlag: "computer-use-2024-10-22",
    description: "Stable Claude 3.5 Sonnet (recommended)",
  },
  "claude-3-7-sonnet-20250219": {
    toolType: "computer_20250124",
    betaFlag: "computer-use-2025-01-24",
    description: "Claude 3.7 Sonnet (newer)",
  },
  "claude-sonnet-4-20250514": {
    toolType: "computer_20250124",
    betaFlag: "computer-use-2025-01-24",
    description: "Claude 4 Sonnet (newest)",
  },
  "claude-opus-4-20250514": {
    toolType: "computer_20250124",
    betaFlag: "computer-use-2025-01-24",
    description: "Claude 4 Opus (newest)",
  },
};

export const CUA_KEY_TO_PLAYWRIGHT_KEY: Record<string, string> = {
  "/": "Divide",
  "\\": "Backslash",
  alt: "Alt",
  arrowdown: "ArrowDown",
  arrowleft: "ArrowLeft",
  arrowright: "ArrowRight",
  arrowup: "ArrowUp",
  backspace: "Backspace",
  capslock: "CapsLock",
  cmd: "Meta",
  ctrl: "Control",
  delete: "Delete",
  end: "End",
  enter: "Enter",
  esc: "Escape",
  home: "Home",
  insert: "Insert",
  option: "Alt",
  pagedown: "PageDown",
  pageup: "PageUp",
  shift: "Shift",
  space: " ",
  super: "Meta",
  tab: "Tab",
  win: "Meta",
  Return: "Enter",
  KP_Enter: "Enter",
  Escape: "Escape",
  BackSpace: "Backspace",
  Delete: "Delete",
  Tab: "Tab",
  ISO_Left_Tab: "Shift+Tab",
  Up: "ArrowUp",
  Down: "ArrowDown",
  Left: "ArrowLeft",
  Right: "ArrowRight",
  Page_Up: "PageUp",
  Page_Down: "PageDown",
  Home: "Home",
  End: "End",
  Insert: "Insert",
  F1: "F1",
  F2: "F2",
  F3: "F3",
  F4: "F4",
  F5: "F5",
  F6: "F6",
  F7: "F7",
  F8: "F8",
  F9: "F9",
  F10: "F10",
  F11: "F11",
  F12: "F12",
  Shift_L: "Shift",
  Shift_R: "Shift",
  Control_L: "Control",
  Control_R: "Control",
  Alt_L: "Alt",
  Alt_R: "Alt",
  Meta_L: "Meta",
  Meta_R: "Meta",
  Super_L: "Meta",
  Super_R: "Meta",
  minus: "-",
  equal: "=",
  bracketleft: "[",
  bracketright: "]",
  semicolon: ";",
  apostrophe: "'",
  grave: "`",
  comma: ",",
  period: ".",
  slash: "/",
};

type ModelName = keyof typeof MODEL_CONFIGS;

interface ModelConfig {
  toolType: string;
  betaFlag: string;
  description: string;
}

// Split a string into pieces of at most chunkSize characters.
export function chunks(s: string, chunkSize: number): string[] {
  const result: string[] = [];
  for (let i = 0; i < s.length; i += chunkSize) {
    result.push(s.slice(i, i + chunkSize));
  }
  return result;
}

// Pretty-print any JSON-serializable value.
export function pp(obj: any): void {
  console.log(JSON.stringify(obj, null, 2));
}

// Throw if the URL's host is a blocked domain or one of its subdomains.
export function checkBlocklistedUrl(url: string): void {
  try {
    const hostname = new URL(url).hostname || "";
    const isBlocked = BLOCKED_DOMAINS.some(
      (blocked) => hostname === blocked || hostname.endsWith(`.${blocked}`)
    );
    if (isBlocked) {
      throw new Error(`Blocked URL: ${url}`);
    }
  } catch (error) {
    // Re-throw only the blocklist error; URLs that fail to parse are ignored.
    if (error instanceof Error && error.message.startsWith("Blocked URL:")) {
      throw error;
    }
  }
}
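The blocklist check matches a blocked domain and any of its subdomains, but not other hosts that merely end in the same characters. The sketch below restates that rule as a pure predicate so the edge cases are easy to see; `isBlockedHost` is a hypothetical name used only here, and the domain list is passed in rather than imported so the snippet runs on its own.

```typescript
// Same matching rule as checkBlocklistedUrl, written as a boolean predicate.
function isBlockedHost(url: string, blockedDomains: string[]): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch {
    // Unparseable URLs are let through, mirroring checkBlocklistedUrl.
    return false;
  }
  return blockedDomains.some(
    (blocked) => hostname === blocked || hostname.endsWith(`.${blocked}`)
  );
}

const blockedList = ["evilvideos.com"];

console.log(isBlockedHost("https://evilvideos.com/watch", blockedList)); // true
console.log(isBlockedHost("https://cdn.evilvideos.com/a.js", blockedList)); // true
console.log(isBlockedHost("https://notevilvideos.com/", blockedList)); // false
console.log(isBlockedHost("not a url", blockedList)); // false
```

The leading dot in the `endsWith` check is what keeps `notevilvideos.com` from matching `evilvideos.com`.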

Step 3: Create Steel Browser Integration

Typescript
steelBrowser.ts
import { chromium } from "playwright";
import type { Browser, Page } from "playwright";
import { Steel } from "steel-sdk";
import { CUA_KEY_TO_PLAYWRIGHT_KEY, checkBlocklistedUrl, chunks } from "./utils";

const TYPING_DELAY_MS = 12;
const TYPING_GROUP_SIZE = 50;

export class SteelBrowser {
  private client: Steel;
  private session: any;
  private browser: Browser | null = null;
  private page: Page | null = null;
  private dimensions: [number, number];
  private proxy: boolean;
  private solveCaptcha: boolean;
  private virtualMouse: boolean;
  private sessionTimeout: number;
  private adBlocker: boolean;
  private startUrl: string;
  private lastMousePosition: [number, number] | null = null;

  constructor(
    width: number = 1024,
    height: number = 768,
    proxy: boolean = false,
    solveCaptcha: boolean = false,
    virtualMouse: boolean = true,
    sessionTimeout: number = 900000,
    adBlocker: boolean = true,
    startUrl: string = "https://www.google.com"
  ) {
    this.client = new Steel({
      steelAPIKey: process.env.STEEL_API_KEY!,
    });
    this.dimensions = [width, height];
    this.proxy = proxy;
    this.solveCaptcha = solveCaptcha;
    this.virtualMouse = virtualMouse;
    this.sessionTimeout = sessionTimeout;
    this.adBlocker = adBlocker;
    this.startUrl = startUrl;
  }

  getDimensions(): [number, number] {
    return this.dimensions;
  }

  getCurrentUrl(): string {
    return this.page?.url() || "";
  }

  async initialize(): Promise<void> {
    const [width, height] = this.dimensions;
    const sessionParams = {
      useProxy: this.proxy,
      solveCaptcha: this.solveCaptcha,
      apiTimeout: this.sessionTimeout,
      blockAds: this.adBlocker,
      dimensions: { width, height },
    };

    this.session = await this.client.sessions.create(sessionParams);
    console.log("Steel Session created successfully!");
    console.log(`View live session at: ${this.session.sessionViewerUrl}`);

    const cdpUrl = `${this.session.websocketUrl}&apiKey=${process.env.STEEL_API_KEY}`;

    this.browser = await chromium.connectOverCDP(cdpUrl, {
      timeout: 60000,
    });

    const context = this.browser.contexts()[0];

    await context.route("**/*", async (route, request) => {
      const url = request.url();
      try {
        checkBlocklistedUrl(url);
        await route.continue();
      } catch (error) {
        console.log(`Blocking URL: ${url}`);
        await route.abort();
      }
    });

    if (this.virtualMouse) {
      await context.addInitScript(`
        if (window.self === window.top) {
          function initCursor() {
            const CURSOR_ID = '__cursor__';
            if (document.getElementById(CURSOR_ID)) return;

            const cursor = document.createElement('div');
            cursor.id = CURSOR_ID;
            Object.assign(cursor.style, {
              position: 'fixed',
              top: '0px',
              left: '0px',
              width: '20px',
              height: '20px',
              backgroundImage: 'url("data:image/svg+xml;utf8,<svg width=\\'16\\' height=\\'16\\' viewBox=\\'0 0 20 20\\' fill=\\'black\\' outline=\\'white\\' xmlns=\\'http://www.w3.org/2000/svg\\'><path d=\\'M15.8089 7.22221C15.9333 7.00888 15.9911 6.78221 15.9822 6.54221C15.9733 6.29333 15.8978 6.06667 15.7555 5.86221C15.6133 5.66667 15.4311 5.52445 15.2089 5.43555L1.70222 0.0888888C1.47111 0 1.23555 -0.0222222 0.995555 0.0222222C0.746667 0.0755555 0.537779 0.186667 0.368888 0.355555C0.191111 0.533333 0.0755555 0.746667 0.0222222 0.995555C-0.0222222 1.23555 0 1.47111 0.0888888 1.70222L5.43555 15.2222C5.52445 15.4445 5.66667 15.6267 5.86221 15.7689C6.06667 15.9111 6.28888 15.9867 6.52888 15.9955H6.58221C6.82221 15.9955 7.04445 15.9333 7.24888 15.8089C7.44445 15.6845 7.59555 15.52 7.70221 15.3155L10.2089 10.2222L15.3022 7.70221C15.5155 7.59555 15.6845 7.43555 15.8089 7.22221Z\\' ></path></svg>")',
              backgroundSize: 'cover',
              pointerEvents: 'none',
              zIndex: '99999',
              transform: 'translate(-2px, -2px)',
            });

            document.body.appendChild(cursor);

            document.addEventListener("mousemove", (e) => {
              cursor.style.top = e.clientY + "px";
              cursor.style.left = e.clientX + "px";
            });
          }

          function checkBody() {
            if (document.body) {
              initCursor();
            } else {
              requestAnimationFrame(checkBody);
            }
          }
          requestAnimationFrame(checkBody);
        }
      `);
    }

    this.page = context.pages()[0];

    const [viewportWidth, viewportHeight] = this.dimensions;
    await this.page.setViewportSize({
      width: viewportWidth,
      height: viewportHeight,
    });

    await this.page.goto(this.startUrl);
  }

  async cleanup(): Promise<void> {
    if (this.page) {
      await this.page.close();
    }
    if (this.browser) {
      await this.browser.close();
    }
    if (this.session) {
      console.log("Releasing Steel session...");
      await this.client.sessions.release(this.session.id);
      console.log(
        `Session completed. View replay at ${this.session.sessionViewerUrl}`
      );
    }
  }

  async screenshot(): Promise<string> {
    if (!this.page) throw new Error("Page not initialized");

    try {
      const [width, height] = this.dimensions;
      const buffer = await this.page.screenshot({
        fullPage: false,
        clip: { x: 0, y: 0, width, height },
      });
      return buffer.toString("base64");
    } catch (error) {
      console.log(`Screenshot failed, trying CDP fallback: ${error}`);
      try {
        const cdpSession = await this.page.context().newCDPSession(this.page);
        const result = await cdpSession.send("Page.captureScreenshot", {
          format: "png",
          fromSurface: false,
        });
        await cdpSession.detach();
        return result.data;
      } catch (cdpError) {
        console.log(`CDP screenshot also failed: ${cdpError}`);
        throw error;
      }
    }
  }

  private validateAndGetCoordinates(
    coordinate: [number, number] | number[]
  ): [number, number] {
    if (!Array.isArray(coordinate) || coordinate.length !== 2) {
      throw new Error(`${coordinate} must be a tuple or list of length 2`);
    }
    if (!coordinate.every((i) => typeof i === "number" && i >= 0)) {
      throw new Error(
        `${coordinate} must be a tuple/list of non-negative numbers`
      );
    }

    const [x, y] = this.clampCoordinates(coordinate[0], coordinate[1]);
    return [x, y];
  }

  private clampCoordinates(x: number, y: number): [number, number] {
    const [width, height] = this.dimensions;
    const clampedX = Math.max(0, Math.min(x, width - 1));
    const clampedY = Math.max(0, Math.min(y, height - 1));

    if (x !== clampedX || y !== clampedY) {
      console.log(
        `⚠️ Coordinate clamped: (${x}, ${y}) → (${clampedX}, ${clampedY})`
      );
    }

    return [clampedX, clampedY];
  }

  async executeComputerAction(
    action: string,
    text?: string,
    coordinate?: [number, number] | number[],
    scrollDirection?: "up" | "down" | "left" | "right",
    scrollAmount?: number,
    duration?: number,
    key?: string
  ): Promise<string> {
    if (!this.page) throw new Error("Page not initialized");

    if (action === "left_mouse_down" || action === "left_mouse_up") {
      if (coordinate !== undefined) {
        throw new Error(`coordinate is not accepted for ${action}`);
      }

      if (action === "left_mouse_down") {
        await this.page.mouse.down();
      } else {
        await this.page.mouse.up();
      }

      return this.screenshot();
    }

    if (action === "scroll") {
      if (
        !scrollDirection ||
        !["up", "down", "left", "right"].includes(scrollDirection)
      ) {
        throw new Error(
          "scroll_direction must be 'up', 'down', 'left', or 'right'"
        );
      }
      if (scrollAmount === undefined || scrollAmount < 0) {
        throw new Error("scroll_amount must be a non-negative number");
      }

      if (coordinate !== undefined) {
        const [x, y] = this.validateAndGetCoordinates(coordinate);
        await this.page.mouse.move(x, y);
        this.lastMousePosition = [x, y];
      }

      if (text) {
        let modifierKey = text;
        if (modifierKey in CUA_KEY_TO_PLAYWRIGHT_KEY) {
          modifierKey = CUA_KEY_TO_PLAYWRIGHT_KEY[modifierKey];
        }
        await this.page.keyboard.down(modifierKey);
      }

      const scrollMapping = {
        down: [0, 100 * scrollAmount],
        up: [0, -100 * scrollAmount],
        right: [100 * scrollAmount, 0],
        left: [-100 * scrollAmount, 0],
      };
      const [deltaX, deltaY] = scrollMapping[scrollDirection];
      await this.page.mouse.wheel(deltaX, deltaY);

      if (text) {
        let modifierKey = text;
        if (modifierKey in CUA_KEY_TO_PLAYWRIGHT_KEY) {
          modifierKey = CUA_KEY_TO_PLAYWRIGHT_KEY[modifierKey];
        }
        await this.page.keyboard.up(modifierKey);
      }

      return this.screenshot();
    }

    if (action === "hold_key" || action === "wait") {
      if (duration === undefined || duration < 0) {
        throw new Error("duration must be a non-negative number");
      }
      if (duration > 100) {
        throw new Error("duration is too long");
      }

      if (action === "hold_key") {
        if (text === undefined) {
          throw new Error("text is required for hold_key");
        }

        let holdKey = text;
        if (holdKey in CUA_KEY_TO_PLAYWRIGHT_KEY) {
          holdKey = CUA_KEY_TO_PLAYWRIGHT_KEY[holdKey];
        }

        await this.page.keyboard.down(holdKey);
        await new Promise((resolve) => setTimeout(resolve, duration * 1000));
        await this.page.keyboard.up(holdKey);
      } else if (action === "wait") {
        await new Promise((resolve) => setTimeout(resolve, duration * 1000));
      }

      return this.screenshot();
    }

    if (
      [
        "left_click",
        "right_click",
        "double_click",
        "triple_click",
        "middle_click",
      ].includes(action)
    ) {
      if (text !== undefined) {
        throw new Error(`text is not accepted for ${action}`);
      }

      // Click at the given coordinate, else the last known mouse position,
      // else the center of the viewport.
      let clickX: number, clickY: number;
      if (coordinate !== undefined) {
        const [x, y] = this.validateAndGetCoordinates(coordinate);
        await this.page.mouse.move(x, y);
        this.lastMousePosition = [x, y];
        clickX = x;
        clickY = y;
      } else if (this.lastMousePosition) {
        [clickX, clickY] = this.lastMousePosition;
      } else {
        const [width, height] = this.dimensions;
        clickX = Math.floor(width / 2);
        clickY = Math.floor(height / 2);
      }

      if (key) {
        let modifierKey = key;
        if (modifierKey in CUA_KEY_TO_PLAYWRIGHT_KEY) {
          modifierKey = CUA_KEY_TO_PLAYWRIGHT_KEY[modifierKey];
        }
        await this.page.keyboard.down(modifierKey);
      }

      if (action === "left_click") {
        await this.page.mouse.click(clickX, clickY);
      } else if (action === "right_click") {
        await this.page.mouse.click(clickX, clickY, { button: "right" });
      } else if (action === "double_click") {
        await this.page.mouse.dblclick(clickX, clickY);
      } else if (action === "triple_click") {
        // Three separate single clicks are not reported to the page as a
        // triple click; clickCount: 3 is.
        await this.page.mouse.click(clickX, clickY, { clickCount: 3 });
      } else if (action === "middle_click") {
        await this.page.mouse.click(clickX, clickY, { button: "middle" });
      }

      if (key) {
        let modifierKey = key;
        if (modifierKey in CUA_KEY_TO_PLAYWRIGHT_KEY) {
          modifierKey = CUA_KEY_TO_PLAYWRIGHT_KEY[modifierKey];
        }
        await this.page.keyboard.up(modifierKey);
      }

      return this.screenshot();
    }

    if (action === "mouse_move" || action === "left_click_drag") {
      if (coordinate === undefined) {
        throw new Error(`coordinate is required for ${action}`);
      }
      if (text !== undefined) {
        throw new Error(`text is not accepted for ${action}`);
      }

      const [x, y] = this.validateAndGetCoordinates(coordinate);

      if (action === "mouse_move") {
        await this.page.mouse.move(x, y);
        this.lastMousePosition = [x, y];
      } else if (action === "left_click_drag") {
        await this.page.mouse.down();
        await this.page.mouse.move(x, y);
        await this.page.mouse.up();
        this.lastMousePosition = [x, y];
      }

      return this.screenshot();
    }

    if (action === "key" || action === "type") {
      if (text === undefined) {
        throw new Error(`text is required for ${action}`);
      }
      if (coordinate !== undefined) {
        throw new Error(`coordinate is not accepted for ${action}`);
      }

      if (action === "key") {
        let pressKey = text;

        if (pressKey.includes("+")) {
          const keyParts = pressKey.split("+");
          const modifierKeys = keyParts.slice(0, -1);
          const mainKey = keyParts[keyParts.length - 1];

          const playwrightModifiers: string[] = [];
          for (const mod of modifierKeys) {
            if (["ctrl", "control"].includes(mod.toLowerCase())) {
              playwrightModifiers.push("Control");
            } else if (mod.toLowerCase() === "shift") {
              playwrightModifiers.push("Shift");
            } else if (["alt", "option"].includes(mod.toLowerCase())) {
              playwrightModifiers.push("Alt");
            } else if (["cmd", "meta", "super"].includes(mod.toLowerCase())) {
              playwrightModifiers.push("Meta");
            } else {
              playwrightModifiers.push(mod);
            }
          }

          let finalMainKey = mainKey;
          if (mainKey in CUA_KEY_TO_PLAYWRIGHT_KEY) {
            finalMainKey = CUA_KEY_TO_PLAYWRIGHT_KEY[mainKey];
          }

          pressKey = [...playwrightModifiers, finalMainKey].join("+");
        } else {
          if (pressKey in CUA_KEY_TO_PLAYWRIGHT_KEY) {
            pressKey = CUA_KEY_TO_PLAYWRIGHT_KEY[pressKey];
          }
        }

        await this.page.keyboard.press(pressKey);
      } else if (action === "type") {
        for (const chunk of chunks(text, TYPING_GROUP_SIZE)) {
          await this.page.keyboard.type(chunk, { delay: TYPING_DELAY_MS });
          await new Promise((resolve) => setTimeout(resolve, 10));
        }
      }

      return this.screenshot();
    }

    if (action === "screenshot" || action === "cursor_position") {
      if (text !== undefined) {
        throw new Error(`text is not accepted for ${action}`);
      }
      if (coordinate !== undefined) {
        throw new Error(`coordinate is not accepted for ${action}`);
      }

      return this.screenshot();
    }

    throw new Error(`Invalid action: ${action}`);
  }
}
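The `key` action has to translate key names as Claude emits them (for example `ctrl+shift+Return`) into the names Playwright expects (`Control+Shift+Enter`). The sketch below isolates that translation as a pure function so it can be tried without a browser. It is an illustration under assumptions, not the class's actual code path: `toPlaywrightKey`, `KEY_MAP`, and `MODIFIER_ALIASES` are hypothetical names, and `KEY_MAP` copies only a few entries from `CUA_KEY_TO_PLAYWRIGHT_KEY`.

```typescript
// A small subset of CUA_KEY_TO_PLAYWRIGHT_KEY, copied so the sketch runs alone.
const KEY_MAP: Record<string, string> = {
  Return: "Enter",
  esc: "Escape",
  Page_Down: "PageDown",
};

// Modifier spellings accepted by the key action, lower-cased.
const MODIFIER_ALIASES: Record<string, string> = {
  ctrl: "Control",
  control: "Control",
  shift: "Shift",
  alt: "Alt",
  option: "Alt",
  cmd: "Meta",
  meta: "Meta",
  super: "Meta",
};

function toPlaywrightKey(combo: string): string {
  // Single key: translate it if the map knows it, otherwise pass it through.
  if (!combo.includes("+")) {
    return KEY_MAP[combo] ?? combo;
  }

  // Combination: everything before the last "+" is a modifier.
  const parts = combo.split("+");
  const mainKey = parts[parts.length - 1];
  const modifiers = parts
    .slice(0, -1)
    .map((mod) => MODIFIER_ALIASES[mod.toLowerCase()] ?? mod);

  return [...modifiers, KEY_MAP[mainKey] ?? mainKey].join("+");
}

console.log(toPlaywrightKey("ctrl+shift+Return")); // Control+Shift+Enter
console.log(toPlaywrightKey("Page_Down")); // PageDown
console.log(toPlaywrightKey("cmd+a")); // Meta+a
```

The resulting string is in the `Modifier+Key` form that `page.keyboard.press` accepts.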

Step 4: Create the Agent Class

Typescript
claudeAgent.ts
import Anthropic from "@anthropic-ai/sdk";
import type {
  MessageParam,
  ToolResultBlockParam,
  Message,
} from "@anthropic-ai/sdk/resources/messages";
import { MODEL_CONFIGS, SYSTEM_PROMPT, pp } from "./utils";
import { SteelBrowser } from "./steelBrowser";

type ModelName = keyof typeof MODEL_CONFIGS;

interface ModelConfig {
  toolType: string;
  betaFlag: string;
  description: string;
}

export class ClaudeAgent {
  private client: Anthropic;
  private computer: SteelBrowser;
  private messages: MessageParam[];
  private model: ModelName;
  private modelConfig: ModelConfig;
  private tools: any[];
  private systemPrompt: string;
  private viewportWidth: number;
  private viewportHeight: number;

  constructor(
    computer: SteelBrowser,
    model: ModelName = "claude-3-5-sonnet-20241022"
  ) {
    this.client = new Anthropic({
      apiKey: process.env.ANTHROPIC_API_KEY!,
    });
    this.computer = computer;
    this.model = model;
    this.messages = [];

    if (!(model in MODEL_CONFIGS)) {
      throw new Error(
        `Unsupported model: ${model}. Available models: ${Object.keys(
          MODEL_CONFIGS
        )}`
      );
    }

    this.modelConfig = MODEL_CONFIGS[model];

    const [width, height] = computer.getDimensions();
    this.viewportWidth = width;
    this.viewportHeight = height;

    // Inject the actual viewport size into the coordinate section of the prompt.
    this.systemPrompt = SYSTEM_PROMPT.replace(
      "<COORDINATE_SYSTEM>",
      `<COORDINATE_SYSTEM>
* The browser viewport dimensions are ${width}x${height} pixels
* The browser viewport has specific dimensions that you must respect`
    );

    this.tools = [
      {
        type: this.modelConfig.toolType,
        name: "computer",
        display_width_px: width,
        display_height_px: height,
        display_number: 1,
      },
    ];
  }

  getViewportInfo(): any {
    return {
      innerWidth: this.viewportWidth,
      innerHeight: this.viewportHeight,
      devicePixelRatio: 1.0,
      screenWidth: this.viewportWidth,
      screenHeight: this.viewportHeight,
      scrollX: 0,
      scrollY: 0,
    };
  }

  validateScreenshotDimensions(screenshotBase64: string): any {
    try {
      const imageBuffer = Buffer.from(screenshotBase64, "base64");

      if (imageBuffer.length === 0) {
        console.log("⚠️ Empty screenshot data");
        return {};
      }

      const viewportInfo = this.getViewportInfo();

      const scalingInfo = {
        screenshot_size: ["unknown", "unknown"],
        viewport_size: [this.viewportWidth, this.viewportHeight],
        actual_viewport: [viewportInfo.innerWidth, viewportInfo.innerHeight],
        device_pixel_ratio: viewportInfo.devicePixelRatio,
        width_scale: 1.0,
        height_scale: 1.0,
      };

      return scalingInfo;
    } catch (e) {
      console.log(`⚠️ Error validating screenshot dimensions: ${e}`);
      return {};
    }
  }

  async processResponse(message: Message): Promise<string> {
    let responseText = "";

    for (const block of message.content) {
      if (block.type === "text") {
        responseText += block.text;
        console.log(block.text);
      } else if (block.type === "tool_use") {
        const toolName = block.name;
        const toolInput = block.input as any;

        console.log(`🔧 ${toolName}(${JSON.stringify(toolInput)})`);

        if (toolName === "computer") {
          const action = toolInput.action;
          const params = {
            text: toolInput.text,
            coordinate: toolInput.coordinate,
            scrollDirection: toolInput.scroll_direction,
            scrollAmount: toolInput.scroll_amount,
            duration: toolInput.duration,
            key: toolInput.key,
          };

          try {
            const screenshotBase64 = await this.computer.executeComputerAction(
              action,
              params.text,
              params.coordinate,
              params.scrollDirection,
              params.scrollAmount,
              params.duration,
              params.key
            );

            if (action === "screenshot") {
              this.validateScreenshotDimensions(screenshotBase64);
            }

            const toolResult: ToolResultBlockParam = {
              type: "tool_result",
              tool_use_id: block.id,
              content: [
                {
                  type: "image",
                  source: {
                    type: "base64",
                    media_type: "image/png",
                    data: screenshotBase64,
                  },
                },
              ],
            };

            this.messages.push({
              role: "assistant",
              content: [block],
            });
            this.messages.push({
              role: "user",
              content: [toolResult],
            });

            return this.getClaudeResponse();
          } catch (error) {
            console.log(`❌ Error executing ${action}: ${error}`);
            const toolResult: ToolResultBlockParam = {
              type: "tool_result",
              tool_use_id: block.id,
              content: `Error executing ${action}: ${String(error)}`,
              is_error: true,
            };

            this.messages.push({
              role: "assistant",
              content: [block],
            });
            this.messages.push({
              role: "user",
              content: [toolResult],
            });

            return this.getClaudeResponse();
          }
        }
      }
    }

    if (
      responseText &&
      !message.content.some((block) => block.type === "tool_use")
    ) {
      this.messages.push({
        role: "assistant",
        content: responseText,
      });
    }

    return responseText;
  }

  async getClaudeResponse(): Promise<string> {
    try {
      const response = await this.client.beta.messages.create(
        {
          model: this.model,
          max_tokens: 4096,
          messages: this.messages,
          tools: this.tools,
        },
        {
          headers: {
            "anthropic-beta": this.modelConfig.betaFlag,
          },
        }
      );

      return this.processResponse(response);
    } catch (error) {
      const errorMsg = `Error communicating with Claude: ${error}`;
      console.log(`❌ ${errorMsg}`);
      return errorMsg;
    }
  }

  async executeTask(
    task: string,
    printSteps: boolean = true,
    debug: boolean = false,
    maxIterations: number = 50
  ): Promise<string> {
    this.messages = [
      {
        role: "user",
        content: this.systemPrompt,
      },
      {
        role: "user",
        content: task,
      },
    ];

    let iterations = 0;
    let consecutiveNoActions = 0;
    let lastAssistantMessages: string[] = [];

    console.log(`🎯 Executing task: ${task}`);
    console.log("=".repeat(60));

    const isTaskComplete = (
      content: string
    ): { completed: boolean; reason?: string } => {
      if (content.includes("TASK_COMPLETED:")) {
        return { completed: true, reason: "explicit_completion" };
      }
      if (
        content.includes("TASK_FAILED:") ||
        content.includes("TASK_ABANDONED:")
      ) {
        return { completed: true, reason: "explicit_failure" };
      }

      const completionPatterns = [
        /task\s+(completed|finished|done|accomplished)/i,
        /successfully\s+(completed|finished|found|gathered)/i,
        /here\s+(is|are)\s+the\s+(results?|information|summary)/i,
        /to\s+summarize/i,
        /in\s+conclusion/i,
        /final\s+(answer|result|summary)/i,
      ];

      const failurePatterns = [
        /cannot\s+(complete|proceed|access|continue)/i,
        /unable\s+to\s+(complete|access|find|proceed)/i,
        /blocked\s+by\s+(captcha|security|authentication)/i,
        /giving\s+up/i,
        /no\s+longer\s+able/i,
        /have\s+tried\s+multiple\s+approaches/i,
      ];

      if (completionPatterns.some((pattern) => pattern.test(content))) {
        return { completed: true, reason: "natural_completion" };
      }

      if (failurePatterns.some((pattern) => pattern.test(content))) {
        return { completed: true, reason: "natural_failure" };
      }

      return { completed: false };
    };

    const detectRepetition = (newMessage: string): boolean => {
      if (lastAssistantMessages.length < 2) return false;

      const similarity = (str1: string, str2: string): number => {
        const words1 = str1.toLowerCase().split(/\s+/);
        const words2 = str2.toLowerCase().split(/\s+/);
        const commonWords = words1.filter((word) => words2.includes(word));
        return commonWords.length / Math.max(words1.length, words2.length);
      };

      return lastAssistantMessages.some(
        (prevMessage) => similarity(newMessage, prevMessage) > 0.8
      );
    };

    while (iterations < maxIterations) {
      iterations++;
      let hasActions = false;

      if (this.messages.length > 0) {
        const lastMessage = this.messages[this.messages.length - 1];
        if (
          lastMessage?.role === "assistant" &&
          typeof lastMessage.content === "string"
        ) {
          const content = lastMessage.content;

          const completion = isTaskComplete(content);
          if (completion.completed) {
            console.log(`✅ Task completed (${completion.reason})`);
            break;
          }

          if (detectRepetition(content)) {
            console.log("🔄 Repetition detected - stopping execution");
            lastAssistantMessages.push(content);
            break;
          }

          lastAssistantMessages.push(content);
          if (lastAssistantMessages.length > 3) {
            lastAssistantMessages.shift();
          }
        }
      }

      if (debug) {
        pp(this.messages);
      }

      try {
        const response = await this.client.beta.messages.create(
          {
            model: this.model,
            max_tokens: 4096,
            messages: this.messages,
            tools: this.tools,
          },
          {
            headers: {
              "anthropic-beta": this.modelConfig.betaFlag,
            },
          }
        );

        if (debug) {
          pp(response);
        }

        for (const block of response.content) {
          if (block.type === "tool_use") {
            hasActions = true;
          }
        }

        await this.processResponse(response);

        if (!hasActions) {
          consecutiveNoActions++;
          if (consecutiveNoActions >= 3) {
            console.log(
              "⚠️ No actions for 3 consecutive iterations - stopping"
            );
            break;
          }
        } else {
          consecutiveNoActions = 0;
        }
      } catch (error) {
        console.error(`❌ Error during task execution: ${error}`);
        throw error;
      }
    }

    if (iterations >= maxIterations) {
      console.warn(
        `⚠️ Task execution stopped after ${maxIterations} iterations`
      );
    }

    const assistantMessages = this.messages.filter(
      (item) => item.role === "assistant"
    );
    const finalMessage = assistantMessages[assistantMessages.length - 1];

    if (finalMessage && typeof finalMessage.content === "string") {
      return finalMessage.content;
    }

    return "Task execution completed (no final message)";
  }
}
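The agent stops when a new assistant message is too similar to a recent one. The measure it uses is a simple word overlap: the number of words in the new message that also appear in the earlier one, divided by the longer message's word count. The sketch below pulls that logic out of `executeTask` so it can be run in isolation. The names `wordOverlap` and `isRepetition` are hypothetical; the 0.8 threshold and the requirement of at least two earlier messages come from the code above.

```typescript
// Fraction of words in `a` that also occur in `b`, relative to the longer message.
function wordOverlap(a: string, b: string): number {
  const wordsA = a.toLowerCase().split(/\s+/);
  const wordsB = b.toLowerCase().split(/\s+/);
  const common = wordsA.filter((word) => wordsB.includes(word));
  return common.length / Math.max(wordsA.length, wordsB.length);
}

// A message counts as a repetition if it overlaps strongly with any recent one.
function isRepetition(
  newMessage: string,
  recentMessages: string[],
  threshold: number = 0.8
): boolean {
  if (recentMessages.length < 2) return false;
  return recentMessages.some(
    (previous) => wordOverlap(newMessage, previous) > threshold
  );
}

const recent = ["I will click the search box", "The page is still loading"];

console.log(wordOverlap("a b c d e", "a b c d x")); // 0.8
console.log(isRepetition("I will click the search box", recent)); // true
console.log(isRepetition("Scrolling down to find results", recent)); // false
```

Because the comparison is strictly greater than the threshold, a message that shares exactly four of five words with an earlier one (overlap 0.8) is not flagged.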

Step 5: Create the Main Script

Typescript
main.ts
import { STEEL_API_KEY, ANTHROPIC_API_KEY, TASK } from "./utils";
import { SteelBrowser } from "./steelBrowser";
import { ClaudeAgent } from "./claudeAgent";

async function main(): Promise<void> {
  console.log("🚀 Steel + Claude Computer Use Assistant");
  console.log("=".repeat(60));

  if (STEEL_API_KEY === "your-steel-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
    );
    console.warn(
      " Get your API key at: https://app.steel.dev/settings/api-keys"
    );
    return;
  }

  if (ANTHROPIC_API_KEY === "your-anthropic-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-anthropic-api-key-here' with your actual Anthropic API key"
    );
    console.warn(" Get your API key at: https://console.anthropic.com/");
    return;
  }

  console.log("\nStarting Steel browser session...");

  const computer = new SteelBrowser();

  try {
    await computer.initialize();
    console.log("✅ Steel browser session started!");

    const agent = new ClaudeAgent(computer, "claude-3-5-sonnet-20241022");

    const startTime = Date.now();

    try {
      const result = await agent.executeTask(TASK, true, false, 50);

      const duration = ((Date.now() - startTime) / 1000).toFixed(1);

      console.log("\n" + "=".repeat(60));
      console.log("🎉 TASK EXECUTION COMPLETED");
      console.log("=".repeat(60));
      console.log(`⏱️ Duration: ${duration} seconds`);
      console.log(`🎯 Task: ${TASK}`);
      console.log(`📋 Result:\n${result}`);
      console.log("=".repeat(60));
    } catch (error) {
      console.error(`❌ Task execution failed: ${error}`);
      // Set the exit code instead of calling process.exit(), which would
      // skip the finally block and leave the Steel session unreleased.
      process.exitCode = 1;
    }
  } catch (error) {
    console.log(`❌ Failed to start Steel browser: ${error}`);
    console.log("Please check your STEEL_API_KEY and internet connection.");
    process.exitCode = 1;
  } finally {
    await computer.cleanup();
  }
}

main().catch(console.error);

Running Your Agent

Execute your script:

Terminal
npx ts-node main.ts

The console prints the session viewer URL. Open it to watch the live browser session.

The agent runs the task defined in the TASK environment variable, or the default task if TASK is not set.

You can modify the task by setting the environment variable:

Terminal
export TASK="Research the latest developments in artificial intelligence"
npx ts-node main.ts
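The `export` override works because `dotenv.config()` by default leaves variables that are already set in the process environment untouched, so a value from the shell wins over the one in `.env`, and the hard-coded default in `utils.ts` applies only when neither is set. The precedence can be sketched as a pure function; `resolveTask` is a hypothetical name used only for illustration, and the two plain objects stand in for the shell environment and the parsed `.env` file.

```typescript
type Env = Record<string, string | undefined>;

// Shell value first, then the .env value, then the hard-coded default.
function resolveTask(shellEnv: Env, dotenvFile: Env, fallback: string): string {
  const fromEnv = shellEnv.TASK ?? dotenvFile.TASK;
  return fromEnv || fallback;
}

const fileValues: Env = { TASK: "Go to Wikipedia and search for machine learning" };

console.log(resolveTask({ TASK: "Research AI news" }, fileValues, "default task"));
// Research AI news
console.log(resolveTask({}, fileValues, "default task"));
// Go to Wikipedia and search for machine learning
console.log(resolveTask({}, {}, "default task"));
// default task
```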

Customizing your agent's task

Try modifying the task to make your agent perform different actions:

ENV
.env
# Keep only one TASK line active at a time

# Research specific topics
TASK="Go to https://arxiv.org, search for 'machine learning', and summarize the latest papers."

# E-commerce tasks
# TASK="Go to https://www.amazon.com, search for 'wireless headphones', and compare the top 3 results."

# Information gathering
# TASK="Go to https://docs.anthropic.com, find information about Claude's capabilities, and provide a summary."

Supported Models: This example uses Claude 3.5 Sonnet, but any model listed in MODEL_CONFIGS works, including Claude 3.7 Sonnet, Claude Sonnet 4, and Claude Opus 4. To switch models, pass a different model name as the second argument to the ClaudeAgent constructor in main.ts.
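If you switch models often, the model name can come from configuration instead of being edited in code. The following is a minimal sketch under stated assumptions: `pickModel` and the `CLAUDE_MODEL` environment variable are hypothetical (neither appears in the guide's code), and the model list is copied from the keys of `MODEL_CONFIGS` so the snippet runs on its own.

```typescript
// Model names copied from the keys of MODEL_CONFIGS in utils.ts.
const SUPPORTED_MODELS = [
  "claude-3-5-sonnet-20241022",
  "claude-3-7-sonnet-20250219",
  "claude-sonnet-4-20250514",
  "claude-opus-4-20250514",
] as const;

type SupportedModel = (typeof SUPPORTED_MODELS)[number];

// Return the requested model if it is supported, or the fallback if none was given.
function pickModel(
  requested: string | undefined,
  fallback: SupportedModel = "claude-3-5-sonnet-20241022"
): SupportedModel {
  if (requested === undefined || requested === "") {
    return fallback;
  }
  const match = SUPPORTED_MODELS.find((name) => name === requested);
  if (match === undefined) {
    throw new Error(
      `Unsupported model: ${requested}. Available models: ${SUPPORTED_MODELS.join(", ")}`
    );
  }
  return match;
}

console.log(pickModel(undefined)); // claude-3-5-sonnet-20241022
console.log(pickModel("claude-sonnet-4-20250514")); // claude-sonnet-4-20250514
```

In `main.ts` this could be used as `new ClaudeAgent(computer, pickModel(process.env.CLAUDE_MODEL))`, which fails fast on a typo instead of sending an unknown model name to the API.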

Next Steps

  • Explore the Steel API documentation for more advanced features

  • Check out the Anthropic documentation for more information about Claude's computer use capabilities

  • Add additional features like session recording or multi-session management