Quickstart (TypeScript)

How to use Claude Computer Use with Steel

This guide shows you how to create AI agents with Claude's computer use capabilities and Steel's Computer API for autonomous web task execution.

Prerequisites

  • Node.js 20+

  • A Steel API key (create one at https://app.steel.dev/settings/api-keys)

  • An Anthropic API key with access to Claude models

Step 1: Setup and Helper Functions

First, create a project directory and install the required packages:

Terminal
# Create a project directory
mkdir steel-claude-computer-use
cd steel-claude-computer-use
# Initialize package.json
npm init -y
# Install required packages
npm install steel-sdk @anthropic-ai/sdk dotenv
npm install -D @types/node typescript ts-node

Create a .env file with your API keys:

ENV
.env
STEEL_API_KEY=your_steel_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
TASK=Go to Steel.dev and find the latest news
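
The placeholders in this .env file use underscores, while the fallback strings checked later in the guide use hyphens, so an unedited .env value could slip past that check. As a minimal sketch of a stricter startup check, the snippet below fails fast when a key is missing or still looks like a placeholder. The `requireEnv` helper and the `Env` type are hypothetical names introduced for illustration; they are not part of steel-sdk, @anthropic-ai/sdk, or dotenv.

```typescript
// Hypothetical helper for illustration; not part of any SDK used in this guide.
type Env = Record<string, string | undefined>;

// Returns the trimmed value, or throws if it is missing or looks like a placeholder.
function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value || value.toLowerCase().startsWith("your")) {
    throw new Error(`Missing or placeholder environment variable: ${name}`);
  }
  return value;
}

// In-memory example. In the project you would pass process.env
// after calling dotenv.config().
const sampleEnv: Env = {
  STEEL_API_KEY: "ste-123",
  ANTHROPIC_API_KEY: "your_anthropic_api_key_here",
};

const steelKey = requireEnv(sampleEnv, "STEEL_API_KEY");
console.log(`Steel key loaded (${steelKey.length} characters)`);
```

Taking the environment as a parameter instead of reading `process.env` directly keeps the check easy to test in isolation.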

Create a file with helper functions, constants, and type definitions:

TypeScript
helpers.ts
import * as dotenv from "dotenv";
import { Steel } from "steel-sdk";
import Anthropic from "@anthropic-ai/sdk";
import type {
  MessageParam,
  ToolResultBlockParam,
  Message,
} from "@anthropic-ai/sdk/resources/messages";

// Load variables from .env into process.env
dotenv.config();

// Fall back to placeholder strings so main.ts can detect missing keys
export const STEEL_API_KEY = process.env.STEEL_API_KEY || "your-steel-api-key-here";
export const ANTHROPIC_API_KEY =
  process.env.ANTHROPIC_API_KEY || "your-anthropic-api-key-here";
export const TASK = process.env.TASK || "Go to Steel.dev and find the latest news";

// Human-readable date that is embedded in the system prompt
export function formatToday(): string {
  return new Intl.DateTimeFormat("en-US", {
    weekday: "long",
    month: "long",
    day: "2-digit",
    year: "numeric",
  }).format(new Date());
}

export const BROWSER_SYSTEM_PROMPT = `<BROWSER_ENV>
- You control a headful Chromium browser running in a VM with internet access.
- Chromium is already open; interact only through the "computer" tool (mouse, keyboard, scroll, screenshots).
- Today's date is ${formatToday()}.
</BROWSER_ENV>

<BROWSER_CONTROL>
- When viewing pages, zoom out or scroll so all relevant content is visible.
- When typing into any input:
  * Clear it first with Ctrl+A, then Delete.
  * After submitting (pressing Enter or clicking a button), take an extra screenshot to confirm the result and move the mouse away.
- Computer tool calls are slow; batch related actions into a single call whenever possible.
- You may act on the user's behalf on sites where they are already authenticated.
- Assume any required authentication/Auth Contexts are already configured before the task starts.
- If the first screenshot is black:
  * Click near the center of the screen.
  * Take another screenshot.
</BROWSER_CONTROL>

<TASK_EXECUTION>
- You receive exactly one natural-language task and no further user feedback.
- Do not ask the user clarifying questions; instead, make reasonable assumptions and proceed.
- For complex tasks, quickly plan a short, ordered sequence of steps before acting.
- Prefer minimal, high-signal actions that move directly toward the goal.
- Keep your final response concise and focused on fulfilling the task (e.g., a brief summary of findings or results).
</TASK_EXECUTION>`;

// [x, y] position in viewport pixels
export type Coordinates = [number, number];

// Fields shared by most Steel Computer API requests
export interface BaseActionRequest {
  screenshot?: boolean;
  hold_keys?: string[];
}

export type MoveMouseRequest = BaseActionRequest & {
  action: "move_mouse";
  coordinates: Coordinates;
};

export type ClickMouseRequest = BaseActionRequest & {
  action: "click_mouse";
  button: "left" | "right" | "middle";
  coordinates: Coordinates;
  num_clicks?: number;
  click_type?: "down" | "up";
};

export type DragMouseRequest = BaseActionRequest & {
  action: "drag_mouse";
  path: Coordinates[];
};

export type ScrollRequest = BaseActionRequest & {
  action: "scroll";
  coordinates: Coordinates;
  delta_x: number;
  delta_y: number;
};

export type PressKeyRequest = BaseActionRequest & {
  action: "press_key";
  keys: string[];
  duration?: number;
};

export type TypeTextRequest = BaseActionRequest & {
  action: "type_text";
  text: string;
};

export type WaitRequest = BaseActionRequest & {
  action: "wait";
  duration: number;
};

export type GetCursorPositionRequest = {
  action: "get_cursor_position";
};

// Union of every request body the agent sends to the Computer API
export type ComputerActionRequest =
  | MoveMouseRequest
  | ClickMouseRequest
  | DragMouseRequest
  | ScrollRequest
  | PressKeyRequest
  | TypeTextRequest
  | WaitRequest
  | GetCursorPositionRequest;

// Re-export SDK classes and types so agent.ts needs a single import
export { Steel, Anthropic };
export type { MessageParam, ToolResultBlockParam, Message };
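
The request types above form a discriminated union keyed on the `action` field, which lets TypeScript narrow each request inside a `switch`. The following is a minimal sketch of that idea using a trimmed-down copy of three of the types. The `describeAction` function and the `ActionRequest` alias are hypothetical and exist only for this illustration.

```typescript
// Trimmed-down copies of three request shapes from helpers.ts.
type Coordinates = [number, number];

type ClickMouseRequest = {
  action: "click_mouse";
  button: "left" | "right" | "middle";
  coordinates: Coordinates;
  num_clicks?: number;
};
type TypeTextRequest = { action: "type_text"; text: string };
type WaitRequest = { action: "wait"; duration: number };

// Hypothetical alias standing in for ComputerActionRequest.
type ActionRequest = ClickMouseRequest | TypeTextRequest | WaitRequest;

// Hypothetical helper: turns a request into a log-friendly string.
function describeAction(req: ActionRequest): string {
  switch (req.action) {
    case "click_mouse": {
      // Narrowed to ClickMouseRequest, so button and coordinates are available.
      const [x, y] = req.coordinates;
      return `${req.num_clicks ?? 1}x ${req.button} click at (${x}, ${y})`;
    }
    case "type_text":
      return `type ${JSON.stringify(req.text)}`;
    case "wait":
      return `wait ${req.duration}`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check
      // if a new action is added to the union but not handled here.
      const unreachable: never = req;
      return unreachable;
    }
  }
}

console.log(
  describeAction({
    action: "click_mouse",
    button: "left",
    coordinates: [640, 384],
    num_clicks: 2,
  })
); // prints "2x left click at (640, 384)"
```

The `never` assignment in the default branch is a common way to have the compiler flag unhandled actions when the union grows.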

Step 2: Create the Agent Class

TypeScript
agent.ts
import {
  Steel,
  Anthropic,
  MessageParam,
  ToolResultBlockParam,
  Message,
  STEEL_API_KEY,
  ANTHROPIC_API_KEY,
  BROWSER_SYSTEM_PROMPT,
  Coordinates,
  ComputerActionRequest,
} from "./helpers";

export class Agent {
  private client: Anthropic;
  private steel: Steel;
  private session: Steel.Session | null = null;
  private messages: MessageParam[];
  private tools: any[];
  private model: string;
  private systemPrompt: string;
  private viewportWidth: number;
  private viewportHeight: number;

  constructor() {
    this.client = new Anthropic({ apiKey: ANTHROPIC_API_KEY });
    this.steel = new Steel({ steelAPIKey: STEEL_API_KEY });
    this.model = "claude-sonnet-4-5";
    this.messages = [];
    this.viewportWidth = 1280;
    this.viewportHeight = 768;
    this.systemPrompt = BROWSER_SYSTEM_PROMPT;
    // Claude's computer tool; the display size must match the Steel session
    this.tools = [
      {
        type: "computer_20250124",
        name: "computer",
        display_width_px: this.viewportWidth,
        display_height_px: this.viewportHeight,
        display_number: 1,
      },
    ];
  }

  // Viewport center, used when an action arrives without coordinates
  private center(): [number, number] {
    return [
      Math.floor(this.viewportWidth / 2),
      Math.floor(this.viewportHeight / 2),
    ];
  }

  // Split a chord such as "ctrl+a" into its individual keys
  private splitKeys(k?: string): string[] {
    return k
      ? k
          .split("+")
          .map((s) => s.trim())
          .filter(Boolean)
      : [];
  }

  // Map common key aliases to the names sent to the Computer API
  private normalizeKey(key: string): string {
    if (!key) return key;
    const k = String(key).trim();
    const upper = k.toUpperCase();
    const synonyms: Record<string, string> = {
      ENTER: "Enter",
      RETURN: "Enter",
      ESC: "Escape",
      ESCAPE: "Escape",
      TAB: "Tab",
      BACKSPACE: "Backspace",
      DELETE: "Delete",
      SPACE: "Space",
      CTRL: "Control",
      CONTROL: "Control",
      ALT: "Alt",
      SHIFT: "Shift",
      META: "Meta",
      CMD: "Meta",
      UP: "ArrowUp",
      DOWN: "ArrowDown",
      LEFT: "ArrowLeft",
      RIGHT: "ArrowRight",
      HOME: "Home",
      END: "End",
      PAGEUP: "PageUp",
      PAGEDOWN: "PageDown",
    };
    if (upper in synonyms) return synonyms[upper];
    // Function keys: "f5" -> "F5"
    if (upper.startsWith("F") && /^\d+$/.test(upper.slice(1))) {
      return "F" + upper.slice(1);
    }
    return k;
  }

  private normalizeKeys(keys: string[]): string[] {
    return keys.map((k) => this.normalizeKey(k));
  }

  // Create the Steel browser session the agent will control
  async initialize(): Promise<void> {
    const width = this.viewportWidth;
    const height = this.viewportHeight;
    this.session = await this.steel.sessions.create({
      dimensions: { width, height },
      blockAds: true,
      timeout: 900000,
    });
    console.log("Steel Session created successfully!");
    console.log(`View live session at: ${this.session.sessionViewerUrl}`);
  }

  // Release the session if one was created
  async cleanup(): Promise<void> {
    if (this.session) {
      console.log("Releasing Steel session...");
      await this.steel.sessions.release(this.session.id);
      console.log(
        `Session completed. View replay at ${this.session.sessionViewerUrl}`
      );
    }
  }

  // Capture the current screen as a base64-encoded image
  private async takeScreenshot(): Promise<string> {
    const resp: any = await this.steel.sessions.computer(this.session!.id, {
      action: "take_screenshot",
    });
    const img: string | undefined = resp?.base64_image;
    if (!img) throw new Error("No screenshot returned from Input API");
    return img;
  }

  // Translate one Claude computer-tool action into a Steel Computer API
  // request and return the resulting screenshot (base64)
  async executeComputerAction(
    action: string,
    text?: string,
    coordinate?: [number, number] | number[],
    scrollDirection?: "up" | "down" | "left" | "right",
    scrollAmount?: number,
    duration?: number,
    key?: string
  ): Promise<string> {
    const coords: Coordinates =
      coordinate && Array.isArray(coordinate) && coordinate.length === 2
        ? [coordinate[0], coordinate[1]]
        : this.center();

    let body: ComputerActionRequest | null = null;

    switch (action) {
      case "mouse_move": {
        const hk = this.splitKeys(key);
        body = {
          action: "move_mouse",
          coordinates: coords,
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "left_mouse_down":
      case "left_mouse_up": {
        const hk = this.splitKeys(key);
        body = {
          action: "click_mouse",
          button: "left",
          click_type: action === "left_mouse_down" ? "down" : "up",
          coordinates: coords,
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "left_click":
      case "right_click":
      case "middle_click":
      case "double_click":
      case "triple_click": {
        const buttonMap: Record<string, "left" | "right" | "middle"> = {
          left_click: "left",
          right_click: "right",
          middle_click: "middle",
          double_click: "left",
          triple_click: "left",
        };
        const clicks =
          action === "double_click" ? 2 : action === "triple_click" ? 3 : 1;
        const hk = this.splitKeys(key);
        body = {
          action: "click_mouse",
          button: buttonMap[action],
          coordinates: coords,
          screenshot: true,
          ...(clicks > 1 ? { num_clicks: clicks } : {}),
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "left_click_drag": {
        // The drag starts at the viewport center and ends at the given coordinate
        const [endX, endY] = coords;
        const [startX, startY] = this.center();
        const hk = this.splitKeys(key);
        body = {
          action: "drag_mouse",
          path: [
            [startX, startY],
            [endX, endY],
          ],
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "scroll": {
        // Each unit of scroll_amount is converted to 100 pixels
        const step = 100;
        type ScrollDir = "up" | "down" | "left" | "right";
        const map: Record<ScrollDir, [number, number]> = {
          down: [0, step * (scrollAmount as number)],
          up: [0, -step * (scrollAmount as number)],
          right: [step * (scrollAmount as number), 0],
          left: [-(step * (scrollAmount as number)), 0],
        };
        const dir: ScrollDir = (scrollDirection || "down") as ScrollDir;
        const [delta_x, delta_y] = map[dir];
        // For scroll, modifier keys are read from the text parameter
        const hk = this.splitKeys(text);
        body = {
          action: "scroll",
          coordinates: coords,
          delta_x,
          delta_y,
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "hold_key": {
        const keys = this.splitKeys(text);
        const normalized = this.normalizeKeys(keys);
        body = {
          action: "press_key",
          keys: normalized,
          duration,
          screenshot: true,
        };
        break;
      }
      case "key": {
        const keys = this.splitKeys(text);
        const normalized = this.normalizeKeys(keys);
        body = {
          action: "press_key",
          keys: normalized,
          screenshot: true,
        };
        break;
      }
      case "type": {
        const hk = this.splitKeys(key);
        body = {
          action: "type_text",
          text: text ?? "",
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "wait": {
        body = {
          action: "wait",
          duration: duration ?? 1000,
          screenshot: true,
        };
        break;
      }
      case "screenshot": {
        return this.takeScreenshot();
      }
      case "cursor_position": {
        await this.steel.sessions.computer(this.session!.id, {
          action: "get_cursor_position",
        });
        return this.takeScreenshot();
      }
      default:
        throw new Error(`Invalid action: ${action}`);
    }

    const resp: any = await this.steel.sessions.computer(
      this.session!.id,
      body!
    );
    const img: string | undefined = resp?.base64_image;
    if (img) return img;
    // Fall back to an explicit screenshot if the action did not return one
    return this.takeScreenshot();
  }

  // Handle one Claude response: log text, run the requested tool call,
  // append the result to the conversation, and ask Claude for the next step
  async processResponse(message: Message): Promise<string> {
    let responseText = "";

    for (const block of message.content) {
      if (block.type === "text") {
        responseText += block.text;
        console.log(block.text);
      } else if (block.type === "tool_use") {
        const toolName = block.name;
        const toolInput = block.input as any;

        console.log(`🔧 ${toolName}(${JSON.stringify(toolInput)})`);

        if (toolName === "computer") {
          const action = toolInput.action;
          const params = {
            text: toolInput.text,
            coordinate: toolInput.coordinate,
            scrollDirection: toolInput.scroll_direction,
            scrollAmount: toolInput.scroll_amount,
            duration: toolInput.duration,
            key: toolInput.key,
          };

          try {
            const screenshotBase64 = await this.executeComputerAction(
              action,
              params.text,
              params.coordinate,
              params.scrollDirection,
              params.scrollAmount,
              params.duration,
              params.key
            );

            // Return the screenshot to Claude as the tool result
            const toolResult: ToolResultBlockParam = {
              type: "tool_result",
              tool_use_id: block.id,
              content: [
                {
                  type: "image",
                  source: {
                    type: "base64",
                    media_type: "image/png",
                    data: screenshotBase64,
                  },
                },
              ],
            };

            this.messages.push({
              role: "assistant",
              content: [block],
            });
            this.messages.push({
              role: "user",
              content: [toolResult],
            });

            return this.getClaudeResponse();
          } catch (error) {
            console.log(`❌ Error executing ${action}: ${error}`);
            // Report the failure to Claude so it can recover
            const toolResult: ToolResultBlockParam = {
              type: "tool_result",
              tool_use_id: block.id,
              content: `Error executing ${action}: ${String(error)}`,
              is_error: true,
            };

            this.messages.push({
              role: "assistant",
              content: [block],
            });
            this.messages.push({
              role: "user",
              content: [toolResult],
            });

            return this.getClaudeResponse();
          }
        }
      }
    }

    // A text-only response is stored as the assistant's message
    if (
      responseText &&
      !message.content.some((block) => block.type === "tool_use")
    ) {
      this.messages.push({
        role: "assistant",
        content: responseText,
      });
    }

    return responseText;
  }

  // Send the conversation so far to Claude and process the reply
  async getClaudeResponse(): Promise<string> {
    try {
      const response = await this.client.beta.messages.create({
        model: this.model,
        max_tokens: 4096,
        messages: this.messages,
        tools: this.tools,
        betas: ["computer-use-2025-01-24"],
      });

      return this.processResponse(response);
    } catch (error) {
      const errorMsg = `Error communicating with Claude: ${error}`;
      console.log(`❌ ${errorMsg}`);
      return errorMsg;
    }
  }

  // Run a task until Claude stops acting, repeats itself,
  // or maxIterations is reached
  async executeTask(
    task: string,
    printSteps: boolean = true,
    debug: boolean = false,
    maxIterations: number = 50
  ): Promise<string> {
    this.messages = [
      {
        role: "user",
        content: this.systemPrompt,
      },
      {
        role: "user",
        content: task,
      },
    ];

    let iterations = 0;
    let consecutiveNoActions = 0;
    let lastAssistantMessages: string[] = [];

    console.log(`🎯 Executing task: ${task}`);
    console.log("=".repeat(60));

    // True when the new message shares more than 80% of its words
    // with any of the recent assistant messages
    const detectRepetition = (newMessage: string): boolean => {
      if (lastAssistantMessages.length < 2) return false;
      const similarity = (str1: string, str2: string): number => {
        const words1 = str1.toLowerCase().split(/\s+/);
        const words2 = str2.toLowerCase().split(/\s+/);
        const commonWords = words1.filter((word) => words2.includes(word));
        return commonWords.length / Math.max(words1.length, words2.length);
      };
      return lastAssistantMessages.some(
        (prevMessage) => similarity(newMessage, prevMessage) > 0.8
      );
    };

    while (iterations < maxIterations) {
      iterations++;
      let hasActions = false;

      if (this.messages.length > 0) {
        const lastMessage = this.messages[this.messages.length - 1];
        if (
          lastMessage?.role === "assistant" &&
          typeof lastMessage.content === "string"
        ) {
          const content = lastMessage.content;
          if (detectRepetition(content)) {
            console.log("🔄 Repetition detected - stopping execution");
            lastAssistantMessages.push(content);
            break;
          }
          // Keep only the three most recent assistant messages
          lastAssistantMessages.push(content);
          if (lastAssistantMessages.length > 3) {
            lastAssistantMessages.shift();
          }
        }
      }

      if (debug) {
        console.log(JSON.stringify(this.messages, null, 2));
      }

      try {
        const response = await this.client.beta.messages.create({
          model: this.model,
          max_tokens: 4096,
          messages: this.messages,
          tools: this.tools,
          betas: ["computer-use-2025-01-24"],
        });

        if (debug) {
          console.log(JSON.stringify(response, null, 2));
        }

        for (const block of response.content) {
          if (block.type === "tool_use") {
            hasActions = true;
          }
        }

        await this.processResponse(response);

        if (!hasActions) {
          consecutiveNoActions++;
          if (consecutiveNoActions >= 3) {
            console.log(
              "⚠️ No actions for 3 consecutive iterations - stopping"
            );
            break;
          }
        } else {
          consecutiveNoActions = 0;
        }
      } catch (error) {
        console.error(`❌ Error during task execution: ${error}`);
        throw error;
      }
    }

    if (iterations >= maxIterations) {
      console.warn(
        `⚠️ Task execution stopped after ${maxIterations} iterations`
      );
    }

    // The last plain-text assistant message is the task result
    const assistantMessages = this.messages.filter(
      (item) => item.role === "assistant"
    );
    const finalMessage = assistantMessages[assistantMessages.length - 1];

    if (finalMessage && typeof finalMessage.content === "string") {
      return finalMessage.content;
    }

    return "Task execution completed (no final message)";
  }
}
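
Much of the agent is a translation layer between the action names Claude emits and the request bodies sent to Steel. As a minimal standalone sketch of two of those translations (key chords and scroll deltas), the snippet below mirrors the logic of `splitKeys`, `normalizeKey`, and the `scroll` case. The names `toKeyList`, `scrollDeltas`, and `KEY_SYNONYMS` are hypothetical, the synonym table is abbreviated, and the 100-pixel step is simply the value used in agent.ts.

```typescript
type ScrollDirection = "up" | "down" | "left" | "right";

// Abbreviated alias table; agent.ts carries a longer list.
const KEY_SYNONYMS: Record<string, string> = {
  CTRL: "Control",
  CONTROL: "Control",
  CMD: "Meta",
  ENTER: "Enter",
  RETURN: "Enter",
  ESC: "Escape",
  UP: "ArrowUp",
  DOWN: "ArrowDown",
};

// Hypothetical helper: "ctrl+a" -> ["Control", "a"].
function toKeyList(chord: string): string[] {
  return chord
    .split("+")
    .map((part) => part.trim())
    .filter(Boolean)
    .map((part) => KEY_SYNONYMS[part.toUpperCase()] ?? part);
}

// Hypothetical helper: converts a direction and amount into [delta_x, delta_y].
function scrollDeltas(
  direction: ScrollDirection,
  amount: number,
  step = 100
): [number, number] {
  const distance = step * amount;
  switch (direction) {
    case "down":
      return [0, distance];
    case "up":
      return [0, -distance];
    case "right":
      return [distance, 0];
    case "left":
      return [-distance, 0];
  }
}

console.log(toKeyList("ctrl+a")); // prints [ 'Control', 'a' ]
console.log(scrollDeltas("up", 3)); // prints [ 0, -300 ]
```

Keeping these conversions as small pure functions makes them straightforward to unit test without a live browser session.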

Step 3: Create the Main Script

TypeScript
main.ts
import { Agent } from "./agent";
import { STEEL_API_KEY, ANTHROPIC_API_KEY, TASK } from "./helpers";

async function main(): Promise<void> {
  console.log("🚀 Steel + Claude Computer Use Assistant");
  console.log("=".repeat(60));

  if (STEEL_API_KEY === "your-steel-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
    );
    console.warn(
      "   Get your API key at: https://app.steel.dev/settings/api-keys"
    );
    throw new Error("Set STEEL_API_KEY");
  }

  if (ANTHROPIC_API_KEY === "your-anthropic-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-anthropic-api-key-here' with your actual Anthropic API key"
    );
    console.warn("   Get your API key at: https://console.anthropic.com/");
    throw new Error("Set ANTHROPIC_API_KEY");
  }

  console.log("\nStarting Steel session...");
  const agent = new Agent();

  try {
    await agent.initialize();
    console.log("✅ Steel session started!");

    const startTime = Date.now();

    try {
      const result = await agent.executeTask(TASK, true, false, 50);
      const duration = ((Date.now() - startTime) / 1000).toFixed(1);

      console.log("\n" + "=".repeat(60));
      console.log("🎉 TASK EXECUTION COMPLETED");
      console.log("=".repeat(60));
      console.log(`⏱️ Duration: ${duration} seconds`);
      console.log(`🎯 Task: ${TASK}`);
      console.log(`📋 Result:\n${result}`);
      console.log("=".repeat(60));
    } catch (error) {
      console.error(`❌ Task execution failed: ${error}`);
      throw new Error("Task execution failed");
    }
  } catch (error) {
    console.log(`❌ Failed to start Steel session: ${error}`);
    console.log("Please check your STEEL_API_KEY and internet connection.");
    throw new Error("Failed to start Steel session");
  } finally {
    await agent.cleanup();
  }
}

main()
  .then(() => {
    process.exit(0);
  })
  .catch((error) => {
    console.error("Task execution failed:", error);
    process.exit(1);
  });
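
The main script relies on `try`/`finally` so that the Steel session is released even when the task throws. The same pattern can be sketched generically, here with a fake in-memory session so it runs without any API keys. The `withResource` helper, the `Releasable` interface, and the fake session are hypothetical and only illustrate the control flow; unlike main.ts, this sketch creates the resource before entering the `try` block.

```typescript
// Hypothetical interface: anything that must be released after use.
interface Releasable {
  release(): Promise<void>;
}

// Hypothetical helper: runs `use` and always releases the resource afterwards,
// whether `use` resolves or throws.
async function withResource<R extends Releasable, T>(
  create: () => Promise<R>,
  use: (resource: R) => Promise<T>
): Promise<T> {
  const resource = await create();
  try {
    return await use(resource);
  } finally {
    await resource.release();
  }
}

// Demonstration with a fake session that records what happened.
async function demo(): Promise<string[]> {
  const events: string[] = [];
  const createFakeSession = async (): Promise<Releasable> => ({
    release: async () => {
      events.push("released");
    },
  });

  try {
    await withResource(createFakeSession, async () => {
      events.push("working");
      throw new Error("task failed");
    });
  } catch (error) {
    events.push(`caught: ${(error as Error).message}`);
  }
  return events;
}

demo().then((events) => console.log(events.join(" -> ")));
// prints "working -> released -> caught: task failed"
```

The release step runs before the error reaches the caller, which is the property that keeps failed runs from leaving sessions open.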

Running Your Agent

Execute your script:

Terminal
npx ts-node main.ts

You'll see the session URL printed in the console. Open this URL to view the live browser session.

The agent will execute the task defined in the TASK environment variable or the default task.

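You can change the task for a single run by setting the environment variable in your shell. By default, dotenv does not override variables that are already set, so an exported TASK takes precedence over the value in .env: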

Terminal
export TASK="Research the latest developments in artificial intelligence"
npx ts-node main.ts

Customizing your agent's task

Try changing the task to make your agent perform different actions. The examples below are alternatives; keep only one TASK line in your .env at a time:

ENV
.env
# Research specific topics
TASK=Go to https://arxiv.org, search for 'machine learning', and summarize the latest papers.

# E-commerce tasks
TASK=Go to https://www.amazon.com, search for 'wireless headphones', and compare the top 3 results.

# Information gathering
TASK=Go to https://docs.anthropic.com, find information about Claude's capabilities, and provide a summary.

Next Steps

  • Explore the Steel API documentation for more advanced features

  • Check out the Anthropic documentation for more information about Claude's computer use capabilities

  • Add additional features like session recording or multi-session management