Quickstart (TypeScript)
How to use Claude Computer Use with Steel
This guide shows you how to create AI agents with Claude's computer use capabilities and Steel's Computer API for autonomous web task execution.
Prerequisites
- Node.js 20+
- A Steel API key (sign up here)
- An Anthropic API key with access to Claude models
Step 1: Setup and Helper Functions
First, create a project directory and install the required packages:
# Create a project directory
mkdir steel-claude-computer-use
cd steel-claude-computer-use

# Initialize package.json
npm init -y

# Install required packages
npm install steel-sdk @anthropic-ai/sdk dotenv
npm install -D @types/node typescript ts-node
Create a .env file with your API keys:
STEEL_API_KEY=your_steel_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
TASK=Go to Steel.dev and find the latest news
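A missing or blank key is easier to diagnose when the script fails at startup rather than on the first API call. The following is a minimal sketch of that idea; `requireEnv` is a hypothetical helper written for this example and is not part of steel-sdk, the Anthropic SDK, or dotenv.

```typescript
// Hypothetical helper: read a required variable from an env-like object and
// throw a descriptive error when it is missing or blank.
type EnvLike = Record<string, string | undefined>;

function requireEnv(env: EnvLike, name: string): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Stand-in object for illustration; a real script would pass process.env
// after dotenv.config() has run.
const exampleEnv: EnvLike = {
  STEEL_API_KEY: "example-steel-key",
  ANTHROPIC_API_KEY: "",
};

console.log(requireEnv(exampleEnv, "STEEL_API_KEY")); // example-steel-key

try {
  requireEnv(exampleEnv, "ANTHROPIC_API_KEY");
} catch (error) {
  // Blank values are treated the same as missing ones.
  console.log(String(error));
}
```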
Create a file named helpers.ts with helper functions, constants, and type definitions:
import * as dotenv from "dotenv";
import { Steel } from "steel-sdk";
import Anthropic from "@anthropic-ai/sdk";
import type {
  MessageParam,
  ToolResultBlockParam,
  Message,
} from "@anthropic-ai/sdk/resources/messages";

dotenv.config();

export const STEEL_API_KEY = process.env.STEEL_API_KEY || "your-steel-api-key-here";
export const ANTHROPIC_API_KEY =
  process.env.ANTHROPIC_API_KEY || "your-anthropic-api-key-here";
export const TASK = process.env.TASK || "Go to Steel.dev and find the latest news";

export function formatToday(): string {
  return new Intl.DateTimeFormat("en-US", {
    weekday: "long",
    month: "long",
    day: "2-digit",
    year: "numeric",
  }).format(new Date());
}

export const BROWSER_SYSTEM_PROMPT = `<BROWSER_ENV>
- You control a headful Chromium browser running in a VM with internet access.
- Chromium is already open; interact only through the "computer" tool (mouse, keyboard, scroll, screenshots).
- Today's date is ${formatToday()}.
</BROWSER_ENV>

<BROWSER_CONTROL>
- When viewing pages, zoom out or scroll so all relevant content is visible.
- When typing into any input:
  * Clear it first with Ctrl+A, then Delete.
  * After submitting (pressing Enter or clicking a button), take an extra screenshot to confirm the result and move the mouse away.
- Computer tool calls are slow; batch related actions into a single call whenever possible.
- You may act on the user's behalf on sites where they are already authenticated.
- Assume any required authentication/Auth Contexts are already configured before the task starts.
- If the first screenshot is black:
  * Click near the center of the screen.
  * Take another screenshot.
</BROWSER_CONTROL>

<TASK_EXECUTION>
- You receive exactly one natural-language task and no further user feedback.
- Do not ask the user clarifying questions; instead, make reasonable assumptions and proceed.
- For complex tasks, quickly plan a short, ordered sequence of steps before acting.
- Prefer minimal, high-signal actions that move directly toward the goal.
- Keep your final response concise and focused on fulfilling the task (e.g., a brief summary of findings or results).
</TASK_EXECUTION>`;

export type Coordinates = [number, number];

export interface BaseActionRequest {
  screenshot?: boolean;
  hold_keys?: string[];
}

export type MoveMouseRequest = BaseActionRequest & {
  action: "move_mouse";
  coordinates: Coordinates;
};

export type ClickMouseRequest = BaseActionRequest & {
  action: "click_mouse";
  button: "left" | "right" | "middle";
  coordinates: Coordinates;
  num_clicks?: number;
  click_type?: "down" | "up";
};

export type DragMouseRequest = BaseActionRequest & {
  action: "drag_mouse";
  path: Coordinates[];
};

export type ScrollRequest = BaseActionRequest & {
  action: "scroll";
  coordinates: Coordinates;
  delta_x: number;
  delta_y: number;
};

export type PressKeyRequest = BaseActionRequest & {
  action: "press_key";
  keys: string[];
  duration?: number;
};

export type TypeTextRequest = BaseActionRequest & {
  action: "type_text";
  text: string;
};

export type WaitRequest = BaseActionRequest & {
  action: "wait";
  duration: number;
};

export type GetCursorPositionRequest = {
  action: "get_cursor_position";
};

export type ComputerActionRequest =
  | MoveMouseRequest
  | ClickMouseRequest
  | DragMouseRequest
  | ScrollRequest
  | PressKeyRequest
  | TypeTextRequest
  | WaitRequest
  | GetCursorPositionRequest;

export { Steel, Anthropic, MessageParam, ToolResultBlockParam, Message };
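The request types above form a discriminated union keyed on the `action` field, so a `switch` on that field lets TypeScript narrow each case to the right shape. Below is a small sketch of that pattern. The types are trimmed copies of three of the definitions above so the snippet compiles on its own, and `describeAction` is a hypothetical logging helper, not part of the guide's code.

```typescript
// Trimmed copies of three request types from the helper file.
type Coordinates = [number, number];

type ClickMouseRequest = {
  action: "click_mouse";
  button: "left" | "right" | "middle";
  coordinates: Coordinates;
  num_clicks?: number;
};

type ScrollRequest = {
  action: "scroll";
  coordinates: Coordinates;
  delta_x: number;
  delta_y: number;
};

type TypeTextRequest = { action: "type_text"; text: string };

type ActionRequest = ClickMouseRequest | ScrollRequest | TypeTextRequest;

// Hypothetical helper: each case sees `req` narrowed by its `action` tag,
// so only the fields of that request type are accessible.
function describeAction(req: ActionRequest): string {
  switch (req.action) {
    case "click_mouse": {
      const [x, y] = req.coordinates;
      return `${req.button} click x${req.num_clicks ?? 1} at (${x}, ${y})`;
    }
    case "scroll":
      return `scroll by (${req.delta_x}, ${req.delta_y})`;
    case "type_text":
      return `type ${JSON.stringify(req.text)}`;
  }
}

const click: ActionRequest = {
  action: "click_mouse",
  button: "left",
  coordinates: [640, 384],
  num_clicks: 2,
};

console.log(describeAction(click)); // left click x2 at (640, 384)
```

Because the switch is exhaustive, adding a new member to the union without a matching case produces a compile-time error in strict mode, which helps keep the mapping complete.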
Step 2: Create the Agent Class
Create a file named agent.ts containing the agent class. It translates Claude's computer tool calls into Steel Computer API actions and runs the conversation loop:
import {
  Steel,
  Anthropic,
  MessageParam,
  ToolResultBlockParam,
  Message,
  STEEL_API_KEY,
  ANTHROPIC_API_KEY,
  BROWSER_SYSTEM_PROMPT,
  Coordinates,
  ComputerActionRequest,
} from "./helpers";

export class Agent {
  private client: Anthropic;
  private steel: Steel;
  private session: Steel.Session | null = null;
  private messages: MessageParam[];
  private tools: any[];
  private model: string;
  private systemPrompt: string;
  private viewportWidth: number;
  private viewportHeight: number;

  constructor() {
    this.client = new Anthropic({ apiKey: ANTHROPIC_API_KEY });
    this.steel = new Steel({ steelAPIKey: STEEL_API_KEY });
    this.model = "claude-sonnet-4-5";
    this.messages = [];
    this.viewportWidth = 1280;
    this.viewportHeight = 768;
    this.systemPrompt = BROWSER_SYSTEM_PROMPT;
    this.tools = [
      {
        type: "computer_20250124",
        name: "computer",
        display_width_px: this.viewportWidth,
        display_height_px: this.viewportHeight,
        display_number: 1,
      },
    ];
  }

  // Center of the viewport, used when an action arrives without coordinates.
  private center(): [number, number] {
    return [
      Math.floor(this.viewportWidth / 2),
      Math.floor(this.viewportHeight / 2),
    ];
  }

  // Split a combo such as "ctrl+a" into its individual key names.
  private splitKeys(k?: string): string[] {
    return k
      ? k
          .split("+")
          .map((s) => s.trim())
          .filter(Boolean)
      : [];
  }

  // Map common key aliases (e.g. "ctrl", "esc") to canonical key names.
  private normalizeKey(key: string): string {
    if (!key) return key;
    const k = String(key).trim();
    const upper = k.toUpperCase();
    const synonyms: Record<string, string> = {
      ENTER: "Enter",
      RETURN: "Enter",
      ESC: "Escape",
      ESCAPE: "Escape",
      TAB: "Tab",
      BACKSPACE: "Backspace",
      DELETE: "Delete",
      SPACE: "Space",
      CTRL: "Control",
      CONTROL: "Control",
      ALT: "Alt",
      SHIFT: "Shift",
      META: "Meta",
      CMD: "Meta",
      UP: "ArrowUp",
      DOWN: "ArrowDown",
      LEFT: "ArrowLeft",
      RIGHT: "ArrowRight",
      HOME: "Home",
      END: "End",
      PAGEUP: "PageUp",
      PAGEDOWN: "PageDown",
    };
    if (upper in synonyms) return synonyms[upper];
    if (upper.startsWith("F") && /^\d+$/.test(upper.slice(1))) {
      return "F" + upper.slice(1);
    }
    return k;
  }

  private normalizeKeys(keys: string[]): string[] {
    return keys.map((k) => this.normalizeKey(k));
  }

  async initialize(): Promise<void> {
    const width = this.viewportWidth;
    const height = this.viewportHeight;
    this.session = await this.steel.sessions.create({
      dimensions: { width, height },
      blockAds: true,
      timeout: 900000,
    });
    console.log("Steel Session created successfully!");
    console.log(`View live session at: ${this.session.sessionViewerUrl}`);
  }

  async cleanup(): Promise<void> {
    if (this.session) {
      console.log("Releasing Steel session...");
      await this.steel.sessions.release(this.session.id);
      console.log(
        `Session completed. View replay at ${this.session.sessionViewerUrl}`
      );
    }
  }

  private async takeScreenshot(): Promise<string> {
    const resp: any = await this.steel.sessions.computer(this.session!.id, {
      action: "take_screenshot",
    });
    const img: string | undefined = resp?.base64_image;
    if (!img) throw new Error("No screenshot returned from Input API");
    return img;
  }

  // Translate one Claude computer-tool action into a Steel Computer API
  // request and return the resulting screenshot as base64.
  async executeComputerAction(
    action: string,
    text?: string,
    coordinate?: [number, number] | number[],
    scrollDirection?: "up" | "down" | "left" | "right",
    scrollAmount?: number,
    duration?: number,
    key?: string
  ): Promise<string> {
    const coords: Coordinates =
      coordinate && Array.isArray(coordinate) && coordinate.length === 2
        ? [coordinate[0], coordinate[1]]
        : this.center();

    let body: ComputerActionRequest | null = null;

    switch (action) {
      case "mouse_move": {
        const hk = this.splitKeys(key);
        body = {
          action: "move_mouse",
          coordinates: coords,
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "left_mouse_down":
      case "left_mouse_up": {
        const hk = this.splitKeys(key);
        body = {
          action: "click_mouse",
          button: "left",
          click_type: action === "left_mouse_down" ? "down" : "up",
          coordinates: coords,
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "left_click":
      case "right_click":
      case "middle_click":
      case "double_click":
      case "triple_click": {
        const buttonMap: Record<string, "left" | "right" | "middle"> = {
          left_click: "left",
          right_click: "right",
          middle_click: "middle",
          double_click: "left",
          triple_click: "left",
        };
        const clicks =
          action === "double_click" ? 2 : action === "triple_click" ? 3 : 1;
        const hk = this.splitKeys(key);
        body = {
          action: "click_mouse",
          button: buttonMap[action],
          coordinates: coords,
          screenshot: true,
          ...(clicks > 1 ? { num_clicks: clicks } : {}),
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "left_click_drag": {
        // The drag starts at the viewport center and ends at the given point.
        const [endX, endY] = coords;
        const [startX, startY] = this.center();
        const hk = this.splitKeys(key);
        body = {
          action: "drag_mouse",
          path: [
            [startX, startY],
            [endX, endY],
          ],
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "scroll": {
        const step = 100;
        // Default to one step if scroll_amount is omitted, avoiding NaN deltas.
        const amount = scrollAmount ?? 1;
        type ScrollDir = "up" | "down" | "left" | "right";
        const map: Record<ScrollDir, [number, number]> = {
          down: [0, step * amount],
          up: [0, -step * amount],
          right: [step * amount, 0],
          left: [-(step * amount), 0],
        };
        const dir: ScrollDir = (scrollDirection || "down") as ScrollDir;
        const [delta_x, delta_y] = map[dir];
        const hk = this.splitKeys(text);
        body = {
          action: "scroll",
          coordinates: coords,
          delta_x,
          delta_y,
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "hold_key": {
        const keys = this.splitKeys(text);
        const normalized = this.normalizeKeys(keys);
        body = {
          action: "press_key",
          keys: normalized,
          duration,
          screenshot: true,
        };
        break;
      }
      case "key": {
        const keys = this.splitKeys(text);
        const normalized = this.normalizeKeys(keys);
        body = {
          action: "press_key",
          keys: normalized,
          screenshot: true,
        };
        break;
      }
      case "type": {
        const hk = this.splitKeys(key);
        body = {
          action: "type_text",
          text: text ?? "",
          screenshot: true,
          ...(hk.length ? { hold_keys: hk } : {}),
        };
        break;
      }
      case "wait": {
        body = {
          action: "wait",
          duration: duration ?? 1000,
          screenshot: true,
        };
        break;
      }
      case "screenshot": {
        return this.takeScreenshot();
      }
      case "cursor_position": {
        await this.steel.sessions.computer(this.session!.id, {
          action: "get_cursor_position",
        });
        return this.takeScreenshot();
      }
      default:
        throw new Error(`Invalid action: ${action}`);
    }

    const resp: any = await this.steel.sessions.computer(
      this.session!.id,
      body!
    );
    const img: string | undefined = resp?.base64_image;
    if (img) return img;
    // Fall back to an explicit screenshot if the action returned no image.
    return this.takeScreenshot();
  }

  // Handle one Claude response: print text blocks, execute the first
  // tool_use block, send its result back, and recurse on the next response.
  async processResponse(message: Message): Promise<string> {
    let responseText = "";

    for (const block of message.content) {
      if (block.type === "text") {
        responseText += block.text;
        console.log(block.text);
      } else if (block.type === "tool_use") {
        const toolName = block.name;
        const toolInput = block.input as any;

        console.log(`🔧 ${toolName}(${JSON.stringify(toolInput)})`);

        if (toolName === "computer") {
          const action = toolInput.action;
          const params = {
            text: toolInput.text,
            coordinate: toolInput.coordinate,
            scrollDirection: toolInput.scroll_direction,
            scrollAmount: toolInput.scroll_amount,
            duration: toolInput.duration,
            key: toolInput.key,
          };

          try {
            const screenshotBase64 = await this.executeComputerAction(
              action,
              params.text,
              params.coordinate,
              params.scrollDirection,
              params.scrollAmount,
              params.duration,
              params.key
            );

            const toolResult: ToolResultBlockParam = {
              type: "tool_result",
              tool_use_id: block.id,
              content: [
                {
                  type: "image",
                  source: {
                    type: "base64",
                    media_type: "image/png",
                    data: screenshotBase64,
                  },
                },
              ],
            };

            this.messages.push({
              role: "assistant",
              content: [block],
            });
            this.messages.push({
              role: "user",
              content: [toolResult],
            });

            return this.getClaudeResponse();
          } catch (error) {
            console.log(`❌ Error executing ${action}: ${error}`);
            const toolResult: ToolResultBlockParam = {
              type: "tool_result",
              tool_use_id: block.id,
              content: `Error executing ${action}: ${String(error)}`,
              is_error: true,
            };

            this.messages.push({
              role: "assistant",
              content: [block],
            });
            this.messages.push({
              role: "user",
              content: [toolResult],
            });

            return this.getClaudeResponse();
          }
        }
      }
    }

    if (
      responseText &&
      !message.content.some((block) => block.type === "tool_use")
    ) {
      this.messages.push({
        role: "assistant",
        content: responseText,
      });
    }

    return responseText;
  }

  async getClaudeResponse(): Promise<string> {
    try {
      const response = await this.client.beta.messages.create({
        model: this.model,
        max_tokens: 4096,
        messages: this.messages,
        tools: this.tools,
        betas: ["computer-use-2025-01-24"],
      });

      return this.processResponse(response);
    } catch (error) {
      const errorMsg = `Error communicating with Claude: ${error}`;
      console.log(`❌ ${errorMsg}`);
      return errorMsg;
    }
  }

  async executeTask(
    task: string,
    printSteps: boolean = true,
    debug: boolean = false,
    maxIterations: number = 50
  ): Promise<string> {
    this.messages = [
      {
        role: "user",
        content: this.systemPrompt,
      },
      {
        role: "user",
        content: task,
      },
    ];

    let iterations = 0;
    let consecutiveNoActions = 0;
    let lastAssistantMessages: string[] = [];

    console.log(`🎯 Executing task: ${task}`);
    console.log("=".repeat(60));

    // Flag a message as repeated when its word overlap with a recent
    // assistant message exceeds 80%.
    const detectRepetition = (newMessage: string): boolean => {
      if (lastAssistantMessages.length < 2) return false;
      const similarity = (str1: string, str2: string): number => {
        // Split both strings on runs of whitespace so the word counts match.
        const words1 = str1.toLowerCase().split(/\s+/);
        const words2 = str2.toLowerCase().split(/\s+/);
        const commonWords = words1.filter((word) => words2.includes(word));
        return commonWords.length / Math.max(words1.length, words2.length);
      };
      return lastAssistantMessages.some(
        (prevMessage) => similarity(newMessage, prevMessage) > 0.8
      );
    };

    while (iterations < maxIterations) {
      iterations++;
      let hasActions = false;

      if (this.messages.length > 0) {
        const lastMessage = this.messages[this.messages.length - 1];
        if (
          lastMessage?.role === "assistant" &&
          typeof lastMessage.content === "string"
        ) {
          const content = lastMessage.content;
          if (detectRepetition(content)) {
            console.log("🔄 Repetition detected - stopping execution");
            lastAssistantMessages.push(content);
            break;
          }
          lastAssistantMessages.push(content);
          // Keep only the three most recent assistant messages.
          if (lastAssistantMessages.length > 3) {
            lastAssistantMessages.shift();
          }
        }
      }

      if (debug) {
        console.log(JSON.stringify(this.messages, null, 2));
      }

      try {
        const response = await this.client.beta.messages.create({
          model: this.model,
          max_tokens: 4096,
          messages: this.messages,
          tools: this.tools,
          betas: ["computer-use-2025-01-24"],
        });

        if (debug) {
          console.log(JSON.stringify(response, null, 2));
        }

        for (const block of response.content) {
          if (block.type === "tool_use") {
            hasActions = true;
          }
        }

        await this.processResponse(response);

        if (!hasActions) {
          consecutiveNoActions++;
          if (consecutiveNoActions >= 3) {
            console.log(
              "⚠️ No actions for 3 consecutive iterations - stopping"
            );
            break;
          }
        } else {
          consecutiveNoActions = 0;
        }
      } catch (error) {
        console.error(`❌ Error during task execution: ${error}`);
        throw error;
      }
    }

    if (iterations >= maxIterations) {
      console.warn(
        `⚠️ Task execution stopped after ${maxIterations} iterations`
      );
    }

    const assistantMessages = this.messages.filter(
      (item) => item.role === "assistant"
    );
    const finalMessage = assistantMessages[assistantMessages.length - 1];

    if (finalMessage && typeof finalMessage.content === "string") {
      return finalMessage.content;
    }

    return "Task execution completed (no final message)";
  }
}
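The loop in executeTask stops early when the latest assistant message overlaps heavily with a recent one. The following standalone sketch shows how that word-overlap score behaves on sample strings. `similarity` mirrors the measure used in the class, while `isRepetition` and the sample messages are written for this example only.

```typescript
// Fraction of words in `a` that also appear in `b`, relative to the longer
// of the two messages. This mirrors the measure used inside executeTask.
function similarity(a: string, b: string): number {
  const wordsA = a.toLowerCase().split(/\s+/);
  const wordsB = b.toLowerCase().split(/\s+/);
  const common = wordsA.filter((word) => wordsB.includes(word));
  return common.length / Math.max(wordsA.length, wordsB.length);
}

// Illustrative wrapper: needs at least two prior messages, like the class.
function isRepetition(
  message: string,
  history: string[],
  threshold = 0.8
): boolean {
  if (history.length < 2) return false;
  return history.some((prev) => similarity(message, prev) > threshold);
}

const previous = "I could not find the news page";
const repeated = "I still could not find the news page";
const different = "The latest post announces a new API release";
const history = [previous, "Opening the Steel blog now"];

console.log(similarity(repeated, previous)); // 0.875 (7 of 8 words shared)
console.log(similarity(different, previous)); // 0.125 (only "the" is shared)
console.log(isRepetition(repeated, history)); // true
console.log(isRepetition(different, history)); // false
```

Note that the score is not symmetric, since it counts words of the first argument found in the second, and that common words such as "the" contribute to the overlap. It is a cheap heuristic for catching loops, not a semantic comparison.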
Step 3: Create the Main Script
Create a file named main.ts that checks the API keys, starts a Steel session, runs the task, and releases the session when finished:
import { Agent } from "./agent";
import { STEEL_API_KEY, ANTHROPIC_API_KEY, TASK } from "./helpers";

async function main(): Promise<void> {
  console.log("🚀 Steel + Claude Computer Use Assistant");
  console.log("=".repeat(60));

  if (STEEL_API_KEY === "your-steel-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
    );
    console.warn(
      " Get your API key at: https://app.steel.dev/settings/api-keys"
    );
    throw new Error("Set STEEL_API_KEY");
  }

  if (ANTHROPIC_API_KEY === "your-anthropic-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-anthropic-api-key-here' with your actual Anthropic API key"
    );
    console.warn(" Get your API key at: https://console.anthropic.com/");
    throw new Error("Set ANTHROPIC_API_KEY");
  }

  console.log("\nStarting Steel session...");
  const agent = new Agent();

  try {
    await agent.initialize();
    console.log("✅ Steel session started!");

    const startTime = Date.now();

    try {
      const result = await agent.executeTask(TASK, true, false, 50);
      const duration = ((Date.now() - startTime) / 1000).toFixed(1);

      console.log("\n" + "=".repeat(60));
      console.log("🎉 TASK EXECUTION COMPLETED");
      console.log("=".repeat(60));
      console.log(`⏱️ Duration: ${duration} seconds`);
      console.log(`🎯 Task: ${TASK}`);
      console.log(`📋 Result:\n${result}`);
      console.log("=".repeat(60));
    } catch (error) {
      console.error(`❌ Task execution failed: ${error}`);
      throw new Error("Task execution failed");
    }
  } catch (error) {
    console.log(`❌ Failed to start Steel session: ${error}`);
    console.log("Please check your STEEL_API_KEY and internet connection.");
    throw new Error("Failed to start Steel session");
  } finally {
    // Always release the Steel session, whether the task succeeded or failed.
    await agent.cleanup();
  }
}

main()
  .then(() => {
    process.exit(0);
  })
  .catch((error) => {
    console.error("Task execution failed:", error);
    process.exit(1);
  });
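The finally block releases the session on normal completion and on errors, but it does not run if the process is interrupted with Ctrl+C. One possible way to cover that case is to make cleanup safe to call more than once and also call it from a signal handler. The sketch below shows the idempotent wrapper only; `once` is a hypothetical helper, and the counter stands in for the call to agent.cleanup().

```typescript
// Hypothetical helper: wrap a function so that repeated calls run it once
// and later calls return the first result.
function once<T>(fn: () => T): () => T {
  let called = false;
  let result: T | undefined;
  return () => {
    if (!called) {
      called = true;
      result = fn();
    }
    return result as T;
  };
}

let releases = 0;

// Stand-in for `() => agent.cleanup()`; it only counts invocations here.
const cleanup = once(async (): Promise<void> => {
  releases += 1;
});

// Both the finally block and a signal handler could call this safely.
cleanup();
cleanup();
console.log(releases); // 1

// In main.ts (an assumption, not part of the guide's code) the handler
// could be registered before agent.initialize():
//   process.on("SIGINT", () => {
//     void cleanup().finally(() => process.exit(130));
//   });
```

Returning the first call's promise from every later call means a signal handler and the finally block both wait on the same release request instead of issuing two.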
Running Your Agent
Execute your script:
npx ts-node main.ts
You'll see the session URL printed in the console. Open this URL to view the live browser session.
The agent will execute the task defined in the TASK environment variable or the default task.
You can modify the task by setting the environment variable:
export TASK="Research the latest developments in artificial intelligence"
npx ts-node main.ts
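If you would rather pass the task on the command line, a small resolver can prefer an argument over the environment variable. This is a sketch under that assumption: `resolveTask` is a hypothetical function, and the default string matches the one in the helper file.

```typescript
const DEFAULT_TASK = "Go to Steel.dev and find the latest news";

// Hypothetical helper: pick the task from CLI arguments first, then the
// TASK environment variable, then the default.
function resolveTask(
  args: string[],
  env: Record<string, string | undefined>
): string {
  const fromArgs = args.join(" ").trim();
  if (fromArgs) return fromArgs;
  const fromEnv = env.TASK?.trim();
  return fromEnv || DEFAULT_TASK;
}

// In main.ts this would be resolveTask(process.argv.slice(2), process.env).
console.log(resolveTask(["Summarize", "the", "Steel", "changelog"], {}));
// Summarize the Steel changelog
console.log(resolveTask([], { TASK: "Find the pricing page" }));
// Find the pricing page
console.log(resolveTask([], {}));
// Go to Steel.dev and find the latest news
```

With this in place, a run might look like `npx ts-node main.ts "Summarize the Steel changelog"`, with the exported TASK variable still working as a fallback.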
Customizing your agent's task
Try modifying the task to make your agent perform different actions:
# Research specific topics
TASK=Go to https://arxiv.org, search for 'machine learning', and summarize the latest papers.

# E-commerce tasks
TASK=Go to https://www.amazon.com, search for 'wireless headphones', and compare the top 3 results.

# Information gathering
TASK=Go to https://docs.anthropic.com, find information about Claude's capabilities, and provide a summary.
Next Steps
- Explore the Steel API documentation for more advanced features
- Check out the Anthropic documentation for more information about Claude's computer use capabilities
- Add additional features like session recording or multi-session management