Quickstart (TypeScript)
How to use OpenAI Computer Use with Steel
This guide will walk you through how to use OpenAI's computer-use-preview model with Steel's Computer API to create AI agents that can navigate the web.
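We'll implement a simple computer-use agent (CUA) loop. It sends the task to the model, performs each action the model requests in a Steel browser session, and sends back a screenshot of the result. It repeats until the model returns a final answer or a stopping condition is reached.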
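With the OpenAI and Steel specifics stripped out, that loop can be sketched as follows. This is a minimal sketch, not the code used in this guide. `ModelTurn`, `callModel`, and `runAction` are hypothetical stand-ins for the Responses API call and the Steel Computer API. The sketch also stops at the first text-only reply. The agent built in Step 2 is more lenient: it keeps going until it sees three consecutive turns without actions, detects repeated text, or reaches its iteration limit.

```typescript
// Hypothetical, simplified types: a model turn either requests a browser action or returns text.
type ModelTurn =
  | { kind: "action"; callId: string; action: string }
  | { kind: "message"; text: string };

// What gets fed back after each action: the call it answers and a base64 screenshot.
type Observation = { callId: string; screenshot: string };

async function runCuaLoop(
  task: string,
  callModel: (task: string, history: Observation[]) => Promise<ModelTurn>,
  runAction: (action: string) => Promise<string>, // performs the action, returns a screenshot
  maxIterations = 50
): Promise<string> {
  const history: Observation[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const turn = await callModel(task, history);
    if (turn.kind === "message") {
      // No action requested: treat the text as the final answer.
      return turn.text;
    }
    // Perform the requested action and feed the resulting screenshot back to the model.
    const screenshot = await runAction(turn.action);
    history.push({ callId: turn.callId, screenshot });
  }
  return "Stopped: iteration limit reached";
}
```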

Prerequisites
- Node.js 20+
- A Steel API key (get one at https://app.steel.dev/settings/api-keys)
- An OpenAI API key with access to the computer-use-preview model
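The helper code calls the global fetch, which Node.js enables by default from version 18 onward. This guide asks for 20+. If you want the script to fail fast on an older runtime, a small guard is enough. This is a sketch with hypothetical helper names (nodeMajorVersion, assertNodeVersion), not part of either SDK. You would call assertNodeVersion() at the top of main.ts.

```typescript
// Hypothetical helper: extract the major version from strings like "20.11.1" or "v18.0.0".
function nodeMajorVersion(version: string = process.versions.node): number {
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isNaN(major) ? 0 : major;
}

// Hypothetical guard: throw if the running Node.js is older than the required major version.
function assertNodeVersion(minMajor = 20, version: string = process.versions.node): void {
  const major = nodeMajorVersion(version);
  if (major < minMajor) {
    throw new Error(`Node.js ${minMajor}+ is required, found ${version}`);
  }
}
```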
Step 1: Setup and Helper Functions
First, create a project directory and install the required packages:
# Create a project directory
mkdir steel-openai-computer-use
cd steel-openai-computer-use

# Initialize package.json
npm init -y

# Install required packages
npm install steel-sdk dotenv
npm install -D @types/node typescript ts-node
Create a .env file with your API keys:
STEEL_API_KEY=your_steel_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
TASK=Go to Steel.dev and find the latest news
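These placeholders use underscores. The fallback values in the helper file and the checks in the main script use hyphens (for example `your-steel-api-key-here`), so an unedited .env file gets past those checks and only fails later with an authentication error. A stricter check could look like the sketch below. `findUnsetKeys` and the placeholder pattern are assumptions for illustration, not part of the Steel or OpenAI SDKs.

```typescript
// Hypothetical pattern: matches both "your_steel_api_key_here" and "your-steel-api-key-here".
const PLACEHOLDER = /^your[-_].*[-_]here$/i;

// Returns the required keys that are missing, blank, or still set to a placeholder.
function findUnsetKeys(
  env: Record<string, string | undefined>,
  required: string[] = ["STEEL_API_KEY", "OPENAI_API_KEY"]
): string[] {
  return required.filter((name) => {
    const value = env[name]?.trim();
    return !value || PLACEHOLDER.test(value);
  });
}
```

For example, after `dotenv.config()` you could call `findUnsetKeys(process.env)` and throw if the result is non-empty.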
Create a file named helpers.ts with helper functions, constants, and type definitions (the other files import it as ./helpers):
import * as dotenv from "dotenv";
import { Steel } from "steel-sdk";

dotenv.config();

export const STEEL_API_KEY = process.env.STEEL_API_KEY || "your-steel-api-key-here";
export const OPENAI_API_KEY = process.env.OPENAI_API_KEY || "your-openai-api-key-here";
export const TASK = process.env.TASK || "Go to Steel.dev and find the latest news";

export function formatToday(): string {
  return new Intl.DateTimeFormat("en-US", {
    weekday: "long",
    month: "long",
    day: "2-digit",
    year: "numeric",
  }).format(new Date());
}

export const BROWSER_SYSTEM_PROMPT = `<BROWSER_ENV>
- You control a headful Chromium browser running in a VM with internet access.
- Interact only through the computer tool (mouse/keyboard/scroll/screenshots). Do not call navigation functions.
- Today's date is ${formatToday()}.
</BROWSER_ENV>

<BROWSER_CONTROL>
- Before acting, take a screenshot to observe state.
- When typing into any input:
  * Clear with Ctrl/⌘+A, then Delete.
  * After submitting (Enter or clicking a button), take another screenshot and move the mouse aside.
- Computer calls are slow; batch related actions together.
- Zoom out or scroll so all relevant content is visible before reading.
- If the first screenshot is black, click near center and screenshot again.
</BROWSER_CONTROL>

<TASK_EXECUTION>
- You receive exactly one natural-language task and no further user feedback.
- Do not ask clarifying questions; make reasonable assumptions and proceed.
- Prefer minimal, high-signal actions that move directly toward the goal.
- Keep the final response concise and focused on fulfilling the task.
</TASK_EXECUTION>`;

export interface MessageItem {
  type: "message";
  content: Array<{ text: string }>;
}

export interface FunctionCallItem {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string;
}

export interface ComputerCallItem {
  type: "computer_call";
  call_id: string;
  action: {
    type: string;
    [key: string]: any;
  };
  pending_safety_checks?: Array<{
    id: string;
    message: string;
  }>;
}

export interface OutputItem {
  type: "computer_call_output" | "function_call_output";
  call_id: string;
  acknowledged_safety_checks?: Array<{
    id: string;
    message: string;
  }>;
  output?:
    | {
        type: string;
        image_url?: string;
      }
    | string;
}

export interface ResponseItem {
  id: string;
  output: (MessageItem | FunctionCallItem | ComputerCallItem)[];
}

export type Coordinates = [number, number];

export interface BaseActionRequest {
  screenshot?: boolean;
  hold_keys?: string[];
}

export type ComputerActionRequest =
  | (BaseActionRequest & { action: "move_mouse"; coordinates: Coordinates })
  | (BaseActionRequest & {
      action: "click_mouse";
      button: "left" | "right" | "middle" | "back" | "forward";
      coordinates?: Coordinates;
      num_clicks?: number;
      click_type?: "down" | "up" | "click";
    })
  | (BaseActionRequest & { action: "drag_mouse"; path: Coordinates[] })
  | (BaseActionRequest & {
      action: "scroll";
      coordinates?: Coordinates;
      delta_x?: number;
      delta_y?: number;
    })
  | (BaseActionRequest & { action: "press_key"; keys: string[]; duration?: number })
  | (BaseActionRequest & { action: "type_text"; text: string })
  | (BaseActionRequest & { action: "wait"; duration: number })
  | { action: "take_screenshot" }
  | { action: "get_cursor_position" };

export async function createResponse(params: any): Promise<ResponseItem> {
  const url = "https://api.openai.com/v1/responses";
  const headers: Record<string, string> = {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  };

  const openaiOrg = process.env.OPENAI_ORG;
  if (openaiOrg) {
    headers["Openai-Organization"] = openaiOrg;
  }

  const response = await fetch(url, {
    method: "POST",
    headers,
    body: JSON.stringify(params),
  });

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`OpenAI API Error: ${response.status} ${errorText}`);
  }

  return (await response.json()) as ResponseItem;
}

export { Steel };
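`createResponse` accepts any params object and posts it unchanged to `https://api.openai.com/v1/responses`. The sketch below shows the body the agent sends on its first turn. `buildInitialRequest` and the two interfaces are hypothetical. The field values mirror the agent's constructor and `executeTask`: model `computer-use-preview`, a 1280x768 `computer-preview` tool, and `truncation: "auto"`.

```typescript
// Hypothetical types describing the first request body the agent sends.
interface ComputerTool {
  type: string;
  display_width: number;
  display_height: number;
  environment: "browser";
}

interface InitialRequest {
  model: string;
  input: Array<{ role: "system" | "user"; content: string }>;
  tools: ComputerTool[];
  truncation: "auto";
}

// Hypothetical helper: system prompt + task, plus the computer tool sized to the viewport.
function buildInitialRequest(
  systemPrompt: string,
  task: string,
  width = 1280,
  height = 768
): InitialRequest {
  return {
    model: "computer-use-preview",
    input: [
      { role: "system", content: systemPrompt },
      { role: "user", content: task },
    ],
    tools: [
      {
        type: "computer-preview", // same tool type string the agent uses
        display_width: width,
        display_height: height,
        environment: "browser",
      },
    ],
    truncation: "auto",
  };
}
```

On later turns the agent sends the same fields but appends the model's output items and its own `computer_call_output` items to `input`.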
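Step 2: Create the Agent Class
Create a file named agent.ts. The Agent class owns the Steel session. It translates each computer_call action from the model into a Steel Computer API request and runs the request/response loop: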
import {
  Steel,
  STEEL_API_KEY,
  BROWSER_SYSTEM_PROMPT,
  Coordinates,
  ComputerActionRequest,
  MessageItem,
  FunctionCallItem,
  ComputerCallItem,
  OutputItem,
  createResponse,
} from "./helpers";

export class Agent {
  private steel: Steel;
  private session: any | null = null;
  private model: string;
  private tools: any[];
  private viewportWidth: number;
  private viewportHeight: number;
  private systemPrompt: string;
  private printSteps: boolean = true;
  private autoAcknowledgeSafety: boolean = true;

  constructor() {
    this.steel = new Steel({ steelAPIKey: STEEL_API_KEY });
    this.model = "computer-use-preview";
    this.viewportWidth = 1280;
    this.viewportHeight = 768;
    this.systemPrompt = BROWSER_SYSTEM_PROMPT;
    this.tools = [
      {
        type: "computer-preview",
        display_width: this.viewportWidth,
        display_height: this.viewportHeight,
        environment: "browser",
      },
    ];
  }

  private center(): [number, number] {
    return [
      Math.floor(this.viewportWidth / 2),
      Math.floor(this.viewportHeight / 2),
    ];
  }

  private toNumber(v: any, def = 0): number {
    if (typeof v === "number") return v;
    if (typeof v === "string") {
      const n = parseFloat(v);
      return Number.isFinite(n) ? n : def;
    }
    return def;
  }

  private toCoords(x?: any, y?: any): Coordinates {
    const xx = this.toNumber(x, this.center()[0]);
    const yy = this.toNumber(y, this.center()[1]);
    return [xx, yy];
  }

  private splitKeys(k?: string | string[]): string[] {
    if (Array.isArray(k)) return k.filter(Boolean) as string[];
    if (!k) return [];
    return k
      .split("+")
      .map((s) => s.trim())
      .filter(Boolean);
  }

  private mapButton(btn?: string): "left" | "right" | "middle" | "back" | "forward" {
    const b = (btn || "left").toLowerCase();
    if (b === "right" || b === "middle" || b === "back" || b === "forward") return b;
    // OpenAI's click action calls the middle button "wheel"; Steel calls it "middle".
    if (b === "wheel") return "middle";
    return "left";
  }

  private normalizeKey(key: string): string {
    if (!key) return key;
    const k = String(key).trim();
    const upper = k.toUpperCase();
    const synonyms: Record<string, string> = {
      ENTER: "Enter",
      RETURN: "Enter",
      ESC: "Escape",
      ESCAPE: "Escape",
      TAB: "Tab",
      BACKSPACE: "Backspace",
      DELETE: "Delete",
      SPACE: "Space",
      CTRL: "Control",
      CONTROL: "Control",
      ALT: "Alt",
      SHIFT: "Shift",
      META: "Meta",
      CMD: "Meta",
      UP: "ArrowUp",
      DOWN: "ArrowDown",
      LEFT: "ArrowLeft",
      RIGHT: "ArrowRight",
      HOME: "Home",
      END: "End",
      PAGEUP: "PageUp",
      PAGEDOWN: "PageDown",
    };
    if (upper in synonyms) return synonyms[upper];
    if (upper.startsWith("F") && /^\d+$/.test(upper.slice(1))) {
      return "F" + upper.slice(1);
    }
    return k;
  }

  private normalizeKeys(keys: string[]): string[] {
    return keys.map((k) => this.normalizeKey(k));
  }

  async initialize(): Promise<void> {
    const width = this.viewportWidth;
    const height = this.viewportHeight;
    this.session = await this.steel.sessions.create({
      dimensions: { width, height },
      blockAds: true,
      timeout: 900000,
    });
    console.log("Steel Session created successfully!");
    console.log(`View live session at: ${this.session.sessionViewerUrl}`);
  }

  async cleanup(): Promise<void> {
    if (this.session) {
      console.log("Releasing Steel session...");
      await this.steel.sessions.release(this.session.id);
      console.log(
        `Session completed. View replay at ${this.session.sessionViewerUrl}`
      );
      this.session = null;
    }
  }

  private async takeScreenshot(): Promise<string> {
    const resp: any = await this.steel.sessions.computer(this.session!.id, {
      action: "take_screenshot",
    });
    const img: string | undefined = resp?.base64_image;
    if (!img) throw new Error("No screenshot returned from Steel");
    return img;
  }

  private async executeComputerAction(
    actionType: string,
    actionArgs: any
  ): Promise<string> {
    let body: ComputerActionRequest | null = null;

    switch (actionType) {
      case "move": {
        const coords = this.toCoords(actionArgs.x, actionArgs.y);
        body = {
          action: "move_mouse",
          coordinates: coords,
          screenshot: true,
        };
        break;
      }
      case "click": {
        const coords = this.toCoords(actionArgs.x, actionArgs.y);
        const button = this.mapButton(actionArgs.button);
        const clicks = this.toNumber(actionArgs.num_clicks, 1);
        body = {
          action: "click_mouse",
          button,
          coordinates: coords,
          ...(clicks > 1 ? { num_clicks: clicks } : {}),
          screenshot: true,
        };
        break;
      }
      case "doubleClick":
      case "double_click": {
        const coords = this.toCoords(actionArgs.x, actionArgs.y);
        body = {
          action: "click_mouse",
          button: "left",
          coordinates: coords,
          num_clicks: 2,
          screenshot: true,
        };
        break;
      }
      case "drag": {
        const path = Array.isArray(actionArgs.path) ? actionArgs.path : [];
        const steelPath: Coordinates[] = path.map((p: any) =>
          this.toCoords(p.x, p.y)
        );
        if (steelPath.length < 2) {
          const [cx, cy] = this.center();
          steelPath.unshift([cx, cy]);
        }
        body = {
          action: "drag_mouse",
          path: steelPath,
          screenshot: true,
        };
        break;
      }
      case "scroll": {
        const coords =
          actionArgs.x != null || actionArgs.y != null
            ? this.toCoords(actionArgs.x, actionArgs.y)
            : undefined;
        const delta_x = this.toNumber(actionArgs.scroll_x, 0);
        const delta_y = this.toNumber(actionArgs.scroll_y, 0);
        body = {
          action: "scroll",
          ...(coords ? { coordinates: coords } : {}),
          ...(delta_x !== 0 ? { delta_x } : {}),
          ...(delta_y !== 0 ? { delta_y } : {}),
          screenshot: true,
        };
        break;
      }
      case "type": {
        const text = typeof actionArgs.text === "string" ? actionArgs.text : "";
        body = {
          action: "type_text",
          text,
          screenshot: true,
        };
        break;
      }
      case "keypress": {
        const keys = Array.isArray(actionArgs.keys)
          ? actionArgs.keys
          : this.splitKeys(actionArgs.keys);
        const normalized = this.normalizeKeys(keys);
        body = {
          action: "press_key",
          keys: normalized,
          screenshot: true,
        };
        break;
      }
      case "wait": {
        const ms = this.toNumber(actionArgs.ms, 1000);
        const seconds = Math.max(0.001, ms / 1000);
        body = {
          action: "wait",
          duration: seconds,
          screenshot: true,
        };
        break;
      }
      case "screenshot": {
        return this.takeScreenshot();
      }
      default: {
        return this.takeScreenshot();
      }
    }

    const resp: any = await this.steel.sessions.computer(
      this.session!.id,
      body!
    );
    const img: string | undefined = resp?.base64_image;
    if (img) return img;
    return this.takeScreenshot();
  }

  private async handleItem(
    item: MessageItem | FunctionCallItem | ComputerCallItem
  ): Promise<OutputItem[]> {
    if (item.type === "message") {
      if (this.printSteps) {
        console.log(item.content[0].text);
      }
      return [];
    }

    if (item.type === "function_call") {
      if (this.printSteps) {
        console.log(`${item.name}(${item.arguments})`);
      }
      return [
        {
          type: "function_call_output",
          call_id: item.call_id,
          output: "success",
        },
      ];
    }

    if (item.type === "computer_call") {
      const { action } = item;
      const actionType = action.type;
      const { type, ...actionArgs } = action;

      if (this.printSteps) {
        console.log(`${actionType}(${JSON.stringify(actionArgs)})`);
      }

      const screenshotBase64 = await this.executeComputerAction(
        actionType,
        actionArgs
      );

      const pendingChecks = item.pending_safety_checks || [];
      for (const check of pendingChecks) {
        if (this.autoAcknowledgeSafety) {
          console.log(`⚠️ Auto-acknowledging safety check: ${check.message}`);
        } else {
          throw new Error(`Safety check failed: ${check.message}`);
        }
      }

      const callOutput: OutputItem = {
        type: "computer_call_output",
        call_id: item.call_id,
        acknowledged_safety_checks: pendingChecks,
        output: {
          type: "input_image",
          image_url: `data:image/png;base64,${screenshotBase64}`,
        },
      };

      return [callOutput];
    }

    return [];
  }

  async executeTask(
    task: string,
    printSteps: boolean = true,
    debug: boolean = false,
    maxIterations: number = 50
  ): Promise<string> {
    this.printSteps = printSteps;

    const inputItems = [
      {
        role: "system",
        content: this.systemPrompt,
      },
      {
        role: "user",
        content: task,
      },
    ];

    let newItems: any[] = [];
    let iterations = 0;
    let consecutiveNoActions = 0;
    let lastAssistantTexts: string[] = [];

    console.log(`🎯 Executing task: ${task}`);
    console.log("=".repeat(60));

    const detectRepetition = (text: string): boolean => {
      if (lastAssistantTexts.length < 2) return false;
      const words1 = text.toLowerCase().split(/\s+/);
      return lastAssistantTexts.some((prev) => {
        const words2 = prev.toLowerCase().split(/\s+/);
        const common = words1.filter((w) => words2.includes(w));
        return common.length / Math.max(words1.length, words2.length) > 0.8;
      });
    };

    while (iterations < maxIterations) {
      iterations++;
      let hasActions = false;

      if (
        newItems.length > 0 &&
        newItems[newItems.length - 1]?.role === "assistant"
      ) {
        const last = newItems[newItems.length - 1];
        const content = last.content?.[0]?.text;
        if (content) {
          if (detectRepetition(content)) {
            console.log("🔄 Repetition detected - stopping execution");
            lastAssistantTexts.push(content);
            break;
          }
          lastAssistantTexts.push(content);
          if (lastAssistantTexts.length > 3) lastAssistantTexts.shift();
        }
      }

      try {
        const response = await createResponse({
          model: this.model,
          input: [...inputItems, ...newItems],
          tools: this.tools,
          truncation: "auto",
        });

        if (!response.output) {
          throw new Error("No output from model");
        }

        newItems.push(...response.output);

        for (const item of response.output) {
          if (item.type === "computer_call" || item.type === "function_call") {
            hasActions = true;
          }
          const handleResult = await this.handleItem(item);
          newItems.push(...handleResult);
        }

        if (!hasActions) {
          consecutiveNoActions++;
          if (consecutiveNoActions >= 3) {
            console.log(
              "⚠️ No actions for 3 consecutive iterations - stopping"
            );
            break;
          }
        } else {
          consecutiveNoActions = 0;
        }
      } catch (error) {
        console.error(`❌ Error during task execution: ${error}`);
        throw error;
      }
    }

    if (iterations >= maxIterations) {
      console.warn(
        `⚠️ Task execution stopped after ${maxIterations} iterations`
      );
    }

    const assistantMessages = newItems.filter(
      (item) => item.role === "assistant"
    );
    const finalMessage = assistantMessages[assistantMessages.length - 1];

    return (
      finalMessage?.content?.[0]?.text ||
      "Task execution completed (no final message)"
    );
  }
}
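The `keypress` branch relies on `splitKeys` and `normalizeKey` to convert the model's key names into the names sent with Steel's `press_key` action. The sketch below shows the same idea outside the class. `toSteelKeys` is a hypothetical standalone function, and its synonym table is a trimmed subset of the agent's.

```typescript
// Trimmed subset of the agent's synonym table: uppercased model key name -> name sent to Steel.
const KEY_SYNONYMS: Record<string, string> = {
  ENTER: "Enter",
  RETURN: "Enter",
  ESC: "Escape",
  CTRL: "Control",
  CMD: "Meta",
  UP: "ArrowUp",
  DOWN: "ArrowDown",
  PAGEDOWN: "PageDown",
};

// Accepts an array of key names or a "CTRL+A"-style string.
// Unknown keys pass through unchanged (after trimming).
function toSteelKeys(keys: string | string[]): string[] {
  const parts = Array.isArray(keys) ? keys : keys.split("+");
  return parts
    .map((k) => k.trim())
    .filter((k) => k.length > 0)
    .map((k) => KEY_SYNONYMS[k.toUpperCase()] ?? k);
}
```

For example, `toSteelKeys("ctrl+a")` gives `["Control", "a"]`. The letter keeps its original case, just as `normalizeKey` returns keys it does not recognize unchanged.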
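Step 3: Create the Main Script
Create main.ts. It checks that your API keys are set, starts a session, runs the task, and releases the session in a finally block: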
import { Agent } from "./agent";
import { STEEL_API_KEY, OPENAI_API_KEY, TASK } from "./helpers";

async function main(): Promise<void> {
  console.log("🚀 Steel + OpenAI Computer Use Assistant");
  console.log("=".repeat(60));

  if (STEEL_API_KEY === "your-steel-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
    );
    console.warn(
      " Get your API key at: https://app.steel.dev/settings/api-keys"
    );
    throw new Error("Set STEEL_API_KEY");
  }

  if (OPENAI_API_KEY === "your-openai-api-key-here") {
    console.warn(
      "⚠️ WARNING: Please replace 'your-openai-api-key-here' with your actual OpenAI API key"
    );
    console.warn(" Get your API key at: https://platform.openai.com/");
    throw new Error("Set OPENAI_API_KEY");
  }

  console.log("\nStarting Steel session...");
  const agent = new Agent();

  try {
    await agent.initialize();
    console.log("✅ Steel session started!");

    const startTime = Date.now();
    const result = await agent.executeTask(TASK, true, false, 50);
    const duration = ((Date.now() - startTime) / 1000).toFixed(1);

    console.log("\n" + "=".repeat(60));
    console.log("🎉 TASK EXECUTION COMPLETED");
    console.log("=".repeat(60));
    console.log(`⏱️ Duration: ${duration} seconds`);
    console.log(`🎯 Task: ${TASK}`);
    console.log(`📋 Result:\n${result}`);
    console.log("=".repeat(60));
  } catch (error) {
    console.log(`❌ Failed to run: ${error}`);
    throw error;
  } finally {
    await agent.cleanup();
  }
}

main()
  .then(() => {
    process.exit(0);
  })
  .catch((error) => {
    console.error("Task execution failed:", error);
    process.exit(1);
  });
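The `finally` block releases the session when the task finishes or throws. Pressing Ctrl+C is different: Node.js exits on SIGINT without running pending `finally` blocks, so the session stays open until it times out. The agent requests `timeout: 900000`, which is 15 minutes if the value is in milliseconds. A minimal sketch of an interrupt hook follows. It assumes you pass in `() => agent.cleanup()`, and `onInterrupt` is a hypothetical helper.

```typescript
// Hypothetical helper: run an async cleanup once when the process receives SIGINT (Ctrl+C),
// then exit. `exit` is injectable so the handler can be exercised without ending the process.
function onInterrupt(
  cleanup: () => Promise<void>,
  exit: (code: number) => void = (code) => process.exit(code)
): () => Promise<void> {
  let started = false;
  const handler = async (): Promise<void> => {
    if (started) return; // ignore repeated Ctrl+C while cleanup is in progress
    started = true;
    try {
      await cleanup();
    } finally {
      exit(130); // conventional exit code for termination by SIGINT (128 + 2)
    }
  };
  process.on("SIGINT", handler);
  return handler;
}
```

In main.ts you would call `onInterrupt(() => agent.cleanup());` right after `const agent = new Agent();`. Because `cleanup` checks whether a session is still open, running it again from the `finally` block does nothing once the session has been released.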
Running Your Agent
Run the main script to start a Steel browser session and have the agent carry out the task:
npx ts-node main.ts
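The agent runs the task from the TASK environment variable (set in your shell or in .env). If TASK is unset, it uses the default task in helpers.ts. dotenv does not overwrite variables that are already set, so a value exported in your shell takes precedence over .env: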
export TASK="Research the top 5 electric vehicles with the longest range"
npx ts-node main.ts
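You can also pass the task on the command line instead of through an environment variable. One option is to read `process.argv` and fall back to `TASK`. The main script does not do this as written. `resolveTask` is a hypothetical helper.

```typescript
// Hypothetical: prefer a task given as command-line arguments, else use the fallback (e.g. TASK).
function resolveTask(argv: string[], fallback: string): string {
  // argv[0] is the runtime and argv[1] the script path; everything after that is user input.
  const fromArgs = argv.slice(2).join(" ").trim();
  return fromArgs.length > 0 ? fromArgs : fallback;
}
```

In main.ts this would mean computing `const task = resolveTask(process.argv, TASK);` and passing `task` to `agent.executeTask`. You could then run `npx ts-node main.ts "Find the pricing page on steel.dev"`.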
Each action the agent takes is printed to the console. To watch the browser live, open the URL printed after "View live session at:". The same URL is printed again for the replay when the session is released.
Next Steps
- Explore the Steel API documentation for more advanced features
- Check out the OpenAI documentation for more information about the computer-use-preview model
- Add additional features like session recording or multi-session management
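One approach to multi-session management is to give each task its own Agent, and therefore its own Steel session, and cap how many run at once. The sketch below covers only the capping part and assumes you supply the `worker`. `runWithConcurrency` is a hypothetical helper. Keep the limit within the number of concurrent sessions your Steel account allows.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` in flight,
// returning results in the same order as the inputs.
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JavaScript runs callbacks on one thread

  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage idea (Agent is the class from Step 2):
// const results = await runWithConcurrency(tasks, 2, async (task) => {
//   const agent = new Agent();
//   try {
//     await agent.initialize();
//     return await agent.executeTask(task);
//   } finally {
//     await agent.cleanup();
//   }
// });
```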