Quickstart (TypeScript)
How to use Gemini Computer Use with Steel
This guide shows how to use Google's gemini-2.5-computer-use-preview model with Steel's Computer API to build AI agents that navigate the web.
Gemini's Computer Use model uses a normalized coordinate system (0-1000) and provides built-in actions for browser control, making it straightforward to integrate with Steel.
Prerequisites
- Node.js 20+
- A Steel API key (available at https://app.steel.dev/settings/api-keys)
- A Gemini API key (available at https://aistudio.google.com/apikey)
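A quick preflight check can catch a missing prerequisite before any session is created. The sketch below is one way to do that; `checkPrerequisites` is a hypothetical helper, not part of steel-sdk or @google/genai, and it only checks the items listed above.

```typescript
// Hypothetical preflight helper (an assumption for illustration, not an SDK function).
// Returns a list of human-readable problems; an empty list means all checks passed.
function checkPrerequisites(
  nodeVersion: string,
  env: Record<string, string | undefined>
): string[] {
  const problems: string[] = [];

  // Node reports versions like "v20.11.1"; compare the major component only.
  const major = parseInt(nodeVersion.replace(/^v/, "").split(".")[0], 10);
  if (isNaN(major) || major < 20) {
    problems.push(`Node.js 20+ required, found ${nodeVersion}`);
  }

  // Both API keys must be present and non-empty.
  for (const name of ["STEEL_API_KEY", "GEMINI_API_KEY"]) {
    if (!env[name]) {
      problems.push(`${name} is not set`);
    }
  }
  return problems;
}

// Sample inputs: an old Node version and a missing Gemini key yield two problems.
console.log(checkPrerequisites("v18.19.0", { STEEL_API_KEY: "example" }));
```

In a real script you would likely pass `process.version` and `process.env` instead of the sample values.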
Step 1: Setup and Helper Functions
First, create a project directory and install the required packages:
    # Create a project directory
    mkdir steel-gemini-computer-use
    cd steel-gemini-computer-use

    # Initialize package.json
    npm init -y

    # Install required packages
    npm install steel-sdk @google/genai dotenv
    npm install -D @types/node typescript ts-node
Create a .env file with your API keys:
    STEEL_API_KEY=your_steel_api_key_here
    GEMINI_API_KEY=your_gemini_api_key_here
    TASK=Go to Steel.dev and find the latest news
Create a file named helpers.ts with the helper functions, constants, and type definitions:
    // helpers.ts
    import * as dotenv from "dotenv";
    import { Steel } from "steel-sdk";
    import {
      GoogleGenAI,
      Content,
      Part,
      FunctionCall,
      FunctionResponse,
      Tool,
      Environment,
      GenerateContentConfig,
      GenerateContentResponse,
      Candidate,
      FinishReason,
    } from "@google/genai";

    // Load variables from .env into process.env
    dotenv.config();

    export const STEEL_API_KEY = process.env.STEEL_API_KEY || "your-steel-api-key-here";
    export const GEMINI_API_KEY = process.env.GEMINI_API_KEY || "your-gemini-api-key-here";
    export const TASK = process.env.TASK || "Go to Steel.dev and find the latest news";

    export const MODEL = "gemini-2.5-computer-use-preview-10-2025";
    // Upper bound of Gemini's normalized coordinate range
    export const MAX_COORDINATE = 1000;

    export function formatToday(): string {
      return new Intl.DateTimeFormat("en-US", {
        weekday: "long",
        month: "long",
        day: "2-digit",
        year: "numeric",
      }).format(new Date());
    }

    export const BROWSER_SYSTEM_PROMPT = `<BROWSER_ENV>
    - You control a headful Chromium browser running in a VM with internet access.
    - Chromium is already open; interact only through computer use actions (mouse, keyboard, scroll, screenshots).
    - Today's date is ${formatToday()}.
    </BROWSER_ENV>

    <BROWSER_CONTROL>
    - When viewing pages, zoom out or scroll so all relevant content is visible.
    - When typing into any input:
      * Clear it first with Ctrl+A, then Delete.
      * After submitting (pressing Enter or clicking a button), wait for the page to load.
    - Computer tool calls are slow; batch related actions into a single call whenever possible.
    - You may act on the user's behalf on sites where they are already authenticated.
    - Assume any required authentication/Auth Contexts are already configured before the task starts.
    - If the first screenshot is black:
      * Click near the center of the screen.
      * Take another screenshot.
    </BROWSER_CONTROL>

    <TASK_EXECUTION>
    - You receive exactly one natural-language task and no further user feedback.
    - Do not ask the user clarifying questions; instead, make reasonable assumptions and proceed.
    - For complex tasks, quickly plan a short, ordered sequence of steps before acting.
    - Prefer minimal, high-signal actions that move directly toward the goal.
    - Keep your final response concise and focused on fulfilling the task (e.g., a brief summary of findings or results).
    </TASK_EXECUTION>`;

    // Pixel coordinates as [x, y]
    export type Coordinates = [number, number];

    // Result of one executed action: a screenshot plus the last known URL
    export interface ActionResult {
      screenshotBase64: string;
      url?: string;
    }

    // Re-export SDK classes and types so agent.ts can import everything from one place
    export {
      Steel,
      GoogleGenAI,
      Content,
      Part,
      FunctionCall,
      FunctionResponse,
      Tool,
      Environment,
      GenerateContentConfig,
      Candidate,
      FinishReason,
    };
Step 2: Create the Agent Class
    // agent.ts
    import {
      Steel,
      GoogleGenAI,
      Content,
      Part,
      FunctionCall,
      FunctionResponse,
      Tool,
      Environment,
      GenerateContentConfig,
      Candidate,
      FinishReason,
      STEEL_API_KEY,
      GEMINI_API_KEY,
      MODEL,
      MAX_COORDINATE,
      BROWSER_SYSTEM_PROMPT,
      Coordinates,
      ActionResult,
    } from "./helpers";

    export class Agent {
      private client: GoogleGenAI;
      private steel: Steel;
      private session: Steel.Session | null = null;
      private contents: Content[];
      private tools: Tool[];
      private config: GenerateContentConfig;
      private viewportWidth: number;
      private viewportHeight: number;
      private currentUrl: string;

      constructor() {
        this.client = new GoogleGenAI({ apiKey: GEMINI_API_KEY });
        this.steel = new Steel({ steelAPIKey: STEEL_API_KEY });
        this.contents = [];
        this.currentUrl = "about:blank";
        this.viewportWidth = 1280;
        this.viewportHeight = 768;
        // Enable Gemini's built-in Computer Use tool for a browser environment
        this.tools = [
          {
            computerUse: {
              environment: Environment.ENVIRONMENT_BROWSER,
            },
          },
        ];
        this.config = {
          tools: this.tools,
        };
      }

      // Convert a normalized (0-1000) X coordinate to viewport pixels
      private denormalizeX(x: number): number {
        return Math.round((x / MAX_COORDINATE) * this.viewportWidth);
      }

      // Convert a normalized (0-1000) Y coordinate to viewport pixels
      private denormalizeY(y: number): number {
        return Math.round((y / MAX_COORDINATE) * this.viewportHeight);
      }

      private center(): Coordinates {
        return [
          Math.floor(this.viewportWidth / 2),
          Math.floor(this.viewportHeight / 2),
        ];
      }

      // Map common key aliases (e.g. "CTRL", "ESC") to canonical key names
      private normalizeKey(key: string): string {
        if (!key) return key;
        const k = key.trim();
        const upper = k.toUpperCase();
        const synonyms: Record<string, string> = {
          ENTER: "Enter",
          RETURN: "Enter",
          ESC: "Escape",
          ESCAPE: "Escape",
          TAB: "Tab",
          BACKSPACE: "Backspace",
          DELETE: "Delete",
          SPACE: "Space",
          CTRL: "Control",
          CONTROL: "Control",
          ALT: "Alt",
          SHIFT: "Shift",
          META: "Meta",
          CMD: "Meta",
          UP: "ArrowUp",
          DOWN: "ArrowDown",
          LEFT: "ArrowLeft",
          RIGHT: "ArrowRight",
          HOME: "Home",
          END: "End",
          PAGEUP: "PageUp",
          PAGEDOWN: "PageDown",
        };
        if (upper in synonyms) return synonyms[upper];
        // Function keys: "f5" -> "F5"
        if (upper.startsWith("F") && /^\d+$/.test(upper.slice(1))) {
          return "F" + upper.slice(1);
        }
        return k;
      }

      private normalizeKeys(keys: string[]): string[] {
        return keys.map((k) => this.normalizeKey(k));
      }

      async initialize(): Promise<void> {
        this.session = await this.steel.sessions.create({
          dimensions: { width: this.viewportWidth, height: this.viewportHeight },
          blockAds: true,
          timeout: 900000,
        });
        console.log("Steel Session created successfully!");
        console.log(`View live session at: ${this.session.sessionViewerUrl}`);
      }

      async cleanup(): Promise<void> {
        if (this.session) {
          console.log("Releasing Steel session...");
          await this.steel.sessions.release(this.session.id);
          console.log(
            `Session completed. View replay at ${this.session.sessionViewerUrl}`
          );
          this.session = null;
        }
      }

      private async takeScreenshot(): Promise<string> {
        const resp: any = await this.steel.sessions.computer(this.session!.id, {
          action: "take_screenshot",
        });
        const img = resp?.base64_image;
        if (!img) throw new Error("No screenshot returned from Steel");
        return img;
      }

      // Translate one Gemini function call into Steel Computer API actions
      private async executeComputerAction(
        functionCall: FunctionCall
      ): Promise<ActionResult> {
        const name = functionCall.name ?? "";
        const args = (functionCall.args ?? {}) as Record<string, unknown>;

        switch (name) {
          case "open_web_browser": {
            // The browser is already open; just report the current state
            const screenshot = await this.takeScreenshot();
            return { screenshotBase64: screenshot, url: this.currentUrl };
          }

          case "click_at": {
            const x = this.denormalizeX(args.x as number);
            const y = this.denormalizeY(args.y as number);
            const resp: any = await this.steel.sessions.computer(this.session!.id, {
              action: "click_mouse",
              button: "left",
              coordinates: [x, y],
              screenshot: true,
            });
            return {
              screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
              url: this.currentUrl,
            };
          }

          case "hover_at": {
            const x = this.denormalizeX(args.x as number);
            const y = this.denormalizeY(args.y as number);
            const resp: any = await this.steel.sessions.computer(this.session!.id, {
              action: "move_mouse",
              coordinates: [x, y],
              screenshot: true,
            });
            return {
              screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
              url: this.currentUrl,
            };
          }

          case "type_text_at": {
            const x = this.denormalizeX(args.x as number);
            const y = this.denormalizeY(args.y as number);
            const text = args.text as string;
            // Both flags default to true unless the model explicitly passes false
            const pressEnter = args.press_enter !== false;
            const clearBeforeTyping = args.clear_before_typing !== false;

            // Focus the input
            await this.steel.sessions.computer(this.session!.id, {
              action: "click_mouse",
              button: "left",
              coordinates: [x, y],
            });

            if (clearBeforeTyping) {
              // Select all, then delete
              await this.steel.sessions.computer(this.session!.id, {
                action: "press_key",
                keys: ["Control", "a"],
              });
              await this.steel.sessions.computer(this.session!.id, {
                action: "press_key",
                keys: ["Backspace"],
              });
            }

            await this.steel.sessions.computer(this.session!.id, {
              action: "type_text",
              text: text,
            });

            if (pressEnter) {
              await this.steel.sessions.computer(this.session!.id, {
                action: "press_key",
                keys: ["Enter"],
              });
            }

            // Give the page a moment to react before capturing
            await this.steel.sessions.computer(this.session!.id, {
              action: "wait",
              duration: 1,
            });

            const screenshot = await this.takeScreenshot();
            return { screenshotBase64: screenshot, url: this.currentUrl };
          }

          case "scroll_document": {
            const direction = args.direction as string;
            let keys: string[];

            if (direction === "down") {
              keys = ["PageDown"];
            } else if (direction === "up") {
              keys = ["PageUp"];
            } else if (direction === "left" || direction === "right") {
              // Horizontal scrolling uses a scroll action at the viewport center
              const [cx, cy] = this.center();
              const delta = direction === "left" ? -400 : 400;
              const resp: any = await this.steel.sessions.computer(this.session!.id, {
                action: "scroll",
                coordinates: [cx, cy],
                delta_x: delta,
                delta_y: 0,
                screenshot: true,
              });
              return {
                screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
                url: this.currentUrl,
              };
            } else {
              // Unrecognized direction: default to scrolling down
              keys = ["PageDown"];
            }

            const resp: any = await this.steel.sessions.computer(this.session!.id, {
              action: "press_key",
              keys: keys,
              screenshot: true,
            });
            return {
              screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
              url: this.currentUrl,
            };
          }

          case "scroll_at": {
            const x = this.denormalizeX(args.x as number);
            const y = this.denormalizeY(args.y as number);
            const direction = args.direction as string;
            // Magnitude is also normalized; scale it by the viewport height
            const magnitude = this.denormalizeY((args.magnitude as number) ?? 800);

            let deltaX = 0;
            let deltaY = 0;

            if (direction === "down") deltaY = magnitude;
            else if (direction === "up") deltaY = -magnitude;
            else if (direction === "right") deltaX = magnitude;
            else if (direction === "left") deltaX = -magnitude;

            const resp: any = await this.steel.sessions.computer(this.session!.id, {
              action: "scroll",
              coordinates: [x, y],
              delta_x: deltaX,
              delta_y: deltaY,
              screenshot: true,
            });
            return {
              screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
              url: this.currentUrl,
            };
          }

          case "wait_5_seconds": {
            const resp: any = await this.steel.sessions.computer(this.session!.id, {
              action: "wait",
              duration: 5,
              screenshot: true,
            });
            return {
              screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
              url: this.currentUrl,
            };
          }

          case "go_back": {
            const resp: any = await this.steel.sessions.computer(this.session!.id, {
              action: "press_key",
              keys: ["Alt", "ArrowLeft"],
              screenshot: true,
            });
            return {
              screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
              url: this.currentUrl,
            };
          }

          case "go_forward": {
            const resp: any = await this.steel.sessions.computer(this.session!.id, {
              action: "press_key",
              keys: ["Alt", "ArrowRight"],
              screenshot: true,
            });
            return {
              screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
              url: this.currentUrl,
            };
          }

          case "navigate": {
            let url = args.url as string;
            if (!url.startsWith("http://") && !url.startsWith("https://")) {
              url = "https://" + url;
            }

            // Focus the address bar, type the URL, and submit
            await this.steel.sessions.computer(this.session!.id, {
              action: "press_key",
              keys: ["Control", "l"],
            });
            await this.steel.sessions.computer(this.session!.id, {
              action: "type_text",
              text: url,
            });
            await this.steel.sessions.computer(this.session!.id, {
              action: "press_key",
              keys: ["Enter"],
            });
            await this.steel.sessions.computer(this.session!.id, {
              action: "wait",
              duration: 2,
            });

            this.currentUrl = url;
            const screenshot = await this.takeScreenshot();
            return { screenshotBase64: screenshot, url: this.currentUrl };
          }

          case "key_combination": {
            // Keys arrive as a "+"-separated string, e.g. "Control+A"
            const keysStr = args.keys as string;
            const keys = keysStr.split("+").map((k) => k.trim());
            const normalizedKeys = this.normalizeKeys(keys);

            const resp: any = await this.steel.sessions.computer(this.session!.id, {
              action: "press_key",
              keys: normalizedKeys,
              screenshot: true,
            });
            return {
              screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
              url: this.currentUrl,
            };
          }

          case "drag_and_drop": {
            const startX = this.denormalizeX(args.x as number);
            const startY = this.denormalizeY(args.y as number);
            const endX = this.denormalizeX(args.destination_x as number);
            const endY = this.denormalizeY(args.destination_y as number);

            const resp: any = await this.steel.sessions.computer(this.session!.id, {
              action: "drag_mouse",
              path: [
                [startX, startY],
                [endX, endY],
              ],
              screenshot: true,
            });
            return {
              screenshotBase64: resp?.base64_image || (await this.takeScreenshot()),
              url: this.currentUrl,
            };
          }

          default: {
            console.log(`Unknown action: ${name}, taking screenshot`);
            const screenshot = await this.takeScreenshot();
            return { screenshotBase64: screenshot, url: this.currentUrl };
          }
        }
      }

      private extractFunctionCalls(candidate: Candidate): FunctionCall[] {
        const functionCalls: FunctionCall[] = [];
        if (!candidate.content?.parts) return functionCalls;

        for (const part of candidate.content.parts) {
          if (part.functionCall) {
            functionCalls.push(part.functionCall);
          }
        }
        return functionCalls;
      }

      private extractText(candidate: Candidate): string {
        if (!candidate.content?.parts) return "";
        const texts: string[] = [];
        for (const part of candidate.content.parts) {
          if (part.text) {
            texts.push(part.text);
          }
        }
        return texts.join(" ").trim();
      }

      // Pair each function call with its result: a function response plus a screenshot
      private buildFunctionResponseParts(
        functionCalls: FunctionCall[],
        results: ActionResult[]
      ): Part[] {
        const parts: Part[] = [];

        for (let i = 0; i < functionCalls.length; i++) {
          const fc = functionCalls[i];
          const result = results[i];

          const functionResponse: FunctionResponse = {
            name: fc.name ?? "",
            response: { url: result.url ?? this.currentUrl },
          };

          parts.push({ functionResponse });
          parts.push({
            inlineData: {
              mimeType: "image/png",
              data: result.screenshotBase64,
            },
          });
        }

        return parts;
      }

      async executeTask(
        task: string,
        printSteps: boolean = true,
        maxIterations: number = 50
      ): Promise<string> {
        this.contents = [
          {
            role: "user",
            parts: [{ text: BROWSER_SYSTEM_PROMPT }, { text: task }],
          },
        ];

        let iterations = 0;
        let consecutiveNoActions = 0;

        console.log(`🎯 Executing task: ${task}`);
        console.log("=".repeat(60));

        while (iterations < maxIterations) {
          iterations++;

          try {
            const response = await this.client.models.generateContent({
              model: MODEL,
              contents: this.contents,
              config: this.config,
            });

            if (!response.candidates || response.candidates.length === 0) {
              console.log("❌ No candidates in response");
              break;
            }

            const candidate = response.candidates[0];

            // Keep the model's turn in the conversation history
            if (candidate.content) {
              this.contents.push(candidate.content);
            }

            const reasoning = this.extractText(candidate);
            const functionCalls = this.extractFunctionCalls(candidate);

            if (
              !functionCalls.length &&
              !reasoning &&
              candidate.finishReason === FinishReason.MALFORMED_FUNCTION_CALL
            ) {
              console.log("⚠️ Malformed function call, retrying...");
              continue;
            }

            if (!functionCalls.length) {
              // Text without any function call is treated as the final answer
              if (reasoning) {
                if (printSteps) {
                  console.log(`\n💬 ${reasoning}`);
                }
                console.log("✅ Task complete - model provided final response");
                break;
              }

              consecutiveNoActions++;
              if (consecutiveNoActions >= 3) {
                console.log(
                  "⚠️ No actions for 3 consecutive iterations - stopping"
                );
                break;
              }
              continue;
            }

            consecutiveNoActions = 0;

            if (printSteps && reasoning) {
              console.log(`\n💭 ${reasoning}`);
            }

            const results: ActionResult[] = [];

            // Execute the requested actions in order
            for (const fc of functionCalls) {
              const actionName = fc.name ?? "unknown";
              const actionArgs = fc.args ?? {};

              if (printSteps) {
                console.log(`🔧 ${actionName}(${JSON.stringify(actionArgs)})`);
              }

              const result = await this.executeComputerAction(fc);
              results.push(result);
            }

            const functionResponseParts = this.buildFunctionResponseParts(
              functionCalls,
              results
            );

            // Send the results back to the model as the next user turn
            this.contents.push({
              role: "user",
              parts: functionResponseParts,
            });
          } catch (error) {
            console.error(`❌ Error during task execution: ${error}`);
            throw error;
          }
        }

        if (iterations >= maxIterations) {
          console.warn(
            `⚠️ Task execution stopped after ${maxIterations} iterations`
          );
        }

        // Return the most recent model message that contains text
        for (let i = this.contents.length - 1; i >= 0; i--) {
          const content = this.contents[i];
          if (content.role === "model") {
            const text = content.parts
              ?.filter((p) => p.text)
              .map((p) => p.text)
              .join(" ")
              .trim();
            if (text) {
              return text;
            }
          }
        }

        return "Task execution completed (no final message)";
      }
    }
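The switch statement in executeComputerAction amounts to a translation table from Gemini's predefined actions to Steel Computer API actions. The sketch below restates that mapping as data, read off the code above; `ACTION_MAP` and `steelActionsFor` are illustrative names and not part of either SDK.

```typescript
// Illustrative summary of the agent's dispatch logic (names are assumptions).
// Each Gemini action lists the Steel actions the agent issues for it.
const ACTION_MAP: Record<string, string[]> = {
  open_web_browser: ["take_screenshot"],
  click_at: ["click_mouse"],
  hover_at: ["move_mouse"],
  type_text_at: ["click_mouse", "press_key", "type_text", "wait"],
  scroll_document: ["press_key", "scroll"], // keys for up/down, scroll for left/right
  scroll_at: ["scroll"],
  wait_5_seconds: ["wait"],
  go_back: ["press_key"], // Alt+ArrowLeft
  go_forward: ["press_key"], // Alt+ArrowRight
  navigate: ["press_key", "type_text", "wait"], // Control+L, URL, Enter
  key_combination: ["press_key"],
  drag_and_drop: ["drag_mouse"],
};

function steelActionsFor(geminiAction: string): string[] {
  // Unknown actions fall back to a plain screenshot, as in the agent's default case.
  return ACTION_MAP[geminiAction] ?? ["take_screenshot"];
}

console.log(steelActionsFor("click_at"));
console.log(steelActionsFor("some_future_action"));
```

Keeping the mapping in one place like this can make it easier to spot which Gemini actions still lack a handler when the model's action set changes.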
Step 3: Create the Main Script
    // main.ts
    import { Agent } from "./agent";
    import { STEEL_API_KEY, GEMINI_API_KEY, TASK } from "./helpers";

    async function main(): Promise<void> {
      console.log("🚀 Steel + Gemini Computer Use Assistant");
      console.log("=".repeat(60));

      if (STEEL_API_KEY === "your-steel-api-key-here") {
        console.warn(
          "⚠️ WARNING: Please replace 'your-steel-api-key-here' with your actual Steel API key"
        );
        console.warn(
          "   Get your API key at: https://app.steel.dev/settings/api-keys"
        );
        throw new Error("Set STEEL_API_KEY");
      }

      if (GEMINI_API_KEY === "your-gemini-api-key-here") {
        console.warn(
          "⚠️ WARNING: Please replace 'your-gemini-api-key-here' with your actual Gemini API key"
        );
        console.warn("   Get your API key at: https://aistudio.google.com/apikey");
        throw new Error("Set GEMINI_API_KEY");
      }

      console.log("\nStarting Steel session...");
      const agent = new Agent();

      try {
        await agent.initialize();
        console.log("✅ Steel session started!");

        const startTime = Date.now();
        const result = await agent.executeTask(TASK, true, 50);
        const duration = ((Date.now() - startTime) / 1000).toFixed(1);

        console.log("\n" + "=".repeat(60));
        console.log("🎉 TASK EXECUTION COMPLETED");
        console.log("=".repeat(60));
        console.log(`⏱️ Duration: ${duration} seconds`);
        console.log(`🎯 Task: ${TASK}`);
        console.log(`📋 Result:\n${result}`);
        console.log("=".repeat(60));
      } catch (error) {
        console.log(`❌ Failed to run: ${error}`);
        throw error;
      } finally {
        await agent.cleanup();
      }
    }

    main()
      .then(() => {
        process.exit(0);
      })
      .catch((error) => {
        console.error("Task execution failed:", error);
        process.exit(1);
      });
Running Your Agent
Execute your script to start an interactive AI browser session:
npx ts-node main.ts
The agent will execute the task defined in the TASK environment variable or the default task. You can modify the task by setting the environment variable:
    export TASK="Research the latest developments in AI"
    npx ts-node main.ts
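The fallback comes from the `process.env.TASK || "..."` expression in helpers.ts. A minimal sketch of that resolution order follows; `resolveTask` is a hypothetical function written for illustration and does not appear in the guide's code.

```typescript
// Hypothetical helper mirroring the `process.env.TASK || default` pattern in helpers.ts.
const DEFAULT_TASK = "Go to Steel.dev and find the latest news";

function resolveTask(env: Record<string, string | undefined>): string {
  // An unset or empty TASK falls back to the default, because `||` treats "" as falsy.
  return env.TASK || DEFAULT_TASK;
}

// An explicit TASK is used as-is.
console.log(resolveTask({ TASK: "Research the latest developments in AI" }));
// With no TASK set, the default task is used.
console.log(resolveTask({}));
```

By default, dotenv does not overwrite variables that are already set in the environment, so a TASK exported in the shell takes precedence over the one in .env.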
You'll see each action the agent takes displayed in the console, and you can view the live browser session by opening the session URL in your web browser.
Understanding Gemini's Coordinate System
Gemini's Computer Use model uses a normalized coordinate system where both X and Y coordinates range from 0 to 1000. The agent automatically converts these to actual pixel coordinates based on the viewport size (1280x768 by default).
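As a worked example of that conversion, the sketch below maps a normalized point to pixels using the same arithmetic as the agent's denormalizeX and denormalizeY methods. The `denormalizePoint` function and `Viewport` type are illustrative names, not part of either SDK.

```typescript
// Illustrative helper; names are assumptions, not part of steel-sdk or @google/genai.
const MAX_COORDINATE = 1000;

interface Viewport {
  width: number;
  height: number;
}

// Scale a normalized (0-1000) point to pixel coordinates for the given viewport.
function denormalizePoint(
  x: number,
  y: number,
  viewport: Viewport
): [number, number] {
  return [
    Math.round((x / MAX_COORDINATE) * viewport.width),
    Math.round((y / MAX_COORDINATE) * viewport.height),
  ];
}

const viewport: Viewport = { width: 1280, height: 768 };
console.log(denormalizePoint(500, 500, viewport)); // viewport center: x = 640, y = 384
console.log(denormalizePoint(250, 125, viewport)); // x = 320, y = 96
```

Because the X and Y axes are scaled independently, the same normalized point lands on the same relative position regardless of the viewport's aspect ratio.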
Next Steps
- Explore the Steel API documentation for more advanced features
- Check out the Gemini Computer Use documentation for more information about the model
- Add features such as session recording or multi-session management
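As one possible starting point for multi-session management, the sketch below runs several tasks concurrently, one agent per task, and always releases each session afterwards. The `TaskAgent` interface mirrors the initialize, executeTask, and cleanup methods of the Agent class from Step 2; `runTasks`, `TaskOutcome`, and the factory parameter are hypothetical names introduced for illustration.

```typescript
// Minimal shape of an agent; the Agent class from Step 2 satisfies this interface.
interface TaskAgent {
  initialize(): Promise<void>;
  executeTask(task: string): Promise<string>;
  cleanup(): Promise<void>;
}

// Outcome of one task: either a result or an error message.
interface TaskOutcome {
  task: string;
  result?: string;
  error?: string;
}

// Run every task in its own agent (and therefore its own session) in parallel.
async function runTasks(
  tasks: string[],
  createAgent: () => TaskAgent
): Promise<TaskOutcome[]> {
  return Promise.all(
    tasks.map(async (task): Promise<TaskOutcome> => {
      const agent = createAgent();
      try {
        await agent.initialize();
        return { task, result: await agent.executeTask(task) };
      } catch (err) {
        // Record the failure instead of aborting the other tasks.
        return { task, error: String(err) };
      } finally {
        // Release the session whether the task succeeded or failed.
        await agent.cleanup();
      }
    })
  );
}
```

With the Agent class from Step 2, a call might look like `runTasks(["task one", "task two"], () => new Agent())`. This sketch does not limit how many sessions run at once, so a real implementation would likely add a concurrency cap.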