Overview

OpenAI's Computer Use is an agent that combines vision capabilities with advanced reasoning to control computer interfaces and perform tasks on behalf of users through a continuous action loop.

OverviewCopied!

The OpenAI Computer Use integration allows you to connect GPT-4o's vision and reasoning capabilities with Steel's reliable browser infrastructure. This integration enables AI agents to:

Control Steel browser sessions via the OpenAI Responses API
Execute real browser actions like clicking, typing, and scrolling
Perform complex web tasks such as form filling, searching, and navigation
Process visual feedback from screenshots to determine next actions
Implement human-in-the-loop verification for sensitive operations

By combining OpenAI's Computer Use with Steel's cloud browser infrastructure, you can build robust, scalable web automation solutions that leverage Steel's anti-bot capabilities, proxy management, and sandboxed environments.

Requirements & LimitationsCopied!

OpenAI API Key: Access to the OpenAI API with the computer-use-preview model
Steel API Key: Active subscription to Steel
Python Environment: Support for Python API clients for both services
Supported Environments: Works best with Steel's browser environment (vs. desktop environments)

DocumentationCopied!

Quickstart Guide (Python) → Step-by-step guide to building a Simple CUA agent with Steel browser sessions in Python.

Quickstart Guide (Node) → Step-by-step guide to building a Simple CUA agent with Steel browser sessions in Typescript & Node.

Additional ResourcesCopied!

OpenAI Computer Use Documentation - Official documentation from OpenAI
Steel Sessions API Reference - Technical details for managing Steel browser sessions
Cookbook Recipe (Python) - Working, forkable examples of the integration in Python
Cookbook Recipe (TS/Node) - Working, forkable examples of the integration in Python
Community Discord - Get help and share your implementations