Overview
OpenAI's Computer Use is an agent that combines vision capabilities with advanced reasoning to control computer interfaces and perform tasks on behalf of users through a continuous action loop.
The OpenAI Computer Use integration allows you to connect OpenAI's computer-use-preview model, which builds on GPT-4o's vision and reasoning capabilities, with Steel's reliable browser infrastructure. This integration enables AI agents to:
- Control Steel browser sessions via the OpenAI Responses API
- Execute real browser actions like clicking, typing, and scrolling
- Perform complex web tasks such as form filling, searching, and navigation
- Process visual feedback from screenshots to determine next actions
- Implement human-in-the-loop verification for sensitive operations
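The action loop behind these capabilities can be sketched in a few lines. This is a minimal, offline illustration rather than the integration's actual code: `FakeBrowser`, `execute`, `run_loop`, and `scripted_model` are hypothetical names, the browser only records actions instead of driving a Steel session, and the `sensitive` flag is an assumed stand-in for whatever safety signal the model returns. The action dictionaries loosely mirror the click, type, and scroll actions described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser session; records actions only."""
    log: list = field(default_factory=list)

    def click(self, x: int, y: int) -> None:
        self.log.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))

    def scroll(self, x: int, y: int, dx: int, dy: int) -> None:
        self.log.append(("scroll", x, y, dx, dy))

    def screenshot(self) -> bytes:
        # A real session would return image bytes of the current page.
        return f"screenshot-{len(self.log)}".encode()


def execute(browser: FakeBrowser, action: dict) -> None:
    """Dispatch one model-proposed action to the browser."""
    kind = action["type"]
    if kind == "click":
        browser.click(action["x"], action["y"])
    elif kind == "type":
        browser.type_text(action["text"])
    elif kind == "scroll":
        browser.scroll(action["x"], action["y"],
                       action["scroll_x"], action["scroll_y"])
    else:
        raise ValueError(f"unsupported action: {kind}")


def run_loop(next_action: Callable[[bytes], Optional[dict]],
             browser: FakeBrowser,
             approve: Callable[[dict], bool],
             max_steps: int = 20) -> int:
    """Request an action, run it, feed back a screenshot, repeat.

    Returns the number of actions executed. Stops when the model
    returns None, a human declines a sensitive action, or max_steps
    is reached.
    """
    screenshot = browser.screenshot()
    for step in range(max_steps):
        action = next_action(screenshot)
        if action is None:
            return step
        # Human-in-the-loop gate ("sensitive" is an assumed flag).
        if action.get("sensitive") and not approve(action):
            return step
        execute(browser, action)
        screenshot = browser.screenshot()
    return max_steps


def scripted_model(script: list) -> Callable[[bytes], Optional[dict]]:
    """Replay a fixed list of actions in place of a real model call."""
    steps = iter(script)
    return lambda screenshot: next(steps, None)


demo_browser = FakeBrowser()
demo_steps = run_loop(
    scripted_model([
        {"type": "click", "x": 100, "y": 200},
        {"type": "type", "text": "steel"},
    ]),
    demo_browser,
    approve=lambda action: True,
)
```

In a real agent, `next_action` would wrap a call to the model with the latest screenshot, and `FakeBrowser` would be replaced by a client attached to a Steel session. Keeping the approval step as a callback lets the same loop run unattended in tests and interactively in production.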
By combining OpenAI's Computer Use with Steel's cloud browser infrastructure, you can build robust, scalable web automation solutions that leverage Steel's anti-bot capabilities, proxy management, and sandboxed environments.
Requirements & Limitations
- OpenAI API Key: Access to the OpenAI API with the computer-use-preview model
- Steel API Key: Active subscription to Steel
- Python Environment: Support for Python API clients for both services
- Supported Environments: Works best with Steel's browser environment (vs. desktop environments)
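Before starting a session, it can help to confirm that both keys are available. The sketch below assumes the keys are supplied through environment variables named `OPENAI_API_KEY` and `STEEL_API_KEY`; those names and the `missing_keys` helper are illustrative assumptions, not something this page specifies.

```python
import os
from typing import Mapping, Optional

# Assumed environment variable names for the two required API keys.
REQUIRED_KEYS = ("OPENAI_API_KEY", "STEEL_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list:
    """Return the required key names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

Calling `missing_keys()` at startup and exiting with a clear message when the list is non-empty avoids a less obvious authentication failure partway through an agent run.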
Documentation
Quickstart Guide (Python) → Step-by-step guide to building a simple CUA agent with Steel browser sessions in Python.
Quickstart Guide (Node) → Step-by-step guide to building a simple CUA agent with Steel browser sessions in TypeScript & Node.
Additional Resources
- OpenAI Computer Use Documentation - Official documentation from OpenAI
- Steel Sessions API Reference - Technical details for managing Steel browser sessions
- Cookbook Recipe (Python) - Working, forkable examples of the integration in Python
- Cookbook Recipe (TS/Node) - Working, forkable examples of the integration in TypeScript & Node
- Community Discord - Get help and share your implementations