Overview

OpenAI's Computer Use is an agent that combines vision capabilities with advanced reasoning to control computer interfaces and perform tasks on behalf of users through a continuous action loop.

OverviewCopied!

The OpenAI Computer Use integration allows you to connect GPT-4o's vision and reasoning capabilities with Steel's reliable browser infrastructure. This integration enables AI agents to:

  • Control Steel browser sessions via the OpenAI Responses API

  • Execute real browser actions like clicking, typing, and scrolling

  • Perform complex web tasks such as form filling, searching, and navigation

  • Process visual feedback from screenshots to determine next actions

  • Implement human-in-the-loop verification for sensitive operations

By combining OpenAI's Computer Use with Steel's cloud browser infrastructure, you can build robust, scalable web automation solutions that leverage Steel's anti-bot capabilities, proxy management, and sandboxed environments.

Requirements & LimitationsCopied!

  • OpenAI API Key: Access to the OpenAI API with the computer-use-preview model

  • Steel API Key: Active subscription to Steel

  • Python Environment: Support for Python API clients for both services

  • Supported Environments: Works best with Steel's browser environment (vs. desktop environments)

DocumentationCopied!

Quickstart Guide (Python) → Step-by-step guide to building a Simple CUA agent with Steel browser sessions in Python.

Quickstart Guide (Node) → Step-by-step guide to building a Simple CUA agent with Steel browser sessions in Typescript & Node.

Additional ResourcesCopied!