Overview

Gemini's Computer Use is an agent that combines vision capabilities with advanced reasoning to control computer interfaces and perform tasks on behalf of users through a continuous action loop.
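
To make the continuous action loop concrete, here is a minimal sketch in plain Python. It is not the Gemini or Steel client API: `Action`, `run_action_loop`, and the three callables (`take_screenshot`, `propose_action`, `execute`) are hypothetical names used only for illustration. In a real agent they would wrap a Steel browser session and a Gemini API call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    name: str   # illustrative, e.g. "click_at"
    args: dict  # illustrative, e.g. {"x": 500, "y": 500}


def run_action_loop(
    task: str,
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Optional[Action]],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> model -> action until the model stops or max_steps is reached."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                       # observe the current page
        action = propose_action(task, screenshot, history)   # ask the model what to do next
        if action is None:                                   # model signals the task is done
            break
        execute(action)                                      # perform the action in the browser
        history.append(action)
    return history


# Stubbed usage: the "model" clicks once, then reports that it is finished.
scripted = iter([Action("click_at", {"x": 500, "y": 500}), None])
history = run_action_loop(
    task="demo",
    take_screenshot=lambda: b"",
    propose_action=lambda task, shot, hist: next(scripted),
    execute=lambda action: None,
)
```

The `max_steps` cap is a design choice in this sketch so that a confused model cannot loop forever; choose a limit that suits your task.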

The Gemini Computer Use integration allows you to connect Gemini 2.5's vision and reasoning capabilities with Steel's reliable browser infrastructure. This integration enables AI agents to:

  • Control Steel browser sessions via the Gemini API

  • Execute real browser actions like clicking, typing, and scrolling

  • Perform complex web tasks such as form filling, searching, and navigation

  • Process visual feedback from screenshots to determine next actions

  • Handle normalized coordinate systems automatically
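
As a rough illustration of the last point, the sketch below converts a normalized coordinate into a pixel position. It assumes the model reports coordinates on a 0-999 grid regardless of screenshot size, as the Gemini Computer Use documentation describes at the time of writing; verify this against the current model documentation. The helper name `denormalize` and the 1440x900 viewport are assumptions for the example.

```python
from typing import Tuple


def denormalize(x: int, y: int, width: int, height: int, grid: int = 1000) -> Tuple[int, int]:
    """Map a point on an assumed grid x grid normalized space to viewport pixels."""
    if not (0 <= x < grid and 0 <= y < grid):
        raise ValueError(f"normalized coordinates must be in [0, {grid - 1}]")
    # Integer arithmetic avoids floating-point rounding surprises.
    return x * width // grid, y * height // grid


# A click at the center of the normalized grid on a 1440x900 viewport.
center = denormalize(500, 500, 1440, 900)  # (720, 450)
```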

By combining Gemini's Computer Use with Steel's cloud browser infrastructure, you can build robust, scalable web automation solutions that leverage Steel's anti-bot capabilities, proxy management, and sandboxed environments.

Requirements & Limitations

  • Gemini API Key: Access to the Gemini API with the gemini-2.5-computer-use-preview model

  • Steel API Key: Active subscription to Steel

  • Python/Node Environment: A Python or Node.js runtime with the API clients for both services installed

  • Supported Environments: Works best with Steel's browser environment
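
Before starting a session, it can help to confirm that both keys are available. The sketch below assumes the keys are stored in environment variables named `GEMINI_API_KEY` and `STEEL_API_KEY`; those names and the helper `missing_keys` are assumptions for illustration, so adjust them to match your setup.

```python
import os
from typing import List, Mapping

# Assumed variable names; change these if your project uses different ones.
REQUIRED_KEYS = ("GEMINI_API_KEY", "STEEL_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required key names that are unset or empty in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
missing = missing_keys({"GEMINI_API_KEY": "example-value"})  # ["STEEL_API_KEY"]
```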

Documentation

Quickstart Guide (Python) → Step-by-step guide to building a simple Computer Use agent with Steel browser sessions in Python.

Quickstart Guide (Node) → Step-by-step guide to building a simple Computer Use agent with Steel browser sessions in TypeScript and Node.js.

Additional Resources