Overview

Gemini's Computer Use is an agent that combines vision capabilities with advanced reasoning to control computer interfaces and perform tasks on behalf of users through a continuous action loop.
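The continuous action loop can be sketched in a few lines of Python. This is a minimal illustration, not the integration's actual code: `Action`, `run_action_loop`, and the three callables (`take_screenshot`, `propose_action`, `execute_action`) are hypothetical names standing in for the Steel session and the Gemini API call, which the quickstart guides cover in full.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical container for one UI action proposed by the model."""
    name: str
    args: dict = field(default_factory=dict)


def run_action_loop(
    task: str,
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Optional[Action]],
    execute_action: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> model -> action until the model stops or the step cap is hit."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                       # capture current browser state
        action = propose_action(task, screenshot, history)   # ask the model for the next step
        if action is None:                                   # assumption: None means "task complete"
            break
        execute_action(action)                               # apply the action in the browser
        history.append(action)
    return history
```

The `max_steps` cap is a design choice in this sketch: it keeps a confused agent from looping indefinitely and consuming browser time.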

The Gemini Computer Use integration allows you to connect Gemini 2.5's vision and reasoning capabilities with Steel's reliable browser infrastructure. This integration enables AI agents to:

Control Steel browser sessions via the Gemini API
Execute real browser actions like clicking, typing, and scrolling
Perform complex web tasks such as form filling, searching, and navigation
Process visual feedback from screenshots to determine next actions
Handle normalized coordinate systems automatically
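To make the last point concrete, the sketch below shows one way normalized coordinates can be mapped back to pixels. It assumes the model reports positions on a 1000-unit grid that is independent of the screenshot size; confirm the exact range in Google's documentation. The function name `denormalize` and the example viewport size are illustrative, not part of either API.

```python
from typing import Tuple


def denormalize(
    x_norm: int, y_norm: int, width: int, height: int, grid: int = 1000
) -> Tuple[int, int]:
    """Map a point on a grid-by-grid normalized space to pixel coordinates.

    `grid=1000` is an assumption about the model's coordinate range.
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    x = x_norm * width // grid
    y = y_norm * height // grid
    # Clamp so an out-of-range value never lands outside the viewport.
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return x, y


# Example: the center of the normalized grid on a 1440x900 viewport.
center = denormalize(500, 500, 1440, 900)
```

Because the mapping depends only on the viewport size, the same model output stays valid if the browser session is created with a different resolution.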

By combining Gemini's Computer Use with Steel's cloud browser infrastructure, you can build robust, scalable web automation solutions that leverage Steel's anti-bot capabilities, proxy management, and sandboxed environments.

Requirements & Limitations

Gemini API Key: Access to the Gemini API with the gemini-2.5-computer-use-preview model
Steel API Key: Active subscription to Steel
Python/Node Environment: Support for API clients for both services
Supported Environments: Works best with Steel's browser environment

Documentation

Quickstart Guide (Python) → Step-by-step guide to building a simple CUA agent with Steel browser sessions in Python.

Quickstart Guide (Node) → Step-by-step guide to building a simple CUA agent with Steel browser sessions in TypeScript and Node.

Additional Resources

Gemini Computer Use Documentation - Official documentation from Google
Steel Sessions API Reference - Technical details for managing Steel browser sessions
Cookbook Recipe (Python) - Working, forkable examples of the integration in Python
Cookbook Recipe (TS/Node) - Working, forkable examples of the integration in TypeScript
Community Discord - Get help and share your implementations