# Build a browser agent with Google ADK

URL: /cookbook/google-adk

---
title: Build a browser agent with Google ADK
description: "Use Steel with Google's Agent Development Kit (ADK) for Go to build a tool-using browser agent that drives a chromedp session over CDP and reads Hacker News."
---

[Google ADK](https://adk.dev/) (`@google/adk`) is Google's Agent Development Kit. You build an `LlmAgent` with a Gemini model, `instruction`, and a list of `FunctionTool`s, then hand it to a `Runner`. The runner owns the loop: it appends your message to a session, calls the model, dispatches tool calls, feeds results back, and yields an async stream of `Event`s until the agent produces its final answer.

This recipe wires that tool layer to a Steel cloud browser. Three `FunctionTool`s in `index.ts` (`navigate`, `snapshot`, `extract`) drive a single Playwright page over CDP. The Steel session opens once in `main()` before the runner starts, so the tools close over a live `page` rather than spinning up a browser per call. Demo task: read the front page of Hacker News and return the top 5 stories as JSON.

```typescript
const agent = new LlmAgent({
  name: "steel_research",
  model: new Gemini({ model: "gemini-2.5-flash", apiKey: GOOGLE_API_KEY }),
  instruction:
    "You operate a Steel cloud browser via tools. Workflow: navigate, snapshot, extract. ...",
  tools: [navigate, snapshot, extract],
});

const runner = new InMemoryRunner({ agent });

// runTask() wraps this loop in a fresh session and retries up to three times
// when a turn ends in MALFORMED_FUNCTION_CALL or an empty answer.
for await (const event of runner.runAsync({ userId, sessionId, newMessage })) {
  if (event.errorCode) break; // transient: caught by runTask, retried
  if (isFinalResponse(event)) finalText = stringifyContent(event).trim();
}
```

The model is built as an explicit `Gemini` instance so the key comes from `GOOGLE_API_KEY`.
ADK's bare-string model path (`model: "gemini-2.5-flash"`) only resolves `GOOGLE_GENAI_API_KEY` or `GEMINI_API_KEY` from the environment, so passing `apiKey` directly keeps the one variable name consistent with the rest of the cookbook.

## Run it

```bash
cd examples/google-adk-ts
cp .env.example .env          # set STEEL_API_KEY and GOOGLE_API_KEY
npm install
npm start
```

Get keys at [app.steel.dev/settings/api-keys](https://app.steel.dev/settings/api-keys) and [aistudio.google.com/apikey](https://aistudio.google.com/apikey).

`main()` prints a Live View URL right after the session opens; open it in another tab to watch the page as the agent navigates and scrapes. Each tool logs its own latency, and the event loop logs a `step:` line whenever the model emits a tool call, so you can read the agent's progress as it happens.

Your output varies. Structure looks like this:

```text
Steel + Google ADK Starter
============================================================
open-session: 1380ms
Live View: https://app.steel.dev/sessions/ab12cd34...
step: navigate
navigate: 690ms
step: snapshot
snapshot: 410ms (3820 chars, 120 links)
step: extract
extract: 95ms (5 rows)
Agent finished. Top stories:
{
  "stories": [
    { "rank": 1, "title": "Show HN: ...", "url": "https://...", "points": 412 }
  ]
}
Releasing Steel session...
Session released. Replay: https://app.steel.dev/sessions/ab12cd34...
```

A full run takes ~15-30 seconds and costs a few cents of Steel session time plus Gemini tokens. The `finally` block calls `steel.sessions.release()`; skip it and the session keeps billing until the default 5-minute timeout.

## How the loop reads

`runAsync` is an async generator, not a callback. Every `for await` iteration hands you one `Event`: a tool call the model wants to make, the tool's result coming back, a chunk of the model's reasoning, or the final answer. Two helpers from `@google/adk` keep the consumer thin:

- `isFinalResponse(event)` is true on the last event of the turn.
  That is the cue to capture the answer.
- `stringifyContent(event)` flattens an event's `content.parts` into a single string, so you do not walk the parts array by hand.

The `step:` log reads `event.content.parts` for `functionCall.name`. That is the only place the recipe inspects raw event parts; everything else leans on the two helpers. ADK logs one INFO line per event by default; `setLogLevel(LogLevel.WARN)` at startup keeps the console to the agent's own output.

`runTask` runs that loop inside a fresh session and watches `event.errorCode`. `gemini-2.5-flash` occasionally ends a turn with `MALFORMED_FUNCTION_CALL` or an empty answer, so the helper retries up to three times before giving up rather than failing the whole run.

One Gemini wrinkle shapes the tools: its function-declaration schema rejects numeric bounds (`exclusiveMinimum`, `maximum`) and `default`, so each tool keeps its `parameters` to plain types and applies caps and defaults inside `execute`. A `.positive()` or `.default()` left on a Zod field surfaces as a 400 from the model call.

This agent has no `outputSchema`. ADK disables tool calls when an output schema is set on an `LlmAgent`, and this agent needs its tools through the whole turn, so the prompt asks for bare JSON instead and `main()` parses the final text (stripping a stray `` ```json `` fence if the model adds one). For a turn that does not call tools, set `outputSchema` on the agent for validated typed output.

## Make it yours

- **Swap the model.** Change the `model` string passed to `Gemini`. `"gemini-2.5-pro"`, `"gemini-flash-latest"`, and other Gemini IDs all work with the same `GOOGLE_API_KEY`.
- **Swap the task.** Edit `TASK` and the JSON shape named in the agent's `instruction`. The three tools are task-agnostic; they describe a generic navigate-then-scrape flow.
- **Add a tool.** A `click` tool wrapping `page.click`, or a `screenshot` tool returning a base64 PNG.
  Build it with `new FunctionTool({ name, description, parameters, execute })` and add it to the agent's `tools` array.
- **Persist sessions.** Swap `InMemoryRunner` for a `Runner` with a `DatabaseSessionService` to keep conversation state across runs; the session ID is the thread key.
- **Run Vertex instead of AI Studio.** Set `GOOGLE_GENAI_USE_VERTEXAI=TRUE` plus `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`, and construct `new Gemini({ model, vertexai: true })`.
- **Turn on stealth.** Pass `useProxy`, `solveCaptcha`, or `sessionTimeout` to `steel.sessions.create({...})` for sites with anti-bot protection.

## Related

[Mastra version](/cookbook/mastra) · [OpenAI Agents SDK version](/cookbook/openai-agents) · [ADK TypeScript docs](https://adk.dev/get-started/typescript/)

[Google ADK](https://google.github.io/adk-docs/) is Google's Agent Development Kit. An `LlmAgent` holds the model, instruction, and tools; a `Runner` drives the turn loop against a session service that stores conversation state. This starter binds a Steel cloud browser to three function tools, hands them to a Gemini agent, and points it at Hacker News.

The pieces ADK asks you to assemble:

```python
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService

agent = LlmAgent(
    name="hn_scraper",
    model="gemini-2.5-flash",
    tools=[navigate, snapshot, extract],
    output_schema=TopStories,
    instruction="You operate a Steel cloud browser via tools. ...",
)

session_service = InMemorySessionService()
adk_session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID)
runner = Runner(agent=agent, app_name=APP_NAME, session_service=session_service)
```

Note the two session concepts that share a word. There is the Steel session (a remote browser, billed per minute) and the ADK session (a conversation record, held in memory here). They are unrelated objects; `main` creates one of each.

`run_agent` sends a turn and reads the result.
`runner.run_async` returns an async generator of events: tool calls, tool results, model deltas, and finally one event where `event.is_final_response()` is true. You iterate, keep the text from that final event, and ignore the rest:

```python
message = types.Content(role="user", parts=[types.Part(text=prompt)])
async for event in runner.run_async(
    user_id=USER_ID, session_id=session_id, new_message=message
):
    if event.is_final_response() and event.content and event.content.parts:
        final = event.content.parts[0].text or ""
```

## Tools

ADK builds each tool's JSON schema from the Python function itself: parameter names and type hints become the arguments, and the docstring (summary plus `Args:` lines) becomes the descriptions the model reads. So the tools are plain `async def` functions with typed parameters and a Google-style docstring, no decorator:

```python
async def navigate(url: str) -> dict:
    """Navigate the open browser session to a URL and wait for it to load.

    Args:
        url: The absolute URL to open.

    Returns:
        A dict with the resolved url and page title.
    """
    await _PAGE.goto(url, wait_until="domcontentloaded", timeout=45_000)
    return {"url": _PAGE.url, "title": await _PAGE.title()}
```

A function tool in ADK takes no framework context argument, so the live Playwright `Page` is bound to a module-level `_PAGE` and the tools close over it. `main` sets `_PAGE` once the CDP connection is up, before the runner starts.

The three tools:

- `navigate(url)` loads a page and reports the resolved URL and title.
- `snapshot(max_chars, max_links)` returns capped visible text plus a list of links, so the agent reads the page before guessing selectors.
- `extract(row_selector, fields, limit)` runs one `page.evaluate` that maps a CSS row selector and field specs to structured rows. One round trip, not one per cell. CDP calls to Steel's cloud browser run ~200 to 300ms each, so a per-cell loop would burn seconds.

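To put rough numbers on the round-trip argument, here is a back-of-envelope sketch rather than a benchmark. The 250 ms figure is an assumed midpoint of the 200 to 300 ms range quoted above, the 5 rows by 4 fields grid is an assumed page shape matching the demo task, and both function names are hypothetical helpers that do not appear in the recipe.

```python
# Back-of-envelope estimate only; no browser is involved.
RTT_MS = 250  # assumption: midpoint of the ~200-300 ms per CDP call quoted above


def per_cell_cost_ms(rows: int, fields: int, rtt_ms: int = RTT_MS) -> int:
    """Cost if every cell is read with its own CDP round trip."""
    return rows * fields * rtt_ms


def single_evaluate_cost_ms(rtt_ms: int = RTT_MS) -> int:
    """Cost if one evaluate call returns every row at once."""
    return rtt_ms


if __name__ == "__main__":
    # 5 stories x 4 fields (rank, title, url, points)
    print(per_cell_cost_ms(5, 4))     # 5000
    print(single_evaluate_cost_ms())  # 250
```

Under those assumptions the per-cell loop spends about five seconds on transport alone, against a quarter of a second for the single evaluate.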

Each tool prints its own latency (`navigate: 412ms`) so you can see where a turn spends its time.

## Typed output

`output_schema=TopStories` ties the final reply to a Pydantic model. ADK keeps the tools available during the thinking loop and constrains only the last message, so the agent still browses freely and then answers in shape. The final event text is JSON that already validates against `TopStories`; `main` parses and re-dumps it with indentation:

```python
class Story(BaseModel):
    rank: int
    title: str
    url: str = Field(description="Destination URL the story links to.")
    points: int


class TopStories(BaseModel):
    stories: list[Story] = Field(min_length=1, max_length=5)
```

## Run it

```bash
cd examples/google-adk-py
cp .env.example .env          # set STEEL_API_KEY and GOOGLE_API_KEY
uv run main.py
```

Get a Steel key from [app.steel.dev](https://app.steel.dev/settings/api-keys) and a Gemini key from [aistudio.google.com](https://aistudio.google.com/apikey). `GOOGLE_GENAI_USE_VERTEXAI=FALSE` keeps ADK on the AI Studio key path instead of trying to authenticate against a GCP project; `main` defaults it for you if it is unset.

Your output varies. Structure looks like this:

```text
Steel + Google ADK Starter
============================================================
Session: https://app.steel.dev/sessions/ab12cd34...
navigate: 1612ms
snapshot: 487ms (3821 chars, 48 links)
extract: 394ms (5 rows)
Agent finished.
{
  "stories": [
    {
      "rank": 1,
      "title": "Show HN: ...",
      "url": "https://example.com/...",
      "points": 412
    },
    ...
  ]
}
Releasing Steel session...
Session released. Replay: https://app.steel.dev/sessions/ab12cd34...
```

A run takes ~20 to 40 seconds and a handful of agent turns on Hacker News. Cost is a few cents of Steel session time plus Gemini tokens. The `finally` block in `main` closes Playwright and calls `steel.sessions.release()` so Steel stops billing per minute.

## Make it yours

- **Swap the model.** Change `MODEL`.
  Any Gemini model that ADK reaches through the same API key works without code changes, since the tool schemas are generated from the functions. Heavier reasoning models trade latency for fewer wrong turns.
- **Swap the task.** Edit the prompt passed to `run_agent` and the `TopStories` / `Story` models. The tools stay the same; the agent re-plans against the new shape.
- **Add a tool.** Write another `async def` with type hints and a docstring, then append it to `tools=[...]`. A useful fourth is `click(selector: str)` that calls `page.click` and waits for navigation.
- **Carry state across turns.** The `InMemorySessionService` keeps history under one `session_id`, so calling `run_agent` again with the same id continues the conversation. Swap in a `DatabaseSessionService` to persist it.
- **Run more agents.** Build a Steel session and `_PAGE` per task and run them on separate ADK sessions. Since `_PAGE` is module-level here, give each concurrent run its own page object rather than sharing one.

## Related

[Steel + Genkit (Go)](/cookbook/genkit) · [Steel + Pydantic AI (Python)](/cookbook/pydantic-ai) · [Google ADK Python documentation](https://google.github.io/adk-docs/)

[Google ADK](https://adk.dev/get-started/go/) is Google's Agent Development Kit, a code-first toolkit for building agents in Go. The pieces fit together as a tree: a `model.LLM`, a set of `tool.Tool` values, and an `llmagent` that owns them, all driven by a `runner.Runner` that turns one user message into a stream of events. This starter hands that agent three tools backed by a Steel cloud browser and points a Gemini model at Hacker News.

The runner is the part worth understanding first. You do not write the tool-calling loop.
You hand `runner.New` a root agent and a session service, call `Run`, and range over the events it yields:

```go
r, _ := runner.New(runner.Config{AppName: appName, Agent: a, SessionService: sessionService})

for event, err := range r.Run(ctx, userID, sessionID, task, agent.RunConfig{
	StreamingMode: agent.StreamingModeNone,
}) {
	if err != nil {
		// Stop on a runner error; breaking (not exiting) lets deferred cleanup run.
		log.Printf("run: %v", err)
		break
	}
	for _, part := range event.Content.Parts {
		if part.Text != "" {
			final = part.Text // keep the last non-empty text part
		}
	}
}
```

`Run` returns a Go 1.23 iterator (`iter.Seq2[*session.Event, error]`). Each event is one step: a model turn that requests a tool, the tool's result fed back in, the next model turn, and so on until the model answers without calling anything. Every event carries a `genai.Content`, so ranging over `event.Content.Parts` lets you watch text, function calls, and function responses flow past. The loop in `main` keeps the last non-empty text part; that is the agent's final answer.

## Tools from a Go function

`functiontool.New` wraps a typed Go function as a tool. It is generic over the argument and result types and infers the JSON schema the model sees from your input struct by reflection:

```go
navigate, _ := functiontool.New(functiontool.Config{
	Name:        "navigate",
	Description: "Open a URL in the live browser tab and wait for it to load.",
}, func(tc agent.ToolContext, in navigateInput) (navigateOutput, error) {
	var title, url string
	err := chromedp.Run(b.tab,
		chromedp.Navigate(in.URL),
		chromedp.Title(&title),
		chromedp.Location(&url),
	)
	return navigateOutput{Title: title, URL: url}, err
})
```

The schema comes from struct tags. A `jsonschema` tag on a field becomes that argument's description in the tool declaration, which is how the model learns what `rowSelector` or `attr` mean:

```go
type extractInput struct {
	RowSelector string      `json:"rowSelector" jsonschema:"CSS selector matching each item, e.g. 'tr.athing'."`
	Fields      []fieldSpec `json:"fields" jsonschema:"One entry per column to pull out of each row."`
	Limit       int         `json:"limit,omitempty" jsonschema:"Maximum number of rows to return. Defaults to 10."`
}
```

The first argument to every handler is an `agent.ToolContext`. It embeds `context.Context`, so the `scrape` tool passes `tc` straight to `client.Scrape` as the request context. The handlers return ordinary Go structs and errors; ADK marshals the struct into the function response and an error becomes a tool failure the model can react to.

Three tools cover the two access patterns a browsing agent needs:

- `navigate` and `extract` drive one live chromedp tab attached to the Steel session over CDP. `extract` takes a row selector plus a field-per-column list and runs the whole pull inside a single `chromedp.Evaluate`. Serial CDP round-trips to a cloud browser run about 200 to 300 ms each, so collapsing N rows by M fields into one evaluate keeps a page read under a second instead of stacking dozens of trips.
- `scrape` calls `client.Scrape` and returns clean Markdown for a URL without touching the tab. It is the reliable path when the agent just needs an article's text and sidesteps selector guesswork entirely.

## Run it

```bash
cd examples/google-adk-go
cp .env.example .env          # set STEEL_API_KEY and GOOGLE_API_KEY
go mod tidy
go run .
```

Get keys from [app.steel.dev](https://app.steel.dev/settings/api-keys) and [Google AI Studio](https://aistudio.google.com/apikey). `GOOGLE_GENAI_USE_VERTEXAI=FALSE` in `.env.example` keeps the genai client on the AI Studio backend, so the API key alone is enough and no Vertex project is required.

The program prints a session viewer URL as it starts; open it in another tab to watch the browser run live. Each tool call prints its latency.

Your output varies.
Structure looks like this:

```text
Steel + Google ADK Go Starter
============================================================
Session: https://app.steel.dev/sessions/ab12cd34...
navigate: 1183ms
extract: 412ms (5 rows)
Agent finished.
{
  "stories": [
    {
      "points": "342",
      "rank": 1,
      "title": "Show HN: ...",
      "url": "https://example.com/..."
    }
  ]
}
Releasing Steel session...
Session released. Replay: https://app.steel.dev/sessions/ab12cd34...
```

A run takes about 20 to 40 seconds and a handful of model turns. Cost is a few cents of Steel session time plus Gemini tokens. The deferred cleanup in `main` releases the session: Steel bills per session-minute, so a leaked session keeps running until the default 5-minute timeout.

## Structured output

ADK Go can pin an agent's reply to a `genai.Schema` through `OutputSchema` on the agent config, but setting it disables tools: an agent with an output schema can only reply, it cannot call functions. This agent needs its tools, so it returns JSON as text instead. The instruction asks for a bare JSON object, and `prettyJSON` in `main` strips a stray code fence if the model adds one, then re-indents the result. If you would rather have a typed value, split the work into two agents: a tool-using agent that gathers the rows and a second agent with `OutputSchema` set that formats them.

## Make it yours

- **Swap the model.** Change `modelName`. Any Gemini model your key can reach works without code changes, for example `gemini-2.5-pro`. `gemini.NewModel` takes the name and a `genai.ClientConfig`.
- **Swap the task.** Change the `task` content and the JSON shape named in the agent instruction. The tools stay the same; the agent re-plans against the new request.
- **Add a tool.** Write a `func(agent.ToolContext, In) (Out, error)`, wrap it with `functiontool.New`, and add it to the agent's `Tools`. A useful fourth is `click(selector string)` that runs `chromedp.Click` and waits for navigation.
- **Inspect the loop.** Range over more than text. Every event exposes `event.Content.Parts`, where `FunctionCall` and `FunctionResponse` parts let you log exactly which tool the agent reached for and what came back.

## Related

[Steel + Genkit (Go)](/cookbook/genkit) and [Steel + Eino (Go)](/cookbook/eino) build the same agent shape in other Go frameworks. The [ADK Go quickstart](https://adk.dev/get-started/go/) covers agents, tools, and the runner in depth.