` soup before the first sentence. Markdown collapses that to the text, the structure, and the links, so you spend tokens on content instead of tags. The wiring is small once you have the string:

```typescript
// Scrape the page and ask for Markdown only.
const { content, metadata } = await client.scrape({
  url: TARGET_URL,
  format: ["markdown"],
});

// Prepend the title and pass the Markdown as the user message.
const answer = await llm.chat({
  messages: [
    { role: "system", content: "Answer using only the page below." },
    { role: "user", content: `# ${metadata.title}\n\n${content.markdown}` },
  ],
});
```

That is the whole integration: scrape to Markdown, prepend the title, hand it to a model. No selectors, no `page.evaluate`, no waiting on a DOM you do not control.

One failure mode to plan for: a heavily client-rendered page can return near-empty Markdown if the content paints after the initial load. When `content.markdown` comes back short for a site you know is rich, add `delay` (milliseconds) to the `scrape()` call so the page settles before capture.

Check `metadata.statusCode` too. When the target answers with a 403 or a soft block, the `scrape()` call itself still succeeds, but it hands you the block page's text, not the content you wanted.

## What you get back

`format` is an array, so you can ask for more than one representation in a single call: `["markdown", "html", "cleaned_html", "readability"]`. Each lands under `content` on the response (`content.markdown`, `content.html`, and so on), and the field is undefined when you did not request that format, which is why the example reads `content.markdown ?? ""`.

The response carries more than the body. `scraped.metadata` holds the page `title`, `description`, `statusCode`, Open Graph tags, and the canonical URL. `scraped.links` is a flat array of `{ text, url }` for every link on the page, handy when you want an LLM to pick a next page to visit.

The example prints the status code, title, link count, and the first 500 characters of Markdown so you can see the shape without dumping a whole article to the terminal.
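
The status-code and short-Markdown checks described above can be folded into one guard that runs before the Markdown reaches a model. This is a minimal sketch, not the recipe's code: `ScrapeLike`, `usableMarkdown`, `summarize`, and the 200-character threshold are hypothetical names and values, and the interface mirrors only the fields this page mentions (`content.markdown`, `metadata.statusCode`, `metadata.title`, `links`). The real SDK response type may differ.

```typescript
// Hypothetical local shape covering only the fields described above;
// the SDK's actual response type may carry more fields or differ in detail.
interface ScrapeLike {
  content: { markdown?: string };
  metadata: { statusCode: number; title?: string };
  links: { text: string; url: string }[];
}

type Verdict =
  | { ok: true; markdown: string }
  | { ok: false; reason: string };

// Assumed threshold: below this, treat the page as probably not rendered yet.
const MIN_MARKDOWN_CHARS = 200;

function usableMarkdown(
  scraped: ScrapeLike,
  minChars: number = MIN_MARKDOWN_CHARS,
): Verdict {
  const { statusCode } = scraped.metadata;
  // A block page still comes back from a successful call, so check the page's own status.
  if (statusCode < 200 || statusCode >= 300) {
    return { ok: false, reason: `page returned HTTP ${statusCode}` };
  }
  // The field is undefined when markdown was not requested.
  const markdown = (scraped.content.markdown ?? "").trim();
  if (markdown.length < minChars) {
    return { ok: false, reason: `markdown too short (${markdown.length} chars)` };
  }
  return { ok: true, markdown };
}

// Short summary in the spirit of the example's output:
// status, title, link count, length, and a preview of the Markdown.
function summarize(scraped: ScrapeLike, previewChars: number = 500): string {
  const markdown = scraped.content.markdown ?? "";
  return [
    `HTTP ${scraped.metadata.statusCode} | ${scraped.metadata.title ?? "(no title)"}`,
    `Links found: ${scraped.links.length}`,
    `Markdown length: ${markdown.length} characters`,
    markdown.slice(0, previewChars),
  ].join("\n");
}
```

For the short-Markdown case, retrying with the `delay` option mentioned above is the natural next step. The threshold is a guess to tune per site.
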

`screenshot()` and `pdf()` differ from `scrape()` in one way worth knowing up front: they return a hosted URL, not bytes. `shot.url` and `pdf.url` point at the rendered artifact on Steel's storage, so the example logs the links rather than writing files. If you want the bytes on disk, fetch the URL yourself. The Python sibling does exactly that.

## Run it

```bash
cd examples/scrape-ts
cp .env.example .env   # set STEEL_API_KEY
npm install
npm start
```

Get a key at [app.steel.dev/settings/api-keys](https://app.steel.dev/settings/api-keys). `TARGET_URL` in `.env` is optional and defaults to Hacker News.

Your output varies. Structure looks like this:

```text
Steel Scrape API (TypeScript)
============================================================
Scraping https://news.ycombinator.com to markdown...
HTTP 200 | Hacker News
Links found: 174
Markdown length: 6841 characters
--- Markdown preview (first 500 chars) ---
# Hacker News
* [new](newest)
* [past](front)
* [comments](newcomments)
* [ask](ask)
* [show](show)
...
--- end preview ---
Capturing a full-page screenshot...
Screenshot hosted at: https://steel-screenshots.s3.amazonaws.com/...
Rendering the page to PDF...
PDF hosted at: https://steel-screenshots.s3.amazonaws.com/...
Done. Feed the markdown straight into an LLM prompt.
```

Each of the three calls is one billed request against Steel, so a full run costs a few cents of browser time. There is no session left open to leak: `scrape()`, `screenshot()`, and `pdf()` each return when the work is finished, so unlike the browser-driving recipes there is no `release()` to forget.

## Make it yours

- **Pipe Markdown into a model.** Pass `markdown` as the user message to your LLM of choice and ask it to summarize the page or pull out structured fields. This is the whole reason to scrape to Markdown instead of HTML.
- **Ask for several formats at once.** Set `format: ["markdown", "html"]` when you want the clean text for the model and the raw HTML for a fallback parser, both from a single request.
- **Bundle artifacts into the scrape.** Instead of separate `screenshot()` and `pdf()` calls, pass `screenshot: true` and `pdf: true` to `scrape()`. The URLs come back on `scraped.screenshot` and `scraped.pdf`, which is one billed request instead of three.
- **Get past anti-bot pages.** Add `useProxy: true` to route through Steel's residential proxies, or `delay: 3000` to wait for client-side rendering before the capture.
- **Pick a region.** `region` accepts values like `"iad"` or `"fra"` to run the fetch closer to the target or to your users.

## Related

[Python version](/cookbook/scrape) calls the same endpoints and writes the screenshot and PDF to disk as files. [Rust version](/cookbook/scrape) is the lowest-friction way into the Rust SDK. For a recipe that drives a real browser instead of the direct API, see [playwright-ts](/cookbook/playwright). Full method and parameter reference lives in the [steel-sdk package](https://www.npmjs.com/package/steel-sdk).

Steel's `/v1/scrape` endpoint runs a browser server-side and hands back the rendered page. There is no session to create, no CDP socket to attach to, and no browser library on your machine. You call one method, and you get the page content, plus an optional screenshot and PDF. This recipe turns that single call into three files on disk: `page.md`, `screenshot.png`, and `page.pdf`.

```python
result = client.scrape(
    url=TARGET_URL,
    format=["markdown"],
    screenshot=True,
    pdf=True,
)
```

The one detail worth internalizing: the response mixes inline data and hosted artifacts. `result.content.markdown` is a string you can write straight to a file. But `result.screenshot.url` and `result.pdf.url` are **hosted URLs**, not bytes. Steel renders the image and PDF, stores them, and returns links.
So the recipe writes the markdown directly, then fetches the two URLs with `urllib` and saves the bytes. The `download` helper does the fetch; `main` wires the three writes.

Because there is no session object, there is no teardown. `client.sessions.release(...)` does not apply here. You pay for the render, the response comes back, and you are done. That makes scrape the lowest-friction way to pull a page into an agent's context: one call, structured output, no lifecycle to manage.

## Run it

```bash
cd examples/scrape-py
cp .env.example .env   # set STEEL_API_KEY
uv run main.py
```

Grab a key at [app.steel.dev/settings/api-keys](https://app.steel.dev/settings/api-keys). `uv sync` runs automatically on first `uv run`, so there is no separate install step.

Your output varies. Structure looks like this:

```text
Steel Scrape API (Python)
============================================================
Scraping https://news.ycombinator.com ...
Fetched "Hacker News" (HTTP 200)
Markdown: 8421 chars, 147 links
Saved page.md (8421 chars)
Saved screenshot.png (184320 bytes)
Saved page.pdf (96774 bytes)
Artifacts written to /path/to/examples/scrape-py/output
Done!
```

The three files land in `output/` next to `main.py`. Open `page.md` to see the markdown an LLM would read, `screenshot.png` for the rendered viewport, and `page.pdf` for a print-layout capture.

A scrape costs a few cents of browser time. You are billed per render, not per minute, so a one-shot scrape is cheaper than spinning up a full session for the same page. If you only need text, drop `screenshot=True` and `pdf=True` and you skip the render-and-host work for the artifacts you are not using.

## Make it yours

- **Change the target.** Set `TARGET_URL` in `.env`, or edit the default in `main.py`. Everything downstream is the same.
- **Pick your formats.** `format` accepts any of `markdown`, `html`, `cleaned_html`, and `readability`.
  Pass a list to get several at once, then read them off `result.content` (`result.content.html`, `result.content.cleaned_html`, and so on). `cleaned_html` strips scripts and boilerplate; `readability` returns article-extracted structure.
- **Mine the metadata.** `result.metadata` carries `title`, `description`, `status_code`, Open Graph fields (`og_title`, `og_image`), `canonical`, `author`, and `json_ld`. `result.links` is a list of `{text, url}` for every link on the page, which is a ready-made frontier for a crawler.
- **Get the artifacts without the markdown.** `client.screenshot(url=..., full_page=True)` and `client.pdf(url=...)` are standalone calls that each return a single hosted URL. Use them when you want a capture and nothing else. `full_page=True` captures past the fold.
- **Reach difficult sites.** Pass `use_proxy=True` to route the render through Steel's residential proxy network for pages that block datacenter traffic.

## How scrape differs from a browser session

The other recipes in the cookbook connect a browser library (Playwright, Selenium) to a live Steel session over CDP, then drive clicks and reads themselves. That is the right tool when you need to log in, fill forms, or step through an app. Scrape is the right tool when you just want the page as it renders: one request in, content out, nothing to keep alive. If your agent's job is "read this URL," reach for scrape first and graduate to a session only when you need interaction.

## Related

[TypeScript version](/cookbook/scrape) covers the same endpoint with the clean-markdown-for-LLM angle. [Rust version](/cookbook/scrape) walks the three calls separately. For a live, interactive browser instead, see [playwright-py](/cookbook/playwright).

Steel's REST API turns a URL into structured content without a browser on your side.
The `steel-rs` crate wraps three of those endpoints as plain async methods: `client.scrape()` returns parsed content plus typed metadata, while `client.screenshot()` and `client.pdf()` render the page and hand back a hosted file URL. There is no session to create, connect to, or release. Each call is one stateless request that runs a browser on Steel's side and returns when the page is done.

That makes this the shortest path into Steel from Rust, and it leans on the SDK's typed structs rather than raw JSON. `scrape()` deserializes into a `ScrapeResponse`, so the fields are real Rust types you can pattern-match on:

```rust
let scraped = client
    .scrape(ClientScrapeParams {
        url: TARGET_URL.to_string(),
        format: Some(vec![ScrapeRequestFormatItem::Markdown]),
        // remaining options set to None; see main.rs
    })
    .await?;

let meta = &scraped.metadata; // ScrapeResponseMetadata
meta.status_code;             // i64
meta.title.as_deref();        // Option<&str>
meta.language.as_deref();     // Option<&str>
scraped.links.len();          // links is a Vec, one entry per link
scraped.content.markdown;     // an Option, None unless the format was requested
```

`metadata` carries about twenty parsed fields (Open Graph tags, canonical URL, author, published time, the HTTP status code), so you get the document's shape without writing a single selector. `content` holds whichever formats you asked for in `format`: `Markdown`, `HTML`, `CleanedHTML`, or `Readability`. Request only what you need; markdown alone keeps the payload small for LLM context.

`main` runs all three calls against Hacker News, prints the typed metadata, and writes `page.md`, `screenshot.png`, and `page.pdf` to the working directory. Screenshot and PDF responses carry a hosted URL, not bytes, so the `download` helper fetches each URL with `reqwest` and writes the file. The artifacts live on Steel for a while after the call, which is handy if you would rather hand the URL to another service than store the bytes yourself.
## Run it

```bash
cd examples/scrape-rs
cp .env.example .env   # set STEEL_API_KEY
cargo run
```

Get a key at [app.steel.dev/settings/api-keys](https://app.steel.dev/settings/api-keys). The first build pulls `steel-rs`, `tokio`, and `reqwest`, so it takes a moment; later runs are fast.

Your output varies. Structure looks like this:

```text
Scraping https://news.ycombinator.com ...
status 200
title Hacker News
language en
links 183
markdown 14217 chars
wrote page.md
Capturing screenshot ...
wrote screenshot.png
Rendering PDF ...
wrote page.pdf
Done.
```

Three calls cost a few cents of browser time total. Steel bills per session-minute, and these one-shot endpoints spin up and tear down their own browser, so there is nothing to leak: no cleanup call, no session left running against the default 5-minute timeout.

The trade-off is that each call is independent, so you cannot log in once and scrape five pages behind the auth. For that, open a session and drive a real browser (see Related).

## Make it yours

- **Change the target.** Edit the `TARGET_URL` constant. Every call reads from it.
- **Pick formats.** Pass more variants in `format`, for example `vec![ScrapeRequestFormatItem::Markdown, ScrapeRequestFormatItem::HTML]`, then read `scraped.content.html`. Each requested format comes back as its own `Option` field on `content`.
- **Get the screenshot and PDF in one call.** `scrape()` takes `pdf: Some(true)` and `screenshot: Some(true)`; the URLs come back on `scraped.pdf` and `scraped.screenshot` instead of making three round trips.
- **Handle anti-bot pages.** Set `use_proxy: Some(true)` on any of the params to route through a Steel residential proxy. Add `delay: Some(2000)` to wait for late-loading content before capture.
- **Match on the status.** `meta.status_code` is an `i64`, so branch on it before trusting the content (a soft 404 still returns markdown).
## Related

[TypeScript version](/cookbook/scrape) and [Python version](/cookbook/scrape) cover the same three endpoints. For a full browser session you connect to and drive over CDP, see [chromiumoxide](/cookbook/chromiumoxide). For the HTTP surface these methods wrap, see the [reqwest docs](https://docs.rs/reqwest) and [Tokio docs](https://tokio.rs).

Steel's direct API turns a URL into clean content with no browser library and no session to manage. One `client.Scrape` call runs a browser server-side and returns the page as Markdown (or HTML, readability, or cleaned HTML) inline, while `client.Screenshot` and `client.Pdf` render the same page to hosted files.

This recipe scrapes a page to Markdown, prints a preview, then captures a full-page screenshot and a PDF. It is the lowest-friction way to reach a page from Go: no CDP, no chromedp, no `defer release`.

The scrape call leads:

```go
scraped, err := client.Scrape(ctx, steel.ClientScrapeParams{
	URL:    targetURL,
	Format: &[]steel.ScrapeRequestFormatItem{steel.ScrapeRequestFormatItemMarkdown},
})
if err != nil {
	log.Fatalf("scrape: %v", err)
}

// Response fields are pointers; deref supplies a fallback when one is nil.
markdown := deref(scraped.Content.Markdown, "")
title := deref(scraped.Metadata.Title, "(no title)")
```

Two Go specifics show up here. Optional request fields are pointers (`Format` is a `*[]ScrapeRequestFormatItem`, `FullPage` is a `*bool`), and steel-go ships no pointer constructors, so the recipe defines a one-line `ptr[T]` generic. Response fields like `Content.Markdown` and `Metadata.Title` are `*string`, so a small `deref` helper supplies a fallback. The format is a typed constant (`steel.ScrapeRequestFormatItemMarkdown`), not a bare string.

Screenshot and PDF come back as hosted URLs, not bytes:

```go
// Errors are discarded here for brevity; check them in real code.
shot, _ := client.Screenshot(ctx, steel.ClientScreenshotParams{URL: targetURL, FullPage: ptr(true)})
fmt.Println(shot.URL) // https://...

pdf, _ := client.Pdf(ctx, steel.ClientPdfParams{URL: targetURL})
fmt.Println(pdf.URL)
```

To keep the files, fetch each URL with `net/http` and write the bytes to disk.
## Run it

```bash
cd examples/scrape-go
cp .env.example .env   # set STEEL_API_KEY
go run .
```

Get a Steel key at [app.steel.dev/settings/api-keys](https://app.steel.dev/settings/api-keys). Point it at any page with `TARGET_URL` in `.env`.

Your output varies. Structure looks like this:

```text
Steel Scrape API (Go)
============================================================
Scraping https://news.ycombinator.com to markdown...
HTTP 200 | Hacker News
Links found: 184
Markdown length: 8423 characters
--- Markdown preview (first 500 chars) ---
[ clean Markdown for the page ]
--- end preview ---
Capturing a full-page screenshot...
Screenshot hosted at: https://...
Rendering the page to PDF...
PDF hosted at: https://...
Done. Feed the markdown straight into an LLM prompt.
```

A scrape call costs a few cents of browser time. Steel starts and tears down the browser per call, so there is no session to release.

## Make it yours

- **Change the page.** Set `TARGET_URL` in `.env`, or pass a different URL to `client.Scrape`.
- **Ask for several formats.** `Format` takes a slice, so request more than one at once (`ScrapeRequestFormatItemMarkdown`, `...HTML`, `...Readability`, `...CleanedHTML`). Each lands under its own field on `Content`.
- **Save the artifacts.** Fetch `shot.URL` and `pdf.URL` with `net/http` and `os.WriteFile` to write `screenshot.png` and `page.pdf`, the way the Python recipe does.
- **Scrape behind a proxy.** Set `UseProxy: ptr(true)` to route through a Steel residential proxy for geofenced or bot-sensitive pages.

## Related

[scrape-ts](/cookbook/scrape) and [scrape-py](/cookbook/scrape) are the same direct API in TypeScript and Python, where the Python recipe writes the screenshot and PDF to disk. [scrape-rs](/cookbook/scrape) is the Rust version. For a full browser you drive yourself, [chromedp](/cookbook/chromedp) and [Rod](/cookbook/rod) connect over CDP instead.