Connect with Puppeteer
Drive a Steel session with Puppeteer via WebSocket connection
This guide shows you how to drive Steel's cloud browser sessions using Puppeteer.
Steel sessions are designed to be easily driven by Puppeteer. There are two main methods for connecting to and driving a Steel session with Puppeteer.
Quick Start: Want to jump right in? Skip to the example project.
Method #1: One-line change (easiest)
Most Puppeteer scripts start by calling puppeteer.launch() to start a local browser with the desired arguments, something like this:
    const browser = await puppeteer.launch({...});
Simply change this line to the following (replacing MY_STEEL_API_KEY with your API key):
    const browser = await puppeteer.connect({
      browserWSEndpoint: 'wss://connect.steel.dev?apiKey=MY_STEEL_API_KEY',
    });
And voilà! This automatically starts a Steel session with all default parameters and connects to it. Your subsequent Puppeteer calls will work as they did before.
The session is released automatically when your script calls browser.close() or browser.disconnect(), or when the connection otherwise ends.
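To keep the API key out of your source code, you can build the endpoint string in a small helper. The sketch below is illustrative only: buildSteelEndpoint is a hypothetical name, not part of Puppeteer or the Steel SDK, and it assumes the key is passed in from somewhere like process.env.STEEL_API_KEY.

```javascript
// Hypothetical helper (not part of Puppeteer or the Steel SDK).
// Builds the WebSocket endpoint shown above and URL-encodes the API key.
function buildSteelEndpoint(apiKey) {
  if (!apiKey) {
    throw new Error('Missing Steel API key');
  }
  const params = new URLSearchParams({ apiKey });
  return `wss://connect.steel.dev?${params.toString()}`;
}

// Usage, assuming STEEL_API_KEY is set in the environment:
// const browser = await puppeteer.connect({
//   browserWSEndpoint: buildSteelEndpoint(process.env.STEEL_API_KEY),
// });
```

Failing early on a missing key gives a clearer error than a rejected WebSocket connection.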
Advanced: Custom Session IDs
Method #1 accepts only one additional query parameter: sessionId, which lets you set a custom session ID (UUIDv4 format) for the session. To pass any other options, use Method #2.
This is helpful because connecting this way returns no session data. By setting your own session ID, you can use the API or SDKs to retrieve data about the session or act on it, for example by releasing it manually.
Example:
    import { v4 as uuidv4 } from 'uuid';
    import puppeteer from 'puppeteer';
    import Steel from 'steel-sdk';

    const sessionId = uuidv4(); // e.g. '9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d'

    const browser = await puppeteer.connect({
      browserWSEndpoint: `wss://connect.steel.dev?apiKey=${process.env.STEEL_API_KEY}&sessionId=${sessionId}`,
    });

    // Get session details
    const client = new Steel();
    const session = await client.sessions.retrieve(sessionId);
    console.log(`View session live at: ${session.sessionViewerUrl}`);
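If the session ID comes from somewhere other than a UUID library (a config file or a job queue, say), it may be worth checking that it is in UUIDv4 format before connecting. The helper below is a minimal sketch: isUuidV4 is a hypothetical name, and how Steel handles an ID in another format is not covered here.

```javascript
// Hypothetical helper: checks that a string has the UUIDv4 layout
// (version nibble "4", variant nibble 8, 9, a, or b).
const UUID_V4_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function isUuidV4(value) {
  return typeof value === 'string' && UUID_V4_PATTERN.test(value);
}

// Usage: validate before building the connection URL.
// if (!isUuidV4(sessionId)) {
//   throw new Error(`Not a UUIDv4 session ID: ${sessionId}`);
// }
```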
Method #2: Create and connect
Use this method when you need to drive a session with non-default features like proxy support or CAPTCHA solving. The main difference is that you'll:
- Start a session via API
- Connect to it via puppeteer.connect()
- Release the session when finished
    import Steel from 'steel-sdk';
    import puppeteer from 'puppeteer';
    import dotenv from 'dotenv';

    dotenv.config();

    const client = new Steel({
      steelAPIKey: process.env.STEEL_API_KEY, // Optional
    });

    async function main() {
      // Create a session with additional features
      const session = await client.sessions.create({
        useProxy: true,
        solveCaptcha: true,
      });

      // Connect with Puppeteer
      const browser = await puppeteer.connect({
        browserWSEndpoint: `wss://connect.steel.dev?apiKey=${process.env.STEEL_API_KEY}&sessionId=${session.id}`,
      });

      // Run your automation
      const page = await browser.newPage();
      await page.goto('https://example.com');

      // Always clean up when done
      await browser.close();
      await client.sessions.release(session.id);
    }

    main();
Important: With Method #2, sessions remain active until explicitly released or timed out. It's best practice to call client.sessions.release() when finished rather than waiting for the session to time out.
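One way to make sure the release call runs even when your automation throws is to wrap the session lifecycle in try/finally. The following is a sketch under stated assumptions, not an SDK feature: withSteelSession is a hypothetical helper, client is a Steel SDK client as created above, and connect is a function you supply that turns a session into a Puppeteer browser.

```javascript
// Hypothetical helper: creates a session, runs `task`, and always
// releases the session afterwards, even if `connect` or `task` throws.
async function withSteelSession(client, connect, options, task) {
  const session = await client.sessions.create(options);
  let browser;
  try {
    browser = await connect(session);
    return await task(browser, session);
  } finally {
    try {
      // Close the browser only if the connection succeeded.
      if (browser) await browser.close();
    } finally {
      // Release the session no matter how the steps above ended.
      await client.sessions.release(session.id);
    }
  }
}

// Usage, assuming `client` and `puppeteer` are set up as in Method #2:
// await withSteelSession(
//   client,
//   (session) => puppeteer.connect({
//     browserWSEndpoint: `wss://connect.steel.dev?apiKey=${process.env.STEEL_API_KEY}&sessionId=${session.id}`,
//   }),
//   { useProxy: true, solveCaptcha: true },
//   async (browser) => {
//     const page = await browser.newPage();
//     await page.goto('https://example.com');
//   },
// );
```

Passing connect in as a function keeps the helper independent of Puppeteer, which also makes it easy to exercise with fakes in a unit test.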
Example Project: Scraping Hacker News
Here's a working example that scrapes Hacker News with proper error handling and session management:
Starter code that scrapes Hacker News for top 5 stories using Steel's Node SDK and Puppeteer.
Run it by entering the following commands in the terminal:
- export STEEL_API_KEY=your_api_key
- npm start
The example includes:
- Complete session configuration options
- Error handling best practices
- A working Hacker News scraper example
- TypeScript support
You can also clone it on GitHub, Val.town, StackBlitz, or Replit to start editing it yourself!