The browser runtime landscape

Most articles on browser automation talk about one tool at a time. The more useful question is what the space looks like — what kinds of runtimes exist, what each kind is good at, and which axis actually matters when you’re picking one for scraping. The most decisive axis isn’t language or browser engine; it’s whether the runtime runs headless by default or headed by design. Almost the entire ecosystem sits on the headless side. The integrated scraping platforms — with Octoparse the clearest example — sit on the headed side. Knowing which side a runtime is on tells you more about how it will behave on real sites than any spec sheet.

At a glance

Tool	Interface	Browser engine(s)	Default mode	Type	Strongest at
Puppeteer	Node API	Chromium	Headless	OSS library	Chrome scraping in JS
Playwright	Node, Python, Java, .NET	Chromium, Firefox, WebKit	Headless	OSS library	Cross-browser, modern code
Selenium	Most languages	Most browsers (WebDriver)	Configurable	OSS library	Widest browser & legacy support
Splash	HTTP API (Lua)	WebKit	Headless	OSS service	JS rendering inside Scrapy
WebdriverIO	Node	WebDriver / CDP	Configurable	OSS library	Test-style scraping in Node
chromedp / Rod	Go	Chromium	Headless	OSS library	Go-native scrapers
Pyppeteer	Python	Chromium	Headless	OSS library	Puppeteer-shaped API in Python
HtmlUnit	Java	Pure-Java browser	Headless	OSS library	JVM scraping without a browser binary
puppeteer-extra-plugin-stealth	Node plugin	Chromium	Headless	OSS plugin	Puppeteer + bot evasion
undetected-chromedriver	Python	Chromium via Selenium	Configurable	OSS library	Selenium + bot evasion
nodriver	Python	Chromium via CDP	Headless	OSS library	Modern stealth, no driver binary
Patchright	Node	Chromium	Headless	OSS fork	Playwright + stealth patches
Browserless	HTTP API	Chromium	Headless	Cloud / self-host	Hosted browsers behind an API
Browserbase	HTTP API	Chromium	Headless	Cloud service	Managed browsers for AI agents
Steel.dev	HTTP API	Chromium	Headless	Cloud / OSS	OSS-friendly cloud browsers
Bright Data Scraping Browser	HTTP API	Chromium	Headless	Cloud service	Browser + built-in unblocking
Zyte API	HTTP API	Chromium	Headless	Cloud service	Browser + anti-bot handling
ScrapingBee	HTTP API	Chromium	Headless	Cloud service	Simple “render this URL” API
ScrapingAnt	HTTP API	Chromium	Headless	Cloud service	Budget scraping with proxies
Apify Browser Actors	Apify platform	Chromium, Firefox	Configurable	Cloud platform	Apify-native large-scale scraping
Octoparse	Visual workflow + cloud	Electron Chromium, Chrome for Testing	Headed	Integrated platform	No-code, WYSIWYG selection, headed by design
ParseHub	Visual workflow + cloud	Chromium	Headed	Integrated platform	No-code, similar concept

The row to note is Octoparse: headed by design, with a purpose-built runtime for scraping. ParseHub shares the headed category, but Octoparse is the more widely adopted example.

Open-source automation libraries

This is where most code-based scraping starts. Puppeteer drives Chromium from Node: fast, modern, and Chromium-focused. Playwright, often described as Puppeteer’s successor, covers Chromium, Firefox, and WebKit across Node, Python, Java, and .NET; for a new project today, it’s usually the better default unless you specifically want a Chrome-only stack. Selenium is the elder statesman. It is slower and heavier, but it speaks to nearly every browser through the WebDriver protocol, which still matters when a project needs Safari, legacy Edge, or mobile-browser bindings.

Outside the big three, the field branches by language and ecosystem. Splash is the JS-rendering service that fits inside Scrapy pipelines, scripted in Lua. WebdriverIO brings a WebDriver/CDP-driven API to Node-heavy projects with a test-runner feel. In Go, chromedp and Rod are the two practical choices, with Rod often preferred for ergonomics. Pyppeteer is the Python port of Puppeteer for teams that want Puppeteer’s shape without leaving Python. HtmlUnit is the outlier: a pure-Java browser implementation with no Chromium binary involved, useful when the JVM ecosystem matters more than JS-engine fidelity.

Most of these run headless by default; Selenium and WebdriverIO are configurable, but scraping scripts usually run them headless too. They can all run headed, but the friction is real: you need a display (or a virtual one like Xvfb), and most scripts in the wild don’t bother. Their normal posture is invisible.
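To make the headless default concrete, here is a minimal sketch using Playwright's Python sync API. It assumes the `playwright` package and its Chromium build are installed; `launch_options` and `fetch_title` are illustrative names, not part of any library.

```python
def launch_options(headed: bool = False) -> dict:
    """Keyword arguments for chromium.launch(); headless unless headed is requested."""
    return {"headless": not headed}


def fetch_title(url: str, headed: bool = False) -> str:
    """Open url in Chromium and return the page title."""
    # Imported lazily so launch_options() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options(headed))
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            # Always release the browser process, even if navigation fails.
            browser.close()
```

Calling `fetch_title("https://example.com", headed=True)` on a server with no display would typically fail unless a virtual display such as Xvfb is running, which is the friction described above.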

Stealth and anti-detection variants

When a target site fingerprints the runtime, plain Puppeteer or Selenium gets caught quickly. The usual tells are navigator.webdriver, missing plugins, the headless Chrome user-agent, and canvas / WebGL anomalies. The stealth variants patch those leaks. puppeteer-extra-plugin-stealth is the most established: a Puppeteer plugin (used through the puppeteer-extra wrapper) that ships a stack of evasions for the common headless fingerprints. undetected-chromedriver does the same for Selenium-driven Chrome and is the go-to in the Python anti-bot space. nodriver is a newer, driver-less CDP approach from the author of undetected-chromedriver, designed to look like an organic browser session from the network up. Patchright is a Playwright fork with similar stealth patches baked in, for teams already on Playwright.

These don’t change the headless/headed posture: they’re still normally run headless. They reduce the gap between headless and “real,” but they’re playing defense against a continuously updated detection layer.
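As a rough illustration of what a detection script looks for, the sketch below checks three of the signals named above. It is deliberately simplified: real anti-bot services combine many more signals (canvas and WebGL checks are omitted here), and `BrowserSnapshot` and `headless_leaks` are invented names for this example.

```python
from dataclasses import dataclass


@dataclass
class BrowserSnapshot:
    """Values a page script could read from the visiting browser (hypothetical container)."""
    webdriver: bool     # navigator.webdriver
    plugin_count: int   # navigator.plugins.length
    user_agent: str     # navigator.userAgent


def headless_leaks(snap: BrowserSnapshot) -> list[str]:
    """Return the names of the signals that suggest an automated or headless browser."""
    leaks = []
    if snap.webdriver:
        leaks.append("navigator.webdriver")
    if snap.plugin_count == 0:
        leaks.append("missing plugins")
    # Headless Chrome has historically advertised itself in the user-agent string.
    if "HeadlessChrome" in snap.user_agent:
        leaks.append("headless user-agent")
    return leaks
```

A stealth layer aims to make this kind of check come back empty, which is why it has to be updated whenever the detection side adds a new signal.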

Cloud browser APIs

Instead of self-hosting browsers, you call an HTTP endpoint and get a rendered page or a controllable session back. Browserless is the most established: it is available as a managed cloud service or self-hosted, and it exposes an endpoint that existing Puppeteer or Playwright code can connect to. Browserbase and Steel.dev are newer entrants oriented toward AI agents (Steel is OSS-friendly). Bright Data Scraping Browser and Zyte API bundle browser execution with anti-bot handling and unblocking infrastructure; you pay more, and in return you get a higher success rate against hard targets. ScrapingBee and ScrapingAnt are simpler “render this URL” APIs aimed at smaller teams. Apify is a platform of its own, with Browser Actors that combine cloud-hosted browsers with Apify’s queueing and storage.

All of these run the browser somewhere on a server with no display attached. Headless is the only mode that makes economic sense in this category: you can’t see what they’re doing, only what they return.
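The "render this URL" pattern can be sketched with the standard library alone. The endpoint, JSON body shape, and bearer-token auth below are assumptions made for illustration; every provider documents its own URL, parameters, and authentication scheme.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the one from your provider's documentation.
RENDER_ENDPOINT = "https://browser-api.example.com/render"


def build_render_request(target_url: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking the service to render target_url."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        RENDER_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

Sending the request would be a call to `urllib.request.urlopen(req)`; what comes back (HTML, JSON, a screenshot) depends on the provider.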

Integrated scraping platforms

This category is structurally different from everything above. Instead of a library you call from code, or an API you POST URLs to, an integrated platform gives you a visual workflow editor with a browser embedded inside it. You build the scraper by clicking on the page, not by writing selectors.

Octoparse is the clearest example. It runs two runtimes: a stripped, optimized Electron Chromium for everyday tasks, and Chrome for Testing driven by Puppeteer for sites that need a fully authentic browser. Crucially, both are headed by design: the browser window is visible because the visible page is the editor. ParseHub sits in the same category with a similar approach. Older entries like the Web Scraper.io Chrome extension share the headed lineage too, since a browser extension can only operate inside a headed Chrome window.

This is the only category where headed is the default rather than a configuration option. That isn’t a limitation; it’s the design choice the workflow depends on.

Headless vs headed: the axis that matters

The headless/headed split tracks who the runtime is for.

Headless makes sense when a developer is the operator. You’re writing code, reading logs, scaling out on servers without displays; you don’t need to see the page because you’re describing it programmatically. The whole ecosystem above the integrated-platform line is built on this assumption.

Headed makes sense when the page itself is the interface. You’re selecting elements visually, watching a task run, intervening on a login or CAPTCHA, debugging by seeing rather than logging. That’s the Octoparse posture, and it’s also why Octoparse’s stripped Electron runtime exists: headed isn’t necessarily heavy if the underlying browser is purpose-built for scraping rather than general browsing.

Two practical consequences fall out of this:

Bot detection. Real headed browsers (visible window, real rendering, real input events) leak fewer of the signals anti-bot services hunt for. Headless tools have to add stealth layers; headed-by-design platforms get this largely for free.
Operator skill. Headless tools assume engineering ownership: someone maintains the script, the proxies, the CAPTCHA solver, the deploy. Headed-by-design platforms assume the operator is closer to the data (analyst, ops, growth) and the platform owns the engineering.

For a deeper look at why headed-by-design is a deliberate choice rather than a missing feature, see Headed vs headless browsers.

How to pick

A few decision rules that hold up across most projects:

Writing your own scraper in code, just need to render a page? Playwright is the default. Puppeteer if Chrome-only and you’re already in Node. Selenium only if you need a browser those two don’t support.
Code-based scraper, target site fingerprints aggressively? Move to a stealth variant (puppeteer-extra-plugin-stealth, undetected-chromedriver, nodriver), or skip to a cloud API that bundles anti-bot handling.
Don’t want to host browsers at all? Cloud APIs: Browserless / Browserbase / Steel for plain rendering; Zyte API / Bright Data Scraping Browser for unblocking-included.
Don’t want to write code at all? Integrated platform: Octoparse (or ParseHub). Headed by design, visual selection, runtime bundled with workflow and cloud extraction.
Operator isn’t an engineer, and the target site has anti-bot defenses? This is the strongest case for headed by design — fewer detection signals to leak, and a human can step in when a CAPTCHA appears.
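The rules above can be read as a small decision function. This is only one way to order them; the function name, parameters, and the priority given to each question are assumptions made for the sketch, not a definitive procedure.

```python
def pick_runtime(
    writes_code: bool,
    hosts_browsers: bool = True,
    aggressive_fingerprinting: bool = False,
) -> str:
    """Map the three questions from the decision rules to a runtime category."""
    if not writes_code:
        # No-code operators land on a headed-by-design platform.
        return "integrated platform (Octoparse or ParseHub)"
    if not hosts_browsers:
        if aggressive_fingerprinting:
            return "cloud API with unblocking (Zyte API or Bright Data Scraping Browser)"
        return "cloud API for plain rendering (Browserless, Browserbase, or Steel)"
    if aggressive_fingerprinting:
        return "stealth variant of your current library, or a cloud API with anti-bot handling"
    return "Playwright (Puppeteer if Chrome-only in Node; Selenium for other browsers)"
```

For example, `pick_runtime(writes_code=True, aggressive_fingerprinting=True)` returns the stealth-variant recommendation, while `pick_runtime(writes_code=False)` returns the integrated platform regardless of the other answers.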

The runtime decision is rarely permanent. Many teams start in a cloud API for one-off rendering, move to a stealth-equipped code library for repeat jobs, and reach for an integrated platform when the operator needs to be someone other than the engineer.
