When you’re picking a tool for web scraping, the decision that matters most isn’t language or browser engine; it’s whether the browser runs headless (no visible window, programmatic only) or headed (a real, visible browser window). Almost the entire ecosystem (Puppeteer, Playwright, Selenium, every cloud browser API) runs headless, either by default or in typical production use. A smaller category, the integrated scraping platforms, runs headed by design; Octoparse is the clearest current example. The two choices serve different operators, succeed against different sites, and break in different ways.
Why headless became the default
Headless browsers were built for developers. They run on servers without displays, fit cleanly into containers and CI/CD pipelines, skip the rendering overhead a human user needs (window chrome, GPU compositing, autofill, telemetry), and let one machine run many concurrent sessions. When the operator is writing code, reading logs, and describing the page programmatically, there’s no reason to see it. Everything in the browser runtime landscape above the integrated-platform tier (Puppeteer, Playwright, Selenium, Splash, every cloud browser API) assumes this posture. Headless is the silent default of code-based scraping.

What headless costs you
The cost shows up in four places, and it shows up consistently:
- Bot detection signals. `navigator.webdriver` flips to `true`, the user-agent says `HeadlessChrome`, plugins are missing, canvas and WebGL produce anomalous fingerprints. Anti-bot services like Cloudflare and DataDome are tuned to spot these. The whole “stealth variant” sub-ecosystem (puppeteer-extra-plugin-stealth, undetected-chromedriver, nodriver, Patchright) exists because plain headless leaks them.
- Viewport-driven behavior. Lazy-loaded images, intersection-observer content, visibility-gated scripts: these are designed around a page actually being rendered and “seen.” Headless can fake the viewport, but the boundary is fragile and the behaviors are easy to miss.
- No human in the loop. A CAPTCHA, an unexpected modal, a session-expired login — headless is blind. The script doesn’t know it’s stuck, only that it stopped returning data.
- Debugging by log, not by sight. When a selector breaks on page 42 of an overnight run, you reproduce it locally — often headed — to actually see what changed. The debugging tool is the headed posture; the production tool isn’t.
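
The first item in the list above, bot detection signals, can be made concrete with a small sketch. The `BrowserSignals` class and `headless_leaks` function are hypothetical names invented for this illustration; they only mirror the three simplest giveaways named in the list (the `navigator.webdriver` flag, the `HeadlessChrome` user-agent token, and missing plugins). Canvas and WebGL fingerprinting are left out, and real anti-bot services certainly combine many more signals than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BrowserSignals:
    """Hypothetical snapshot of values a detection script could read from a page."""
    webdriver: bool    # value of navigator.webdriver
    user_agent: str    # value of navigator.userAgent
    plugin_count: int  # value of navigator.plugins.length


def headless_leaks(signals: BrowserSignals) -> List[str]:
    """Return a description of each headless giveaway present in the snapshot."""
    leaks = []
    if signals.webdriver:
        leaks.append("navigator.webdriver is true")
    if "HeadlessChrome" in signals.user_agent:
        leaks.append("user-agent contains HeadlessChrome")
    if signals.plugin_count == 0:
        leaks.append("no plugins reported")
    return leaks


# Illustrative snapshots: one plain headless session, one headed session.
plain_headless = BrowserSignals(
    webdriver=True,
    user_agent=(
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36"
    ),
    plugin_count=0,
)
headed = BrowserSignals(
    webdriver=False,
    user_agent=(
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    plugin_count=5,
)

print(headless_leaks(plain_headless))
print(headless_leaks(headed))
```

The plain headless snapshot trips all three checks and the headed one trips none, which is the gap the stealth variants try to close by patching each value individually.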
The headed-by-design category
There’s a smaller category of scraping tools where the page isn’t an internal implementation detail: it is the interface. The operator selects elements by clicking on a rendered page; watches a task execute step by step; sees a CAPTCHA when it appears and clears it; debugs by looking. Octoparse is the clearest current example. ParseHub follows the same pattern, and the older Web Scraper.io Chrome extension shares the lineage.

The economics of headed-by-design only work if the runtime is purpose-built. A stock Chrome with all the human-user machinery (extensions, syncing, autofill, full GPU compositing, telemetry) is too heavy to run at scraping scale. So integrated platforms ship runtimes built specifically for the headed-scraping case: heavy enough to be authentic, light enough to run densely.

Inside Octoparse’s two headed runtimes
Octoparse ships two purpose-matched headed runtimes, and switches between them based on what the target site demands.

Electron Chromium, stripped and optimized
The first is a customized Chromium runtime built into Electron. Rather than running a stock browser, Octoparse has stripped and optimized this runtime specifically for scraping, removing unnecessary overhead like extensions, background processes, and rendering features that a human user needs but a scraper doesn’t. The result is a lightweight engine that loads pages faster, consumes significantly less memory and CPU, and can handle many concurrent sessions without bogging down a machine. Compared to running a full browser instance through Puppeteer or Selenium, this purpose-built approach offers a noticeable performance advantage, particularly when running tasks locally or on hardware with limited resources. The tight integration with Octoparse’s visual editor also means users configure and execute tasks in the same environment, with no context switching between tools.

Chrome for Testing driven by Puppeteer
The second is Chrome for Testing driven by Puppeteer. This is a full, unmodified Chrome browser controlled programmatically, and it behaves the way a real user’s Chrome does. It’s the better option for sites with aggressive bot detection, fingerprinting, or compatibility checks that expect a standard Chrome environment. It’s heavier on resources than the Electron runtime, but the browser authenticity it provides is sometimes essential.

When to use which
The key advantage of having both built in is flexibility without complexity. With standalone tools like Puppeteer or Playwright, users need to manage browser binaries, handle versioning, configure launch options, and deal with infrastructure concerns themselves. Octoparse abstracts all of that away. The optimized Electron runtime handles the vast majority of tasks efficiently, while Chrome for Testing serves as a ready fallback when full browser fidelity is needed, and switching between them is a configuration choice, not an engineering project. The team has also indicated that additional runtime options are on the roadmap, reflecting the reality that no single browser engine is ideal for every scraping scenario. Whether a task runs on your own machine or in the cloud, the runtime choice stays the same simple toggle.

When headed wins, when headless wins
| Pick this | When |
|---|---|
| Headless code library (Puppeteer / Playwright / Selenium) | A developer owns and operates the scraper, you’re scaling on servers, and the target site isn’t heavily anti-bot |
| Headless cloud API (Browserless / Zyte / Browserbase) | Same as above, but you don’t want to host the browser yourself |
| Headless + stealth variant (puppeteer-extra-plugin-stealth, undetected-chromedriver, nodriver) | A developer-run scraper as in the first row, but the target site fingerprints aggressively |
| Headed-by-design platform (Octoparse / ParseHub) | The operator isn’t an engineer; the target site has serious anti-bot defenses; you need to debug visually or intervene on CAPTCHAs and logins |
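
The table can also be read as a small decision procedure. The sketch below is one possible encoding, not an official rule set: the `Scenario` fields and the `pick_posture` function are hypothetical names, and the precedence is an assumption (any one of the headed-row conditions is treated as sufficient, then fingerprinting is checked, then hosting preference).

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical description of a scraping project, one field per table condition."""
    operator_is_engineer: bool
    serious_anti_bot: bool           # target site has serious anti-bot defenses
    needs_visual_intervention: bool  # visual debugging, CAPTCHAs, logins
    aggressive_fingerprinting: bool
    wants_to_self_host: bool         # willing to host the browser yourself


def pick_posture(s: Scenario) -> str:
    """Map a scenario to one row of the table, under the assumed precedence."""
    # Assumption: any single headed-row condition is enough to choose headed.
    if (not s.operator_is_engineer
            or s.serious_anti_bot
            or s.needs_visual_intervention):
        return "Headed-by-design platform"
    if s.aggressive_fingerprinting:
        return "Headless + stealth variant"
    if not s.wants_to_self_host:
        return "Headless cloud API"
    return "Headless code library"


# Example: a developer-run scraper on self-hosted servers, lightly defended target.
example = Scenario(
    operator_is_engineer=True,
    serious_anti_bot=False,
    needs_visual_intervention=False,
    aggressive_fingerprinting=False,
    wants_to_self_host=True,
)
print(pick_posture(example))
```

A real project may weigh these conditions differently; the point is only that the rows are ordered by how much of the operator's and the site's behavior pushes toward a visible browser.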