Every browser session leaks an identifying signature: a combination of user-agent, canvas drawing output, WebGL renderer string, font list, screen resolution, installed plugins, timezone, language headers, audio context output, and more. Modern anti-bot systems combine these signals into a browser fingerprint that can identify a session more reliably than an IP address, because an unmanaged fingerprint stays the same even when the IP rotates. A scraper that doesn’t manage its fingerprint hands the detection layer a clean ID badge on every request. The fingerprint layer sits underneath behavior: it’s what the session is, not what it does. A scraper has to pass both layers, a clean fingerprint and human-like behavior, to stay invisible to a serious anti-bot system.

What signals get tracked

A modern fingerprint typically combines:
  • User-agent and accept headers. The obvious starting point. HeadlessChrome in the UA is a giveaway.
  • Canvas fingerprint. A site asks the browser to render a hidden canvas element and hashes the output; tiny rendering differences across GPUs, drivers, and OS combinations produce a stable per-machine signature.
  • WebGL fingerprint. Renderer string, vendor string, supported extensions — together a strong fingerprint for the GPU and driver stack.
  • Font list. Which fonts the browser can render, in what order — often distinctive enough to identify a session on its own.
  • Screen and viewport. Resolution, color depth, device pixel ratio. A 1366×768 desktop with a 200% scale factor is a different fingerprint from a 2560×1440 retina display.
  • Timezone and language. From Intl.DateTimeFormat and navigator.languages.
  • Audio context. Audio rendering produces device-specific fingerprintable output, the same way canvas does.
  • Plugins, navigator properties, hardware concurrency. Smaller but combinable signals.
  • navigator.webdriver. The dead giveaway for unmanaged automation.
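These dozens of small signals combine multiplicatively: if each one independently narrows the pool of matching visitors by some factor, the factors multiply. A site doesn’t need any single signal to be unique; it only needs the combination to be unique enough to track.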
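
As a rough illustration of how small signals add up, the sketch below hashes a set of signals into one identifier and estimates how identifying the combination is. The function names and the per-signal shares are hypothetical, picked only to show the arithmetic, and the estimate assumes the signals are independent, which real signals only approximately are.

```python
import hashlib
import json
import math


def fingerprint_id(signals):
    """Hash a dict of collected signals into one stable identifier."""
    # Sort keys so the same signals always serialize the same way.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def combined_bits(share_by_signal):
    """Bits of identifying information, assuming independent signals.

    Each value is the fraction of visitors who share that signal's value.
    Independent fractions multiply, so their surprisal in bits adds.
    """
    return sum(-math.log2(share) for share in share_by_signal.values())


# Hypothetical shares (assumptions, not measurements).
shares = {
    "user_agent": 1 / 50,
    "canvas": 1 / 1000,
    "fonts": 1 / 200,
    "timezone": 1 / 20,
}

bits = combined_bits(shares)
print(f"{bits:.1f} bits, roughly 1 in {round(2 ** bits):,} visitors")
print(fingerprint_id({"timezone": "America/New_York", "fonts": ["Arial"]}))
```

With these made-up shares, no single signal is rarer than 1 in 1,000, yet the combination narrows to about 1 in 200 million (50 × 1000 × 200 × 20).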

Why scrapers leak fingerprints

Three failure modes dominate:
  • Repeating the same default identity. A vanilla Puppeteer instance running 10,000 sessions presents the same canvas hash, same WebGL string, same font list — 10,000 “different users” with one machine’s fingerprint. Trivial to detect.
  • Generic values that no real user produces. An empty navigator.plugins, a canvas output that is bit-for-bit identical to the stock Linux/Chrome rendering, a font list missing common system fonts: these are anomalies that real Chrome users don’t generate.
  • Mismatch between signals. A fingerprint claiming en-US and America/New_York paired with a Russian IP. An iPhone user-agent with a desktop viewport. A Windows fingerprint with macOS-only fonts. Detection layers look for internal contradiction.
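
Seen from the detector’s side, the mismatch examples above amount to a handful of cross-checks. The following is a minimal sketch under stated assumptions: `SessionSignals`, `find_contradictions`, the lookup tables, and the viewport cutoff are all hypothetical and far smaller than anything a production system would use.

```python
from dataclasses import dataclass

# Illustrative lookup data (assumptions); real detectors use far larger datasets.
TIMEZONE_COUNTRY = {
    "America/New_York": "US",
    "Europe/Moscow": "RU",
    "Europe/Warsaw": "PL",
}
MACOS_DEFAULT_FONTS = frozenset({"Helvetica Neue", "Menlo"})
MAX_PHONE_VIEWPORT_WIDTH = 1000  # assumed cutoff, in CSS pixels


@dataclass(frozen=True)
class SessionSignals:
    user_agent: str
    platform: str  # e.g. the value of navigator.platform
    viewport_width: int
    timezone: str
    ip_country: str  # country inferred from the client IP
    fonts: frozenset


def find_contradictions(s):
    """Return a list of human-readable contradictions between signals."""
    problems = []

    # Timezone claims one country, IP geolocates to another.
    expected_country = TIMEZONE_COUNTRY.get(s.timezone)
    if expected_country is not None and expected_country != s.ip_country:
        problems.append("timezone does not match IP country")

    # Phone user-agent paired with a desktop-sized viewport.
    if "iPhone" in s.user_agent and s.viewport_width > MAX_PHONE_VIEWPORT_WIDTH:
        problems.append("iPhone user-agent with desktop-sized viewport")

    # Windows platform that reports fonts shipped with macOS.
    if s.platform.startswith("Win") and s.fonts & MACOS_DEFAULT_FONTS:
        problems.append("Windows platform with macOS fonts")

    return problems
```

A scraper can run the same kind of check on its own profile before a session starts: any non-empty result is a contradiction a detection layer could also find.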

Managing the fingerprint

The remedies map directly to the failure modes:
  • Per-session uniqueness. Each task instance should present a distinct fingerprint — different canvas hash, different WebGL renderer, different font list — so the same fingerprint doesn’t repeat across “different users.”
  • Within-session consistency. Inside one session the fingerprint has to stay stable; switching mid-session is itself a tell.
  • Geographic coherence with the IP. A fingerprint’s timezone, language, and Accept-Language header should match the proxy IP’s geography. Pair an Eastern European IP with an Eastern European fingerprint, not a default en-US one.
  • Realistic, not random. Fingerprints assembled from purely random values are themselves anomalous. The right move is to draw from distributions of real-world fingerprints — common GPU strings, plausible font lists, normal screen sizes — not exotic values no human user produces.
Together these principles describe active fingerprint management, as opposed to relying on whatever defaults the runtime ships with.
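
The four principles can be sketched in a few lines. This is an illustration under stated assumptions, not a description of how any particular runtime works: `build_profile`, the value pools, and their weights are hypothetical, and the GPU names are simplified stand-ins for real WebGL renderer strings.

```python
import hashlib
import random

# Illustrative pools as (value, weight) pairs. The weights are assumptions,
# not measured market shares.
GPU_RENDERERS = [
    ("Intel UHD Graphics 620", 40),
    ("NVIDIA GeForce GTX 1650", 35),
    ("AMD Radeon RX 580", 25),
]
SCREENS = [((1920, 1080), 50), ((1366, 768), 25), ((1536, 864), 15), ((2560, 1440), 10)]
CPU_CORES = [(4, 35), (8, 45), (12, 10), (16, 10)]

# Locale settings keyed by the proxy's country (hypothetical table).
LOCALE_BY_COUNTRY = {
    "US": {"timezone": "America/New_York", "languages": ["en-US", "en"]},
    "PL": {"timezone": "Europe/Warsaw", "languages": ["pl-PL", "pl", "en"]},
    "DE": {"timezone": "Europe/Berlin", "languages": ["de-DE", "de", "en"]},
}


def _pick(rng, weighted):
    """Draw one value from (value, weight) pairs: realistic, not uniform."""
    values = [value for value, _ in weighted]
    weights = [weight for _, weight in weighted]
    return rng.choices(values, weights=weights, k=1)[0]


def build_profile(session_id, proxy_country):
    """Build a fingerprint profile for one session."""
    # Seed the generator from the session ID: different sessions get
    # different draws, while the same session always gets the same profile.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    # Timezone and languages follow the proxy's geography, not a default.
    locale = LOCALE_BY_COUNTRY[proxy_country]
    return {
        "webgl_renderer": _pick(rng, GPU_RENDERERS),
        "screen": _pick(rng, SCREENS),
        "hardware_concurrency": _pick(rng, CPU_CORES),
        "timezone": locale["timezone"],
        "languages": list(locale["languages"]),
    }


print(build_profile("session-0001", "PL"))
```

Seeding from the session ID gives within-session consistency for free, and the weighted draws keep values common rather than exotic. Pools this small will produce collisions across sessions, so per-session uniqueness in practice needs far more signals and far larger pools.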

How Octoparse approaches fingerprinting

Octoparse’s headed-by-design runtime already leaks fewer signals than headless tools. On top of that natural advantage, Octoparse actively manages the browser fingerprint. Sessions are presented with distinct, realistic fingerprint profiles rather than the same default identity repeated across every task, which satisfies the per-session uniqueness requirement that a fixed fingerprint configuration cannot meet. The runtime handles both the “passive” stealth that comes from being headed and the “active” stealth that comes from fingerprint diversity, without users having to wire up an external fingerprint service. Fingerprint management pairs with Octoparse’s behavioral simulation: the runtime looks like a different real user each session, and once on the page it acts like one.

When it matters

Active fingerprint management is overkill for static sites and basic bot detection. It earns its keep against heavier defenses:
  • Light defenses. Headed-by-default already covers it.
  • Medium defenses (rate limiting + basic detection). Default fingerprints are usually fine if the other layers are clean.
  • Heavy defenses (Cloudflare, DataDome, HUMAN, Akamai Bot Manager). Required. Without distinct, realistic per-session fingerprints, the same identity repeating across thousands of “different users” gets your scraper banned even when everything else looks clean.
For the behavioral companion to fingerprint stealth, see Human-like scraping. For the specific defense systems fingerprint management is meant to defeat, see Bypassing CAPTCHA and Cloudflare.