Every browser session leaks an identifying signature: a combination of user-agent, canvas drawing output, WebGL renderer string, font list, screen resolution, installed plugins, timezone, language headers, audio context output, and more. Modern anti-bot systems combine these signals into a browser fingerprint that can identify a session more reliably than an IP address, which is easy to rotate through a proxy. A scraper that doesn’t manage its fingerprint hands the detection layer a clean ID badge on every request. The fingerprint layer sits underneath behavior: it’s what the session is, not what it does. To stay invisible to a serious anti-bot system, a scraper has to pass both layers, presenting a clean fingerprint and behaving like a human.
What signals get tracked
A modern fingerprint typically combines:
- User-agent and accept headers. The obvious starting point: `HeadlessChrome` in the UA string is a giveaway.
- Canvas fingerprint. A site asks the browser to render a hidden canvas element and hashes the output; tiny rendering differences across GPUs, drivers, and OS combinations produce a stable per-machine signature.
- WebGL fingerprint. Renderer string, vendor string, and supported extensions together form a strong fingerprint of the GPU and driver stack.
- Font list. Which fonts the browser can render, and in what order, is often distinctive enough to identify a session on its own.
- Screen and viewport. Resolution, color depth, device pixel ratio. A `1366×768` desktop with a 200% scale factor is a different fingerprint from a `2560×1440` Retina display.
- Timezone and language. Read from `Intl.DateTimeFormat().resolvedOptions().timeZone` and `navigator.languages`.
- Audio context. Audio rendering produces device-specific output that can be fingerprinted the same way canvas output can.
- Plugins, navigator properties, hardware concurrency. Smaller signals that add up when combined.
- `navigator.webdriver`. Set to `true` when the browser is under automation control, it is the dead giveaway for unmanaged automation.
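To make "combine these signals" concrete, here is a minimal Python sketch of how a detection layer might fold collected signals into one identifier. It is an illustration under stated assumptions: the signal names, the sample values, and the choice of SHA-256 over key-sorted JSON are all hypothetical, and real anti-bot vendors use their own undisclosed feature sets and matching logic. The hypothetical `canvas_hash` helper hashes a few stand-in bytes where a real script would read back pixel data from a hidden canvas.

```python
import hashlib
import json


def canvas_hash(pixel_bytes: bytes) -> str:
    """Hash raw canvas pixel data; changing even one byte changes the digest."""
    return hashlib.sha256(pixel_bytes).hexdigest()


def fingerprint_id(signals: dict) -> str:
    """Fold collected browser signals into one stable identifier.

    Keys are sorted before serializing, so the same signals always hash the
    same way regardless of the order in which they were collected.
    """
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical signal values, as a fingerprinting script might report them.
session = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "canvas": canvas_hash(bytes([16, 32, 48, 255]) * 4),  # stand-in for canvas pixels
    "webgl_renderer": "ANGLE (NVIDIA, NVIDIA GeForce RTX 3060 Direct3D11 vs_5_0 ps_5_0, D3D11)",
    "fonts": ["Arial", "Calibri", "Segoe UI", "Times New Roman"],
    "screen": {"width": 1920, "height": 1080, "color_depth": 24, "pixel_ratio": 1.0},
    "timezone": "America/New_York",
    "languages": ["en-US", "en"],
    "hardware_concurrency": 8,
    "webdriver": False,
}

fp = fingerprint_id(session)
same = fingerprint_id(dict(session))  # identical signals -> identical ID
moved = fingerprint_id(dict(session, timezone="Europe/Berlin"))  # one change -> new ID
print(fp == same, fp == moved)  # True False
```

Because every field feeds the hash, thousands of sessions with identical signals collapse to a single ID. Production systems may match more loosely than an exact hash, but the principle is the same: the combination of signals, not any one of them, is what identifies the session.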
Why scrapers leak fingerprints
Three failure modes dominate:
- Repeating the same default identity. A vanilla Puppeteer instance running 10,000 sessions presents the same canvas hash, the same WebGL string, and the same font list: 10,000 “different users” with one machine’s fingerprint. Trivial to detect.
- Generic values that no real user produces. An empty `navigator.plugins`, canvas output that is bit-for-bit the stock Linux/Chrome rendering, a font list missing common system fonts: real Chrome users don’t generate these anomalies.
- Mismatch between signals. A fingerprint claiming `en-US` and `America/New_York` paired with a Russian IP. An iPhone user-agent with a desktop viewport. A Windows fingerprint with macOS-only fonts. Detection layers look for exactly this kind of internal contradiction.
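All three failure modes are easy to check for mechanically. The minimal Python sketch below shows checks a detection layer, or a scraper author auditing their own fleet, might run: counting how many sessions share one fingerprint ID, and flagging generic values and cross-signal contradictions. The lookup tables, field names, and thresholds are hypothetical and deliberately tiny; treat this as an illustration, not any vendor's actual rules.

```python
from collections import Counter

# Hypothetical, deliberately tiny lookup tables for illustration only.
TZ_COUNTRY = {"America/New_York": "US", "Europe/Warsaw": "PL", "Europe/Moscow": "RU"}
MACOS_ONLY_FONTS = {"Menlo", "Apple Color Emoji"}


def repeated_identities(fingerprint_ids: list[str], threshold: int = 2) -> dict[str, int]:
    """Return fingerprint IDs shared by at least `threshold` supposedly different sessions."""
    counts = Counter(fingerprint_ids)
    return {fp: n for fp, n in counts.items() if n >= threshold}


def red_flags(fp: dict, ip_country: str) -> list[str]:
    """List generic values and internal contradictions in one fingerprint."""
    flags = []
    # Mismatch: the timezone implies a different country than the IP does.
    tz_country = TZ_COUNTRY.get(fp["timezone"])
    if tz_country is not None and tz_country != ip_country:
        flags.append(f"timezone {fp['timezone']} vs IP country {ip_country}")
    # Mismatch: a phone user-agent reporting a desktop-width viewport.
    if "iPhone" in fp["user_agent"] and fp["viewport_width"] > 1024:
        flags.append("iPhone user-agent with desktop-width viewport")
    # Mismatch: a Windows user-agent that can render macOS-only fonts.
    if "Windows" in fp["user_agent"] and MACOS_ONLY_FONTS & set(fp["fonts"]):
        flags.append("Windows user-agent with macOS-only fonts")
    # Generic values: an empty plugin list or an exposed automation flag.
    if not fp["plugins"]:
        flags.append("empty navigator.plugins")
    if fp["webdriver"]:
        flags.append("navigator.webdriver is true")
    return flags


print(repeated_identities(["a1b2", "a1b2", "a1b2", "c3d4"]))  # {'a1b2': 3}

suspect = {  # hypothetical fingerprint seen behind a Russian IP
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0.0.0",
    "timezone": "America/New_York",
    "viewport_width": 1366,
    "fonts": ["Arial", "Menlo"],
    "plugins": [],
    "webdriver": False,
}
for flag in red_flags(suspect, ip_country="RU"):
    print(flag)
```

Here `suspect` trips three checks: the timezone/IP mismatch, the Windows-plus-macOS-font contradiction, and the empty plugin list. Real detectors likely score many more signals rather than applying a handful of hard rules like these.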
Managing the fingerprint
The remedies map directly to the failure modes:
- Per-session uniqueness. Each task instance should present a distinct fingerprint (different canvas hash, WebGL renderer, and font list) so the same fingerprint doesn’t repeat across “different users.”
- Within-session consistency. Inside one session the fingerprint has to stay stable; switching mid-session is itself a tell.
- Geographic coherence with the IP. A fingerprint’s timezone, language, and `Accept-Language` header should match the proxy IP’s geography. Pair an Eastern European IP with an Eastern European fingerprint (local timezone and language), not a default `en-US`.
- Realistic, not random. Fingerprints assembled from purely random values are themselves anomalous. The right move is to draw from distributions of real-world fingerprints (common GPU strings, plausible font lists, normal screen sizes) rather than exotic values no human user produces.
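The four remedies can be combined in a single profile generator. The Python sketch below is a minimal illustration under stated assumptions, not Octoparse's implementation: the value pools, weights, and the `GEO` table are hypothetical placeholders, and a real pool would be built from measured real-world fingerprints. Seeding a private random generator from the session ID gives within-session consistency (the same session always rebuilds the same profile) and per-session variety (different sessions usually diverge), weighted draws keep values realistic, and the proxy's country fixes timezone and languages.

```python
import hashlib
import random
from dataclasses import dataclass

# Hypothetical value pools with made-up weights; a real pool would come from
# measured traffic, not from this sketch.
SCREENS = [((1920, 1080), 0.45), ((1366, 768), 0.25), ((1536, 864), 0.2), ((2560, 1440), 0.1)]
GPUS = [
    ("ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0, D3D11)", 0.5),
    ("ANGLE (NVIDIA, NVIDIA GeForce GTX 1650 Direct3D11 vs_5_0 ps_5_0, D3D11)", 0.3),
    ("ANGLE (AMD, AMD Radeon(TM) Graphics Direct3D11 vs_5_0 ps_5_0, D3D11)", 0.2),
]
BASE_FONTS = ["Arial", "Calibri", "Cambria", "Consolas", "Segoe UI", "Times New Roman"]
OPTIONAL_FONTS = ["Georgia", "Impact", "Tahoma", "Trebuchet MS", "Verdana"]
# Timezone and languages keyed by the proxy IP's country, so locale follows the IP.
GEO = {
    "US": ("America/New_York", ("en-US", "en")),
    "PL": ("Europe/Warsaw", ("pl-PL", "pl", "en-US", "en")),
    "RU": ("Europe/Moscow", ("ru-RU", "ru", "en-US", "en")),
}


@dataclass(frozen=True)
class Profile:
    screen: tuple[int, int]
    webgl_renderer: str
    fonts: tuple[str, ...]
    timezone: str
    languages: tuple[str, ...]


def _weighted(rng: random.Random, options):
    """Pick one value from (value, weight) pairs."""
    values, weights = zip(*options)
    return rng.choices(values, weights=weights, k=1)[0]


def make_profile(session_id: str, proxy_country: str) -> Profile:
    # Seed a private RNG from the session ID: the same session always rebuilds
    # the same profile, while different session IDs usually get different ones.
    seed = int.from_bytes(hashlib.sha256(session_id.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    timezone, languages = GEO[proxy_country]
    # Every profile keeps common system fonts and adds a small random subset.
    extras = rng.sample(OPTIONAL_FONTS, k=rng.randint(1, 3))
    return Profile(
        screen=_weighted(rng, SCREENS),
        webgl_renderer=_weighted(rng, GPUS),
        fonts=tuple(sorted(BASE_FONTS + extras)),
        timezone=timezone,
        languages=languages,
    )


profile = make_profile("session-001", proxy_country="PL")
print(profile.timezone, profile.languages[0])  # Europe/Warsaw pl-PL
```

Two caveats on this design. Sampling screen, GPU, and fonts independently can still yield combinations that are rare or absent in real traffic, and these toy pools produce only a few hundred distinct profiles, so collisions across a large fleet are likely; drawing whole profiles from a large pool of observed fingerprints addresses both. Canvas and audio output are not modeled here at all.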
How Octoparse approaches fingerprinting
On top of the natural advantage of a headed-by-design runtime, which leaks fewer signals than headless tools, Octoparse actively manages the browser fingerprint. Sessions are presented with distinct, realistic fingerprint profiles rather than the same default identity repeated across every task, which addresses the per-session-uniqueness requirement that a fixed fingerprint configuration cannot meet. The runtime handles both the “passive” stealth that comes from being headed and the “active” stealth that comes from fingerprint diversity, without users having to wire up an external fingerprint service.
Fingerprint management pairs with Octoparse’s behavioral simulation: the runtime looks like a different real user each session, and once on the page it acts like one.
When it matters
Active fingerprint management is overkill for static sites and basic bot detection. It earns its keep against heavier defenses:
- Light defenses. Headed-by-default already covers it.
- Medium defenses (rate limiting plus basic detection). The defaults are usually fine if the other layers are clean.
- Heavy defenses (Cloudflare, DataDome, HUMAN, Akamai Bot Manager). Required. Without distinct, realistic per-session fingerprints, the same identity repeating across thousands of “different users” gets your scraper banned even when everything else looks clean.