Auto-detect scans a web page, identifies repeated data patterns, and generates a starting extraction workflow — including data fields and pagination logic. It sits between templates (fastest, but limited to supported sites) and the no-code builder (full manual control) in terms of effort and flexibility.

When to use Auto-detect

Use Auto-detect when:
  • The page has repeated items such as products, listings, reviews, or search results
  • You want a quick starting workflow
  • You are not sure which elements to select manually
  • You want Octoparse to suggest fields and pagination logic
  • You want to quickly preview what data can be extracted from a target website before committing to a full configuration
  • You plan to review and adjust the generated workflow afterward
Auto-detect is a starting point, not a guarantee that every field or action will be correct.

How Auto-detect works

Auto-detect analyzes the page DOM structure and visual layout, uses similarity calculation and feature combination to locate repeated data regions, and generates extraction rules automatically.
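
To make the idea of "repeated data regions" concrete, here is a minimal sketch that counts structurally identical sibling elements in a page. It is not Octoparse's actual algorithm: it ignores visual layout entirely and compares elements only by tag name and class attribute. The names `SiblingSignatureParser` and `find_repeated_regions` and the sample HTML are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class SiblingSignatureParser(HTMLParser):
    """Counts, for each parent path, how often each child signature occurs."""

    def __init__(self):
        super().__init__()
        self.stack = ["#root"]  # signatures of the currently open elements
        self.children = {}      # parent path -> Counter of child signatures

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        signature = f"{tag}.{css_class}" if css_class else tag
        parent_path = "/".join(self.stack)
        self.children.setdefault(parent_path, Counter())[signature] += 1
        if tag not in VOID_TAGS:
            self.stack.append(signature)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if len(self.stack) > 1 and self.stack[-1].split(".")[0] == tag:
            self.stack.pop()


def find_repeated_regions(html, min_repeats=3):
    """Return (parent_path, child_signature, count) for repeated children."""
    parser = SiblingSignatureParser()
    parser.feed(html)
    regions = [
        (parent, signature, count)
        for parent, counts in parser.children.items()
        for signature, count in counts.items()
        if count >= min_repeats
    ]
    # Most repeated first; on ties, prefer the outermost container.
    regions.sort(key=lambda region: (-region[2], region[0].count("/")))
    return regions


SAMPLE_HTML = """
<div class="page">
  <h1>Results</h1>
  <ul class="results">
    <li class="item"><span class="title">A</span><span class="price">$1</span></li>
    <li class="item"><span class="title">B</span><span class="price">$2</span></li>
    <li class="item"><span class="title">C</span><span class="price">$3</span></li>
  </ul>
</div>
"""

regions = find_repeated_regions(SAMPLE_HTML)
```

On this sample, `regions[0]` is the `li.item` element repeated three times under `ul.results`, which corresponds to the list an extraction loop would iterate over. The repeated `span.title` and `span.price` children correspond to candidate fields.
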
1. Open the target page

Start from the page that contains the data you want to extract.

2. Run Auto-detect

Octoparse scans the page, detects repeated data regions, and generates extraction fields. It also attempts to identify pagination or next-page buttons so the task can move through multiple result pages automatically.

3. Review detected fields

Check whether the suggested fields match the data you need.

4. Confirm page navigation

Review pagination, scrolling, or next-page actions. Auto-detect may identify a next-page button or an infinite-scroll pattern. Verify that it works correctly before running at scale.

5. Test the task

Run a sample and verify the output before scaling the extraction.

What to review after detection

After Auto-detect generates a workflow, check:
| Area | What to verify | How to fix |
| --- | --- | --- |
| Fields | Are the correct values captured? | Re-select the element or adjust the XPath in the Data Preview panel |
| Field names | Are column names clear and meaningful? | Rename fields before export |
| Pagination | Does the task move to the next page correctly? | Manually set the next-page button or scroll action |
| Detail pages | Does the workflow open item details when needed? | Add a click action to enter detail pages |
| Duplicates | Are repeated or unwanted elements included? | Delete unwanted fields or rows in the Data Preview panel |
| Missing values | Are some rows missing important fields? | Add fields manually in the Data Preview panel |
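
Some of these checks can also be scripted against an exported sample. The sketch below assumes the sample was exported as CSV and counts empty required fields and exact duplicate rows. The function name `review_sample`, the column names, and the sample data are illustrative and not part of Octoparse.

```python
import csv
import io
from collections import Counter


def review_sample(csv_text, required_fields):
    """Summarize missing required values and exact duplicate rows in a CSV sample."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # A value counts as missing when it is absent, empty, or whitespace only.
    missing = {
        field: sum(1 for row in rows if not (row.get(field) or "").strip())
        for field in required_fields
    }

    # Rows are duplicates only if every column matches exactly.
    row_counts = Counter(tuple(row.items()) for row in rows)
    duplicates = sum(count - 1 for count in row_counts.values())

    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Hypothetical export: the second row has no price, the third repeats the first.
SAMPLE_CSV = """title,price,url
Lamp,$19.99,https://example.com/lamp
Desk,,https://example.com/desk
Lamp,$19.99,https://example.com/lamp
"""

report = review_sample(SAMPLE_CSV, required_fields=["title", "price"])
```

Here `report` shows 3 rows, one missing `price`, and one duplicate row. A non-zero count in either category suggests returning to the Data Preview panel before scaling the task.
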

Known limitations

Auto-detect works best on pages with clear, repeated data structures. It may produce incomplete or inaccurate results in certain situations:
| Situation | What may happen |
| --- | --- |
| Non-standard or irregular page layout | Fields may be misidentified or missing |
| JavaScript-rendered dynamic content | Data that loads after the initial page render may not be detected |
| Anti-scraping websites (e.g. Indeed, LinkedIn) | Auto-detect may fail to load or parse the page |
| Complex nested structures | The detected list may include incorrect or duplicate elements |
| Wrong field captured | The system may pick up promotional text instead of the actual value (e.g. a discount label instead of the real price) |
A common runtime error after Auto-detect is “[Loop Item] failed to find current loop item, exiting loop now”, which usually means the generated XPath no longer matches the actual page elements. Re-running Auto-detect or manually adjusting the selector in the no-code builder typically resolves this.
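
The following sketch illustrates why a generated XPath can stop matching. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup, so treat it as a toy model rather than a way to test real pages. The markup, the XPath expressions, and the helper `count_matches` are invented for the example.

```python
import xml.etree.ElementTree as ET


def count_matches(markup, xpath):
    """Return how many elements in the markup match the XPath expression."""
    root = ET.fromstring(markup)
    return len(root.findall(xpath))


# Hypothetical page as it looked when the workflow was generated.
ORIGINAL_PAGE = '<div><ul><li class="item">A</li><li class="item">B</li></ul></div>'

# The same page after the site renamed its CSS class.
UPDATED_PAGE = '<div><ul><li class="card">A</li><li class="card">B</li></ul></div>'

# A selector tied to the class name, and a looser one based on structure only.
CLASS_XPATH = ".//li[@class='item']"
STRUCTURAL_XPATH = ".//ul/li"

before = count_matches(ORIGINAL_PAGE, CLASS_XPATH)
after = count_matches(UPDATED_PAGE, CLASS_XPATH)
after_structural = count_matches(UPDATED_PAGE, STRUCTURAL_XPATH)
```

The class-based selector matches two items on the original page and none on the updated one, which is the kind of mismatch that leaves a loop with no current item. The structural selector still matches both items, at the cost of being more likely to pick up unrelated lists.
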

When manual editing is needed

Manual adjustments may be needed when:
  • The page layout is irregular
  • Important fields are outside the detected list
  • The website loads content dynamically
  • Pagination is not detected correctly
  • The page requires login, filters, popups, or user interaction
  • You need to extract data from detail pages
Use the no-code builder to refine the generated workflow.
Do not assume Auto-detect output is production-ready. Always review fields, run a sample, and check the exported data.
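
As one example of checking exported data, the sketch below flags values in a price column that do not look like prices, such as a discount label captured in place of the real price. The pattern is a rough assumption about price formats (an optional currency symbol, digits, optional thousands separators, and up to two decimals) and would need adjusting per site. The field name `price` and the helper `flag_suspect_prices` are hypothetical.

```python
import re

# Rough assumption about what a price looks like, e.g. "$19.99" or "1,299.00".
PRICE_PATTERN = re.compile(r"^\s*[$€£]?\s*\d+(?:,\d{3})*(?:\.\d{1,2})?\s*$")


def flag_suspect_prices(rows, field="price"):
    """Return the indexes of rows whose price field does not look like a price."""
    return [
        index
        for index, row in enumerate(rows)
        if not PRICE_PATTERN.match(row.get(field) or "")
    ]


# Hypothetical sample rows; two of them captured promotional text.
sample_rows = [
    {"price": "$19.99"},
    {"price": "20% OFF"},
    {"price": "1,299.00"},
    {"price": "Save $5"},
]

suspect = flag_suspect_prices(sample_rows)
```

With these rows, `suspect` contains indexes 1 and 3, the two promotional labels. Flagged rows are a prompt to re-select the element for that field, not proof that the value is wrong.
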

Templates

Start with a prebuilt workflow for common websites.

No-code builder

Adjust or build workflows manually after Auto-detect.

Refine data

Clean and reformat extracted fields before export.