Scenarios it handles
Extract repeated lists
Extract repeated items from search results, directories, product grids, or tables.
Drill into detail pages
Click into each item, collect additional fields, then return to the list.
Navigate across pages
Move across pages, load more results, or scroll to reveal dynamic content.
Customize field values
Select fields manually and refine extracted values before export.
Use the no-code builder when:
- No suitable template exists
- Auto-detect misses fields or page actions
- The workflow requires detail pages
- The website uses custom pagination or infinite scroll
- You need to handle popups, menus, tabs, or login steps
- You need field-level cleanup before export
Basic workflow
Select elements
Click the data fields, buttons, links, or list items that define the extraction workflow.
Choose actions
Add actions such as extract data, click, loop through items, paginate, scroll, or wait.
Review the workflow
Check the action sequence and make sure the task follows the intended page path.
Common actions
| Action | Use it for |
|---|---|
| Go to Webpage | Load a target URL to start the extraction workflow. To extract from multiple similar URLs, combine it with a Loop that iterates through each URL. |
| Extract Data | Capture text, links, image URLs, attributes, or other field values |
| Click | Open links, buttons, tabs, menus, or detail pages |
| Loop | Repeat actions across a list of items |
| Enter Text | Type text into input fields on the page. Also used to enter login credentials when the website requires manual authentication. |
| Pagination | Move through multiple pages of results |
| Scroll | Load content that appears after scrolling. Supports full-page scrolling (Default) and scrolling within a specific area (Partial). You can set the number of scroll repeats and the wait time, and choose to end the loop automatically when no more content loads. |
| Wait | Allow dynamic content to finish loading |
| Clean Data | Reformat or extract part of a field value |
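The Scroll settings (repeats, wait time, stop when nothing new loads) amount to a small loop with an early exit. The sketch below illustrates that logic only; it is not how Octoparse implements scrolling. `scroll_until_done`, `FakePage`, and the batch sizes are hypothetical names and values made up for the example.

```python
import time


def scroll_until_done(scroll_once, count_items, max_repeats, wait_seconds=0.0):
    """Scroll up to max_repeats times; stop early once a scroll adds nothing.

    scroll_once and count_items are hypothetical callbacks: one performs a
    single scroll, the other reports how many items are currently loaded.
    """
    seen = count_items()
    for _ in range(max_repeats):
        scroll_once()
        time.sleep(wait_seconds)  # wait time: let new content finish loading
        now = count_items()
        if now == seen:  # nothing new appeared, so end the loop
            break
        seen = now
    return seen


class FakePage:
    """Stand-in for a page that starts with 10 items and loads 20, 20, 5 more."""

    def __init__(self):
        self.items = 10
        self._batches = [20, 20, 5]

    def scroll(self):
        if self._batches:
            self.items += self._batches.pop(0)


page = FakePage()
total = scroll_until_done(page.scroll, lambda: page.items, max_repeats=10)
print(total)
```

With a repeat limit of 10, the loop here ends after the fourth scroll because the item count stops growing. A lower limit would cut it off sooner, which is one reason a task can return fewer rows than expected.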
Editing selectors manually
When the default element selection does not capture the right data, you can edit the underlying XPath directly. This is useful when:
- The auto-generated selector is too broad or too narrow
- You need to target a specific attribute (e.g. `href`, `src`, `data-price`) instead of visible text
- The page structure changes slightly between items and the default selector misses some rows
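To make the difference between a broad selector, a narrowed selector, and an attribute value concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. The markup, class names, and `data-price` values are invented for illustration and do not come from any real site. In a full XPath engine the attribute step would typically be written inline, for example `//div[@class='product']/a[@class='title']/@href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical product grid; structure and class names are illustrative only.
HTML = """
<div id="grid">
  <div class="product" data-price="19.99">
    <a class="title" href="/p/1">Blue mug</a>
    <a class="badge" href="/sale">Sale</a>
  </div>
  <div class="product" data-price="5.50">
    <a class="title" href="/p/2">Tea spoon</a>
  </div>
</div>
"""

root = ET.fromstring(HTML)

# Too broad: matches every link in the grid, including the "Sale" badge.
broad = root.findall(".//a")

# Narrowed: only the title link inside each product container.
titles = root.findall(".//div[@class='product']/a[@class='title']")

# Visible text versus attribute values for the same elements.
texts = [a.text for a in titles]
hrefs = [a.get("href") for a in titles]
prices = [d.get("data-price") for d in root.findall(".//div[@class='product']")]

print(len(broad), texts, hrefs, prices)
```

The broad selector returns three links while the narrowed one returns two, which is the kind of mismatch that shows up as extra or missing rows in a task.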
Troubleshooting
| Problem | Likely cause | What to try |
|---|---|---|
| Cannot select an element | The element is inside an iframe or shadow DOM | Switch to the iframe first, or use the built-in browser to inspect the page structure |
| Selected element returns empty values | The content loads dynamically after the page renders | Add a Wait action before the extraction step |
| Loop extracts duplicate rows | The loop selector matches overlapping regions | Narrow the loop XPath to target a more specific container |
| Pagination stops early | The next-page button selector does not match on later pages | Check whether the button changes class or position on the last page |
| Data looks correct in preview but wrong after export | Field values contain hidden HTML or whitespace | Use Clean Data to trim or reformat before export |
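The last row is easier to diagnose when you know what hidden markup and whitespace look like in a raw value. The sketch below shows one way such a value could be normalized. It illustrates the idea only and is not the Clean Data implementation; `clean_field` and the sample strings are hypothetical.

```python
import html
import re


def clean_field(raw: str) -> str:
    """Strip inline tags, decode HTML entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)  # drop inline markup such as <span>
    decoded = html.unescape(no_tags)  # &amp; becomes &, &nbsp; becomes U+00A0
    # For str patterns, \s also matches the non-breaking space U+00A0.
    return re.sub(r"\s+", " ", decoded).strip()


raw_value = "  <span>$19.99</span>\n&nbsp; <b>USD</b> "
print(repr(clean_field(raw_value)))
```

A value like this can look fine in a rendered preview, because browsers hide the tags and collapse the whitespace, yet still carry all of it into a CSV or spreadsheet export.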
Best practices
- Start with a small sample before running the full task.
- Use clear field names.
- Check whether values are captured from visible text, links, images, attributes, or HTML.
- Add waits when content loads dynamically.
- Keep the workflow as simple as possible.
- Re-test after the target website changes.
The no-code builder creates a repeatable workflow based on the website structure. If the website changes significantly, you may need to adjust the task.
Related pages
Templates
Start with a prebuilt workflow for common websites.
Auto-detect
Let Octoparse scan a page and generate a starting workflow automatically.
Refine data
Clean and reformat extracted fields before export.