Web Scraping Troubleshooting | Missing items when creating a list
Wednesday, April 5, 2017 4:24 AM
Why do some list items get left out?
Octoparse detects the items that belong to a list by the pattern they share in the page's underlying HTML source code.
When building a list, we usually start by selecting any two items from it, which defines the pattern Octoparse uses to find the rest. If some list items are not included as expected, they most likely have an HTML pattern that differs from the one defined.
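To illustrate why a pattern can miss items, here is a minimal sketch in Python. It does not show how Octoparse works internally. The page fragment, the item names, and the XPath are all hypothetical, and the standard library's xml.etree.ElementTree (which supports only a small subset of XPath) stands in for a real XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment (not from any real site): the third item
# carries an extra class, so its markup differs from the first two.
PAGE = """
<ul id="products">
  <li class="item">Laptop</li>
  <li class="item">Phone</li>
  <li class="item featured">Tablet</li>
</ul>
"""

root = ET.fromstring(PAGE)

# Assumed pattern, as if inferred from the first two items:
# the class attribute must be exactly "item".
narrow_xpath = ".//li[@class='item']"

matched = [li.text for li in root.findall(narrow_xpath)]
print(matched)  # ['Laptop', 'Phone']
```

"Tablet" is left out because its class attribute is "item featured", which does not equal "item", even though it looks like an ordinary list item on the page.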
How do I tell Octoparse I need those items as well?
To include the omitted items, we need to replace the old pattern with a new one. In Octoparse, this means modifying or rewriting the XPath expression that was auto-generated during the earlier detection.
If you are new to XPath, you may want to learn some basics of HTML and XPath first. Here are some tutorials for reference: HTML basic | XPath basic
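As a rough sketch of what rewriting the expression can look like, the Python example below compares a too-specific XPath with a broader one on a hypothetical page fragment. The markup, the id "products", and both expressions are assumptions made for illustration. The expression Octoparse generates for a real page will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one list item has a different class attribute.
PAGE = """
<div>
  <ul id="products">
    <li class="item">Laptop</li>
    <li class="item">Phone</li>
    <li class="item featured">Tablet</li>
  </ul>
</div>
"""

root = ET.fromstring(PAGE)

# Old pattern: tied to an exact class value, so it misses the third item.
old_xpath = ".//li[@class='item']"

# New pattern: anchored on the parent list instead of the item's class,
# so every <li> directly under the list is selected.
new_xpath = ".//ul[@id='products']/li"

old_items = [li.text for li in root.findall(old_xpath)]
new_items = [li.text for li in root.findall(new_xpath)]

print(old_items)  # ['Laptop', 'Phone']
print(new_items)  # ['Laptop', 'Phone', 'Tablet']
```

The general idea is to anchor the expression on something all wanted items share, such as a common parent, rather than on an attribute that varies between them. In a full XPath 1.0 engine, a partial match such as //li[contains(@class, 'item')] is another option. ElementTree does not support contains(), so it is not used in this sketch.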
Where to input the new XPath expression?
Step 1. Select the Loop Item step from the workflow
Step 2. Check the Loop mode option
- If Variable list mode is on, go to Step 3
- If Fixed list mode is on, switch to Variable list mode, then go to Step 3
Step 3. Enter the modified XPath expression in the text box
Article in Spanish: ¿Cómo lidiar con los elementos faltantes al crear una lista?
Author: The Octoparse Team