
Web Scraping Troubleshooting | Missing items when creating a list

Wednesday, April 5, 2017 4:24 AM


Why do some list items get left out?

Octoparse detects the items belonging to a list by their coding pattern in the underlying HTML source code.

When building a list, we usually start by selecting any two items from the list to define a coding pattern for Octoparse to follow. If some list items are then left out unexpectedly, they most likely have a coding pattern different from the one that was defined.
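To see why this happens, here is a minimal sketch in Python. The HTML fragment, the class names, and the XPath expression are made up for illustration and are not what Octoparse generates for any real page. It uses the limited XPath support in Python's standard library rather than Octoparse's own engine, so treat it only as a demonstration of the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment (an assumption for this sketch).
# The third item carries an extra class, so its coding pattern
# differs from that of the first two items.
PAGE = """
<div>
  <ul id="products">
    <li class="item"><a href="/p/1">Alpha</a></li>
    <li class="item"><a href="/p/2">Beta</a></li>
    <li class="item featured"><a href="/p/3">Gamma</a></li>
  </ul>
</div>
"""

root = ET.fromstring(PAGE)

# A pattern derived from the first two items only: it requires the
# class attribute to be exactly "item".
narrow_xpath = ".//li[@class='item']"

narrow_names = [li.find("a").text for li in root.findall(narrow_xpath)]
print(narrow_names)  # ['Alpha', 'Beta'] -- "Gamma" is left out
```

The third item is skipped because its class attribute is "item featured", which does not match the pattern taken from the first two items.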

 

How do I tell Octoparse that I need those items as well?

To include the omitted items, we need to replace the old pattern with a new one. In Octoparse, this means modifying or rewriting the XPath expression that was auto-generated when the list was first detected.

If you are new to XPath, you may need to learn some basics of HTML and XPath first. Here are some tutorials for your reference: HTML basic | XPath basic
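As a rough illustration of rewriting a pattern, the sketch below compares a narrow expression with a broader one on a made-up HTML fragment. The fragment, the id "products", and both expressions are assumptions for this example. Python's standard library supports only a subset of XPath and needs a leading "." on relative searches; in Octoparse the broader expression would more likely be written as //ul[@id='products']/li.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment (an assumption for this sketch).
PAGE = """
<div>
  <ul id="products">
    <li class="item"><a href="/p/1">Alpha</a></li>
    <li class="item"><a href="/p/2">Beta</a></li>
    <li class="item featured"><a href="/p/3">Gamma</a></li>
  </ul>
</div>
"""

root = ET.fromstring(PAGE)

# Old pattern: matches only items whose class is exactly "item".
old_xpath = ".//li[@class='item']"

# New pattern: matches every <li> directly under the list container,
# regardless of which classes the individual items carry.
new_xpath = ".//ul[@id='products']/li"


def item_names(xpath):
    """Return the link text of every list item matched by xpath."""
    return [li.find("a").text for li in root.findall(xpath)]


old_names = item_names(old_xpath)
new_names = item_names(new_xpath)

print(len(old_names), old_names)  # 2 ['Alpha', 'Beta']
print(len(new_names), new_names)  # 3 ['Alpha', 'Beta', 'Gamma']
```

Anchoring the expression on the list container instead of on an attribute of the individual items is one common way to broaden a pattern. Whether it fits a given page depends on that page's HTML.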

 

Where do I enter the new XPath expression?

Step 1. Select the Loop Item step from the workflow

Step 2. Check the Loop mode option

  • If Variable list mode is on, go to Step 3
  • If Fixed list mode is on, switch to Variable list mode

[Image: list_missing_data1]

Step 3. Input the modified XPath expression into the textbox

[Image: list_missing_data2]

 

Article in Spanish: ¿Cómo lidiar con los elementos faltantes al crear una lista? (How to deal with missing items when creating a list)

You can also read web scraping articles on the official website.

 

Author: The Octoparse Team
