Web Scraping Troubleshooting | Missing items when creating a list

Wednesday, April 5, 2017 4:24 AM

What is the problem? 

Octoparse detects lists automatically, but occasionally some items are left out while you are adding items to a list. Most likely, the list is interrupted by a page element that has a different layout in the HTML source code.


How to solve it?

To solve this, we need to modify the XPath of the target web elements so that it accurately locates and includes all of the items we want to capture.

Step 1. 

After a loop list has been created, notice that only 5 items have been added to the list, although there are many more items you still need.


Step 2. Looking at the page, we can observe that the list of articles is interrupted by a different web element.

Thus, we should modify the XPath of the article items to locate all the article elements by following the steps below.
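The effect of a too-narrow XPath can be illustrated outside Octoparse with a small sketch. It uses Python's standard xml.etree.ElementTree module, which supports only a limited XPath subset. The page fragment, the class names and the "ad" block are hypothetical and stand in for whatever element interrupts the list on the real site; this is not the XPath Octoparse itself generates.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: two groups of articles split by an ad block.
PAGE = """
<div id="content">
  <div class="group">
    <article>Post 1</article>
    <article>Post 2</article>
    <article>Post 3</article>
    <article>Post 4</article>
    <article>Post 5</article>
  </div>
  <div class="ad">Sponsored</div>
  <div class="group">
    <article>Post 6</article>
    <article>Post 7</article>
    <article>Post 8</article>
  </div>
</div>
"""

root = ET.fromstring(PAGE)

# Narrow XPath: tied to the first group only, so it stops at the ad block.
narrow = root.findall("./div[1]/article")

# Broader XPath: matches every article under the container, wherever it sits.
broad = root.findall(".//article")

narrow_titles = [a.text for a in narrow]
broad_titles = [a.text for a in broad]
print(len(narrow_titles), len(broad_titles))  # prints "5 8"
```

The narrow expression mirrors the situation in Step 1, where only 5 items were picked up. Loosening the path so it no longer depends on one specific parent element is what brings the remaining items back into the list.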


Step 3. Some sites keep loading more items as you scroll toward the bottom of the page, before the "Load more" button appears. In that case, set the number of scrolls and the interval between them so that the extraction runs smoothly. In this scraping rule, the page is scrolled down 5 times to display more items.
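The idea of scrolling a fixed number of times with a pause in between can be sketched as follows. FakePage and scroll_page are hypothetical names used only for illustration; FakePage simulates a page that loads one more batch of items per scroll, and the batch size and interval are made-up values, not Octoparse settings.

```python
import time


class FakePage:
    """Hypothetical stand-in for a page that lazy-loads items on scroll."""

    def __init__(self, total_items, batch_size):
        self.total_items = total_items
        self.batch_size = batch_size
        # The first batch is visible before any scrolling happens.
        self.loaded = min(batch_size, total_items)

    def scroll_to_bottom(self):
        # Each scroll reveals one more batch, up to the total available.
        self.loaded = min(self.loaded + self.batch_size, self.total_items)


def scroll_page(page, times, interval_seconds):
    """Scroll a fixed number of times, pausing so new content can load."""
    for _ in range(times):
        page.scroll_to_bottom()
        time.sleep(interval_seconds)
    return page.loaded


page = FakePage(total_items=40, batch_size=5)
loaded = scroll_page(page, times=5, interval_seconds=0.01)
print(loaded)  # prints 30: the initial 5 items plus 5 scrolls of 5 items each
```

On a real site the interval needs to be long enough for each batch to finish loading; too short an interval means later scrolls happen before new items appear.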




Now you've learned how to create a loop list when some items are missing. You can look into how this works with this example [link to the case example].

More tutorials or blogs are available if you'd like to learn more about related topics:



Author: The Octoparse Team

Collect Data from Amazon

Top 30 Free Web Scraping Software
