Web Scraping Troubleshooting | Missing items when creating a list
Wednesday, April 5, 2017 4:24 AM
What is the problem?
Octoparse detects lists automatically, but occasionally some items are left out while you are adding items to a list. Most likely, the list is interrupted by a web page element that has a different layout in the HTML source code.
How to solve it?
To solve this, we need to modify the XPath of the target web elements so that it accurately locates all of the items we want to capture.
Step 1. After a loop list has been created, notice that only 5 items were added to the list, even though there are many more items you still need.
Step 2. From Step 1, we can observe that the articles are interrupted by a different web element.
Thus, we should modify the XPath of the article items so that it locates all of the article elements, following the steps below.
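To illustrate why a narrow XPath misses items, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, which supports only a subset of XPath. The page markup, the "row" and "banner" class names, and the item count are all hypothetical and are not taken from the example site. Octoparse evaluates XPath in its own built-in browser, so treat this only as a model of the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two rows of articles separated by a banner element.
PAGE = """
<body>
  <div class="row">
    <article>Item 1</article>
    <article>Item 2</article>
    <article>Item 3</article>
    <article>Item 4</article>
    <article>Item 5</article>
  </div>
  <div class="banner">Sponsored</div>
  <div class="row">
    <article>Item 6</article>
    <article>Item 7</article>
    <article>Item 8</article>
  </div>
</body>
""".strip()

root = ET.fromstring(PAGE)  # root is the <body> element

# Narrow XPath: tied to the first <div>, so it stops at the banner.
narrow_items = root.findall("./div[1]/article")

# Broader XPath: matches articles in every "row" container.
all_items = root.findall(".//div[@class='row']/article")

print(len(narrow_items), "items with the narrow XPath")
print(len(all_items), "items with the broader XPath")
```

The same principle applies when editing the loop XPath in Octoparse: drop the positional index or the overly specific container, and match on an attribute that all of the wanted items share.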
Step 3. Sometimes the site continues to load more items as you scroll down to the bottom, before the "Load more" button appears. We can set the number of scrolls and the interval between them so that the extraction runs smoothly. In this scraping rule, I'd like to scroll down 5 times to display more items.
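The scroll setting can be thought of as a simple loop: scroll to the bottom, wait for the interval, and repeat. The sketch below models that logic in Python against a stand-in page object. FakePage, scroll_and_wait, and the item counts are hypothetical names and values used only for illustration; in Octoparse these values are set in the step's options rather than in code.

```python
import time


class FakePage:
    """Hypothetical stand-in for a page that reveals more items per scroll."""

    def __init__(self, initial=10, per_scroll=10):
        self.items = initial
        self.per_scroll = per_scroll

    def scroll_to_bottom(self):
        # Each scroll to the bottom loads another batch of items.
        self.items += self.per_scroll


def scroll_and_wait(page, times=5, interval=2.0):
    """Scroll `times` times, pausing `interval` seconds after each scroll."""
    for _ in range(times):
        page.scroll_to_bottom()
        time.sleep(interval)  # give the new items time to load
    return page.items


page = FakePage()
# interval=0 keeps the demo fast; a real page needs a non-zero wait.
total = scroll_and_wait(page, times=5, interval=0)
print(total, "items visible after scrolling")
```

If the interval is too short, the next scroll may fire before the new items have loaded, so fewer items end up in the list.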
Now you've learned how to create a loop list when some items are missing. You can look into how this works with this example [link to the case example].
More tutorials or blogs are available if you'd like to learn more about related topics:
- How big data serves better for budget system
- How to Scrape Data from A Website with A “Load More” Button (Example: Kickstarter)
- Getting started with XPath
- Ajax timeout in Octoparse
Author: The Octoparse Team