Step-by-step tutorials for you to get started with web scraping
The latest version of this tutorial is available here. Go check it out now!
When you create a list of items to scrape from a website, the list may sometimes include several “Ads” items (Example URL).
What should you do if you only want to scrape the non-ad items?
You just need to modify the XPath of the “Loop Item” so that it locates only the non-ad items.
If you inspect the source code of the items in the example above with Firebug (a Firefox extension), you will see the difference between ad items and non-ad items.
The class attribute is different, so we can use this difference to write an XPath that matches only the non-ad items: //li[@class='regular-search-result']
Enter this XPath into Octoparse, and you will see that the Ads are excluded.
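To see the same selection outside Octoparse, here is a minimal Python sketch that applies the XPath to a small, made-up fragment of markup. It uses only the standard library's xml.etree.ElementTree, which supports a limited XPath subset and needs a leading "." for a descendant search. The sample markup, the "ad-result" class name, and the select_regular_items helper are assumptions for illustration; only the "regular-search-result" class comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "ad-result" class and the item texts are made up.
# Only the "regular-search-result" class name comes from the tutorial.
SAMPLE_MARKUP = """
<ul>
  <li class="ad-result">Sponsored listing</li>
  <li class="regular-search-result">Listing A</li>
  <li class="regular-search-result">Listing B</li>
  <li class="ad-result">Another sponsored listing</li>
</ul>
"""


def select_regular_items(markup):
    """Return the text of every <li> whose class is exactly 'regular-search-result'."""
    root = ET.fromstring(markup.strip())
    # Same predicate as //li[@class='regular-search-result'];
    # ElementTree requires the path to start with "." when searching from a node.
    matches = root.findall(".//li[@class='regular-search-result']")
    return [li.text for li in matches]


regular_items = select_regular_items(SAMPLE_MARKUP)
print(regular_items)  # ['Listing A', 'Listing B']
```

Note that @class='regular-search-result' is an exact string comparison, so an item whose class attribute carries additional class names would not match this expression.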