How to exclude "Ads" items when creating a list?
Wednesday, December 4, 2019
Note: a newer version of this tutorial is available.
When you create a list of items to scrape from a website, the list sometimes includes several “Ads” items (Example URL).
What should you do if you only want to scrape the non-ads items?
You just need to modify the XPath of the “Loop Item” to make it only locate the non-ads items.
If you inspect the source code of the items in the example above with Firebug (a Firefox extension), you will see the difference between the ads items and the non-ads items.
The class attribute is different, so we can use this difference to write an XPath that matches only the non-ads items: //li[@class='regular-search-result']
Enter this XPath into Octoparse, and you will see that the Ads items are excluded.
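To see how this XPath behaves outside Octoparse, here is a minimal Python sketch using only the standard library. The markup is a simplified, hypothetical stand-in for the example page: the class name `ad-search-result` and the listing texts are assumptions made for illustration, and only `regular-search-result` comes from the tutorial. Note that `@class='...'` is an exact match on the whole attribute value, so an item whose class attribute carries additional class names would not be selected.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. The ad class name "ad-search-result"
# is invented for this sketch; the real page may use something else.
SAMPLE_HTML = """
<ul>
  <li class="ad-search-result">Sponsored listing</li>
  <li class="regular-search-result">Listing A</li>
  <li class="regular-search-result">Listing B</li>
  <li class="ad-search-result">Another sponsored listing</li>
</ul>
"""


def select_non_ads(markup):
    """Return the text of every <li> whose class is exactly 'regular-search-result'."""
    root = ET.fromstring(markup.strip())
    # ElementTree needs a relative path (".//li") where Octoparse accepts "//li".
    items = root.findall(".//li[@class='regular-search-result']")
    return [item.text for item in items]


print(select_non_ads(SAMPLE_HTML))  # ['Listing A', 'Listing B']
```

The two items with the other class value are skipped, which is the same filtering the modified “Loop Item” XPath performs in Octoparse.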