How to exclude "Ads" items when creating a list?
Friday, September 07, 2018
When you create a list of items to scrape a website, sometimes the list may include several “Ads” items (Example URL).
What should you do if you only want to scrape the non-ads items?
You just need to modify the XPath of the “Loop Item” to make it only locate the non-ads items.
If you inspect the source code of the items in the example above with Firebug (a Firefox extension), you will see the difference between the ads items and the non-ads items.
The class attribute is different: the non-ads items have the class 'regular-search-result'. We can use this difference to write an XPath that matches only them: //li[@class='regular-search-result']
Enter this XPath into Octoparse, and the ads items will be excluded from the list.
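To check the XPath outside Octoparse, it can be tried against a small sample of markup. The sketch below is only an illustration: the sample list and the class name "ad-result" for the ads items are assumptions, not taken from the example page, and the function name non_ad_items is hypothetical. It uses Python's standard xml.etree.ElementTree module, which needs the path written in relative form (.//li instead of //li).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration. Only the class
# "regular-search-result" comes from the tutorial; the class
# "ad-result" and the item texts are assumptions.
SAMPLE = """
<ul>
  <li class="ad-result">Sponsored listing A</li>
  <li class="regular-search-result">Listing 1</li>
  <li class="regular-search-result">Listing 2</li>
  <li class="ad-result">Sponsored listing B</li>
  <li class="regular-search-result">Listing 3</li>
</ul>
"""


def non_ad_items(markup):
    """Return the text of every <li> whose class marks it as a non-ads item."""
    root = ET.fromstring(markup.strip())
    # Same predicate as the tutorial's XPath, in the relative form
    # that ElementTree accepts.
    items = root.findall(".//li[@class='regular-search-result']")
    return [li.text for li in items]


print(non_ad_items(SAMPLE))
```

Note that @class='regular-search-result' is an exact match on the whole attribute value, so an item whose class attribute lists additional class names would not be selected by this XPath.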