How to exclude "Ads" items when creating a list?

Friday, September 07, 2018

When you create a list of items to scrape from a website, the list may sometimes include several “Ads” items (Example URL).

What should you do if you only want to scrape the non-ad items?

You just need to modify the XPath of the “Loop Item” so that it locates only the non-ad items.

If you inspect the source code of the items in the example above with Firebug (a Firefox extension), you will see the difference between the ad items and the non-ad items.

The class attribute is different, so we can use this difference to write an XPath that matches only the regular results: //li[@class='regular-search-result']

Enter the XPath into Octoparse, and you will see that the ads are excluded.
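
If you would like to see how this kind of XPath filters items outside Octoparse, here is a minimal Python sketch using only the standard library. The sample markup and the "ad-search-result" class name are made up for illustration (the real page's ad class may differ), and select_non_ads is a hypothetical helper, not part of Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for a results page.
# "ad-search-result" is an assumed class name for the ad items.
SAMPLE = """
<ul>
  <li class="ad-search-result">Sponsored listing</li>
  <li class="regular-search-result">Listing A</li>
  <li class="regular-search-result">Listing B</li>
  <li class="ad-search-result">Another sponsored listing</li>
</ul>
"""


def select_non_ads(markup):
    """Return the text of every <li> whose class is exactly 'regular-search-result'."""
    root = ET.fromstring(markup)
    # ElementTree supports a subset of XPath; attribute predicates are included.
    items = root.findall(".//li[@class='regular-search-result']")
    return [item.text for item in items]


if __name__ == "__main__":
    print(select_non_ads(SAMPLE))
```

Note that @class='regular-search-result' is an exact match on the whole attribute value, so an item that carries additional classes would not be selected by this expression.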

Tips!

If you are new to XPath, you may want to learn some basics of HTML and XPath first. Here are some tutorials for your reference: HTML basics | XPath basics
