
How to exclude "Ads" items when creating a list?

Wednesday, November 24, 2021

A more recent version of this tutorial is available here. Check it for up-to-date steps.


When you create a list of items to scrape from a website, the list may sometimes include several “Ads” items (Example URL).


What should you do if you only want to scrape the non-ad items?

You just need to modify the XPath of the “Loop Item” so that it matches only the non-ad items.


If you check the source code of the items in the example above with Firebug (a Firefox extension), you will see the difference between the ad items and the non-ad items.



The class attribute is different, so we can use this difference to write an XPath that matches only the regular items: //li[@class='regular-search-result']
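
To see what this XPath selects outside of Octoparse, here is a minimal Python sketch using only the standard library. The sample markup is an assumption made up for illustration: only the 'regular-search-result' class comes from the example above, while the 'ad-result' class, the item texts, and the function name non_ad_items are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the 'regular-search-result' class comes from the
# tutorial; the 'ad-result' class and the item texts are invented.
SAMPLE = """
<ul>
  <li class="ad-result">Sponsored listing A</li>
  <li class="regular-search-result">Listing 1</li>
  <li class="regular-search-result">Listing 2</li>
  <li class="ad-result">Sponsored listing B</li>
</ul>
"""


def non_ad_items(markup):
    """Return the text of every <li> whose class is exactly 'regular-search-result'."""
    root = ET.fromstring(markup)
    # ElementTree supports a subset of XPath; './/' searches all descendants,
    # which plays the role of the leading '//' in the tutorial's expression.
    matches = root.findall(".//li[@class='regular-search-result']")
    return [li.text for li in matches]


print(non_ad_items(SAMPLE))  # ['Listing 1', 'Listing 2']
```

Note that @class='regular-search-result' is an exact match on the whole attribute value, so an item carrying additional class names would not be selected by this expression.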


Enter this XPath into Octoparse, and you will see that the Ads items are excluded.



If you are new to XPath, you may want to learn some basics of HTML and XPath first. Here are some tutorials for your reference: HTML basic | XPath basic
