Step-by-step tutorials for you to get started with web scraping


Ad blocking and clear cache

Thursday, August 16, 2018

Generally, a task created in Octoparse begins by opening the target web page. Octoparse provides two features that help with this step: ad blocking and cache clearing. Used properly, they can noticeably speed up your web scraping.

Features covered in this tutorial are:

Ad Blocking

Clear Cache

 

 

 

Ad Blocking

A crawler's extraction speed depends on how fast pages load. If a page carries many ads, such as banners and pop-ups, it loads slowly and wastes your time. Ad blocking reduces the number of requests the page makes and so shortens the loading time.
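As a rough illustration of why blocking ads cuts down on page requests, the following Python sketch filters a list of request URLs against a blocklist of ad hosts. It is not how Octoparse implements the feature; the host names, `AD_HOSTS`, `is_ad_request` and `filter_requests` are all hypothetical and exist only for this example.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; real ad blockers rely on much larger filter lists.
AD_HOSTS = {"ads.example.com", "tracker.example.net"}


def is_ad_request(url: str) -> bool:
    """Return True if the URL's host is a blocked host or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == blocked or host.endswith("." + blocked) for blocked in AD_HOSTS)


def filter_requests(urls: list[str]) -> list[str]:
    """Keep only the requests that do not target a blocked ad host."""
    return [url for url in urls if not is_ad_request(url)]


# Made-up requests that a single page load might trigger.
page_requests = [
    "https://shop.example.org/index.html",
    "https://shop.example.org/static/app.js",
    "https://ads.example.com/banner.js",
    "https://cdn.tracker.example.net/pixel.gif",
]

kept = filter_requests(page_requests)
print(f"{len(page_requests)} requests before blocking, {len(kept)} after")
```

Every request that is skipped is one less resource to download before the page counts as loaded, which is where the time saving comes from.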

 

How to block ads

There are two ways to set up "Ad Blocking" in Octoparse.

1. Select the "Go To Web Page" step. "Ad Blocking" is located under "Advanced Options".

 

 

 

2. Alternatively, click "Settings" and enable the "Block ads" option.

 

 

 

Tips!

Blocking ads may change the structure of some web pages. If elements are no longer found after you enable it, adjust the XPath to re-locate them.

Learn more about locating elements with XPath.
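To see why an XPath can stop matching once ads are removed, consider this small Python sketch. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, and two made-up page fragments (`WITH_AD` and `WITHOUT_AD`); the `extract` helper is hypothetical. A positional path breaks when the ad element disappears, while a path based on an attribute keeps working.

```python
import xml.etree.ElementTree as ET

# Made-up page fragments: the same page with and without an ad banner.
WITH_AD = """
<body>
  <div class="ad-banner">Buy now!</div>
  <div class="product"><span class="price">19.99</span></div>
</body>
"""

WITHOUT_AD = """
<body>
  <div class="product"><span class="price">19.99</span></div>
</body>
"""

POSITIONAL = "./div[2]/span"              # relies on the ad occupying div[1]
BY_ATTRIBUTE = ".//span[@class='price']"  # independent of sibling order


def extract(markup: str, path: str):
    """Return the text of the first match for path, or None if nothing matches."""
    node = ET.fromstring(markup.strip()).find(path)
    return node.text if node is not None else None


for label, markup in (("with ad", WITH_AD), ("without ad", WITHOUT_AD)):
    print(label, "| positional:", extract(markup, POSITIONAL),
          "| by attribute:", extract(markup, BY_ATTRIBUTE))
```

Paths that identify an element by its own attributes rather than by its position among siblings tend to survive this kind of structural change.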

 

 

 

Clear Cache

Sometimes Octoparse needs to reload a page from a clean state, for example when cookies saved for extracting data behind a login have to be discarded. The "Clear Cache" option handles this case.

 

 

How to clear cache

1. Select the "Go To Web Page" step. "Clear Cache" is located under "Cache Settings".

 

2. After the page opens, you can have Octoparse remember the new cookie:

  • Click "Use specified Cookie"
  • Click "Load cookie from current web page"

 

Now Octoparse has "remembered" the new cookie. 

 

Tips!

1. Cookies come in different forms and stay valid for different periods. Some persist for a long time, while others expire as soon as the browser is closed. In Octoparse, a saved cookie stops working once it expires. When that happens, use "Clear Cache" and load the cookie again.

2. Cache Settings are especially important for websites that require a login. Learn more about extracting data behind a login.
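The difference between long-lived cookies and cookies that end with the browser session, described in the first tip, can be sketched in Python as follows. This is only a conceptual model, not Octoparse's storage format; `SavedCookie` and `is_usable` are hypothetical names, and a missing expiry date is assumed to mean a session cookie.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class SavedCookie:
    """Hypothetical record of a cookie saved for later runs."""
    name: str
    value: str
    expires: Optional[datetime]  # None means a session cookie (assumption)


def is_usable(cookie: SavedCookie, now: datetime, browser_restarted: bool = False) -> bool:
    """Return True if the saved cookie can still be sent with requests."""
    if cookie.expires is None:
        # Session cookies are dropped as soon as the browser is closed.
        return not browser_restarted
    return now < cookie.expires


now = datetime(2018, 8, 16, 12, 0, tzinfo=timezone.utc)
persistent = SavedCookie("auth", "abc123", now + timedelta(days=30))
session = SavedCookie("sid", "xyz789", None)
stale = SavedCookie("auth", "old", now - timedelta(minutes=1))

for cookie in (persistent, session, stale):
    print(cookie.name, cookie.value, is_usable(cookie, now, browser_restarted=True))
```

A cookie for which such a check fails is the case where clearing the cache and loading the cookie again becomes necessary.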

 

Related articles:

Locate elements with XPath 

Extract data behind a login 

Interact with webpage 

More techniques 

Case tutorial | scrape pricing from eBay 

