How to Avoid the Cookiewall When Scraping a Website in Octoparse

Thursday, April 21, 2016 6:16 AM

Cookies are small text files, each tagged with an ID, that your browser stores in its own directory or in program data subfolders on your computer. A cookie is created when you visit a website that uses cookies to keep track of your movements within the site, help you resume where you left off, and remember your login, theme selection, preferences, and other customizations. In other words, cookies let a website arrange content to match your interests more quickly. Most major websites use cookies. For more info please visit 


Sometimes a cookie message appears on the screen to inform users that a cookie will be created and kept in the browser's cookie storage when they access certain websites.
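To make the idea of a cookie concrete, here is a minimal Python sketch that parses the kind of header a website might send once a visitor accepts its cookie message. The cookie name `cookie_consent` and its attributes are assumptions made up for illustration; real sites choose their own names and values.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value; the name and attributes are
# illustrative assumptions, not taken from any real website.
header = "cookie_consent=accepted; Path=/; Max-Age=31536000"

cookie = SimpleCookie()
cookie.load(header)  # parse the header into named "morsels"

morsel = cookie["cookie_consent"]
print(morsel.key, "=", morsel.value)        # the name/value pair the browser stores
print("valid for path:", morsel["path"])    # where the cookie is sent back
print("lifetime (s):", morsel["max-age"])   # how long the browser keeps it
```

The browser sends this name/value pair back on later visits, which is how the site knows the message has already been accepted.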


If you want to scrape pages from a website and the cookiewall message always appears first when a page is loaded in Octoparse, you can configure a rule to remove the cookiewall. Here we take one such website as an example and solve the problem with the following steps.



  1. Log in to Octoparse and create a task.
  2. Set the basic info and click ‘Next’.
  3. Open the website. Save the URL and load it.
  4. After the web page is loaded, the URL changes because a cookie message window has appeared on the screen.
  5. Click ‘Cookies accepteren’ (Dutch for ‘Accept cookies’) and then choose ‘Click an item’.
  6. In the workflow designer, the Click Item action has been created. The web page is loaded again without the cookiewall, and the URL changes back to the normal one.
  7. Add a ‘Go to the Webpage’ action to the workflow designer and open any subpage of the website. Take the URL below for example:


  8. After the second web page is loaded, you will find that the cookiewall has been removed, and you can extract any data from the website.
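
Octoparse handles all of this through its visual workflow, so no code is required. For readers who want to see why accepting once is enough, the following Python sketch reproduces the same sequence against a toy local server. It is only an illustration under stated assumptions: the `CookieWallSite` handler, the `/accept` and `/cookiewall` paths, and the `consent` cookie are all hypothetical, and this is not how Octoparse is implemented internally.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieWallSite(BaseHTTPRequestHandler):
    """Toy site (hypothetical): pages redirect to /cookiewall until consent is stored."""

    def do_GET(self):
        has_consent = "consent=yes" in self.headers.get("Cookie", "")
        if self.path == "/accept":
            # Stands in for clicking the accept button: set the cookie, go home.
            self.send_response(302)
            self.send_header("Set-Cookie", "consent=yes; Path=/")
            self.send_header("Location", "/")
            self.end_headers()
        elif self.path == "/cookiewall":
            self._page("cookie wall")
        elif not has_consent:
            # No consent cookie yet: the URL changes to the cookiewall page.
            self.send_response(302)
            self.send_header("Location", "/cookiewall")
            self.end_headers()
        else:
            self._page("content of " + self.path)

    def _page(self, text):
        body = text.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(opener, url):
    """Return the final URL (after redirects) and the page text."""
    with opener.open(url) as resp:
        return resp.geturl(), resp.read().decode()


server = HTTPServer(("127.0.0.1", 0), CookieWallSite)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# One opener with one cookie jar plays the role of a single browser session.
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(CookieJar())
)

first_url, first_body = fetch(opener, base + "/")            # lands on the wall
fetch(opener, base + "/accept")                              # accept once
second_url, second_body = fetch(opener, base + "/subpage")   # wall no longer shown

print(first_url, "->", first_body)
print(second_url, "->", second_body)
server.shutdown()
```

The key design point is that the accept step and the later page loads share the same cookie storage, which mirrors placing the Click Item action before the ‘Go to the Webpage’ action in the same workflow.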