How to Avoid the Cookie Wall When Scraping a Website in Octoparse

Thursday, April 21, 2016 6:16 AM

Cookies are small text files, tagged with an ID, that your browser stores in its directory or in program data subfolders on your computer. A cookie is created when you visit a website that uses cookies to keep track of your movements within the site, help you resume where you left off, and remember your registered login, theme selection, preferences, and other customizations. In other words, cookies help a website arrange content to match your interests more quickly. Most major websites use cookies. Some websites display a cookie message to inform you that a cookie will be created and kept in your browser's cookie file when you access them.
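
To make the "small text file with an ID" idea concrete, here is a minimal sketch in Python that parses the kind of text a website sends when it asks the browser to store a cookie. The helper name parse_set_cookie and the sample value "theme=dark; Path=/; Max-Age=3600" are made up for illustration; they do not come from Octoparse or any particular website.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value):
    """Return {name: (value, attributes)} for one Set-Cookie header value.

    Hypothetical helper for illustration only.
    """
    jar = SimpleCookie()
    jar.load(header_value)
    return {
        # Keep only the attributes that were actually present in the header.
        name: (morsel.value, {key: val for key, val in morsel.items() if val})
        for name, morsel in jar.items()
    }


# Invented sample: a site remembering the visitor's theme for one hour.
example = parse_set_cookie("theme=dark; Path=/; Max-Age=3600")
print(example)
```

The name and value ("theme", "dark") are what the site reads back on later visits; the attributes only tell the browser where and for how long to keep the cookie.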


If you want to scrape pages from a website and a cookie message always appears first when a page is loaded in Octoparse, you can configure a rule to remove the cookie wall. Here we take http://www.marktplaats.nl as an example and solve the problem with the following steps.



1. Log in to Octoparse and create a task.

2. Set basic info and click ‘Next’.

3. Open the website http://www.marktplaats.nl. Save the URL and load it. 

4. After the web page is loaded, the URL changes to http://www.marktplaats.nl/cookiewall/ and a cookie message window appears on the screen.

5. Click the ‘Cookies acceptance’ and then choose ‘Click an item’. 

6. In the workflow designer, a Click Item action has been created. The web page is loaded again without the cookie wall, and the URL changes back to the normal one.

7. Add a ‘Go to the Webpage’ action to the workflow designer and open any subpage of the website. Take the URL below for example:


8. After the second web page is loaded, you will find that the cookie wall has been removed, and you can then extract any data on the website.
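
The reason the steps above work is that accepting the cookie message once stores a consent cookie, and every later page load in the same session sends it back. The sketch below illustrates that mechanism outside Octoparse, using only the Python standard library. It is not how Octoparse is implemented, and it does not contact marktplaats.nl: CookieWallSite is a toy local server invented for this example, and the /cookiewall/accept path, the cookie name "consent", and the /some/subpage URL are all assumptions.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieWallSite(BaseHTTPRequestHandler):
    """Toy site: redirects to /cookiewall/ until a consent cookie is sent."""

    def do_GET(self):
        has_consent = "consent=yes" in self.headers.get("Cookie", "")
        if self.path == "/cookiewall/accept":
            # Stand-in for clicking the acceptance button (step 5).
            self.send_response(302)
            self.send_header("Set-Cookie", "consent=yes; Path=/")
            self.send_header("Location", "/")
            self.end_headers()
        elif self.path.startswith("/cookiewall/"):
            self._page("cookie wall")
        elif not has_consent:
            # No consent yet: every page redirects to the cookie wall (step 4).
            self.send_response(302)
            self.send_header("Location", "/cookiewall/")
            self.end_headers()
        else:
            self._page("content of " + self.path)

    def _page(self, text):
        body = text.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(opener, url):
    """Return (final URL after redirects, page text)."""
    with opener.open(url) as resp:
        return resp.geturl(), resp.read().decode()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), CookieWallSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    # One cookie jar shared by all requests, like one browser session.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any system proxy settings
        urllib.request.HTTPCookieProcessor(jar),
    )
    try:
        before = fetch(opener, base + "/")           # lands on the cookie wall
        fetch(opener, base + "/cookiewall/accept")   # accept once
        after = fetch(opener, base + "/some/subpage")  # wall is gone
    finally:
        server.shutdown()
        server.server_close()
    return before, after


before, after = run_demo()
print("before accepting:", before)
print("after accepting: ", after)
```

The Click Item action in step 6 plays the role of the request to /cookiewall/accept here, and the ‘Go to the Webpage’ action in step 7 corresponds to the final fetch of a subpage with the cookie already in place.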
