How to Avoid the CookieWall When Scraping the Website in Octoparse
Thursday, April 21, 2016 6:16 AM
If you want to scrape web pages from a website and a cookie wall message always appears first when the page loads in Octoparse, you can configure a rule to remove the cookie wall. Here we take http://www.marktplaats.nl as an example and solve the problem with the following steps.
1. Log in to Octoparse and create a task.
2. Set basic info and click ‘Next’.
3. Open the website http://www.marktplaats.nl. Save the URL and load it.
4. After the web page is loaded, the URL changes to http://www.marktplaats.nl/cookiewall/ and a cookie message window appears on the screen.
5. Click ‘Cookies acceptance’ and then choose ‘Click an item’.
6. In the workflow designer, a Click Item action is created. The web page loads again without the cookie wall, and the URL changes back to the normal one.
7. Add a ‘Go to the Webpage’ action to the workflow designer and open any subpage of the website. Take the URL below for example:
8. After the second web page is loaded, you will find that the cookie wall has been removed, and you can then extract any data on the website.
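The logic behind the steps above can be sketched in a few lines of Python. This is only an illustration, not Octoparse's API: the helper names `is_cookiewall_url` and `plan_workflow` are hypothetical, the step labels are plain strings modeled on the action names used above, and the sketch assumes the cookie wall can be recognized by a `cookiewall` segment in the URL path, as in step 4.

```python
from typing import List
from urllib.parse import urlparse


def is_cookiewall_url(url: str) -> bool:
    """Return True if the URL path contains a 'cookiewall' segment.

    Assumption: the site redirects to a path such as /cookiewall/
    (as http://www.marktplaats.nl does in step 4).
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "cookiewall" in segments


def plan_workflow(loaded_url: str, target_url: str) -> List[str]:
    """Build the ordered list of workflow actions as plain labels.

    Hypothetical helper: it only mirrors the order of the steps above,
    it does not drive Octoparse.
    """
    steps = []
    # Accept the cookies first, but only if the cookie wall was shown.
    if is_cookiewall_url(loaded_url):
        steps.append("Click Item: Cookies acceptance")
    # Then open the page that should actually be scraped.
    steps.append("Go to the Webpage: " + target_url)
    steps.append("Extract Data")
    return steps


if __name__ == "__main__":
    for step in plan_workflow(
        "http://www.marktplaats.nl/cookiewall/",
        "http://www.marktplaats.nl",
    ):
        print(step)
```

The point of the sketch is the ordering: the acceptance click has to come before the ‘Go to the Webpage’ action so that the later page loads without the cookie wall.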
If this video tutorial is not available for you, you can click here to see the corresponding graphic tutorial.