Web Scraping - How to Store Cookies in Octoparse

Sunday, March 05, 2017 11:11 PM

Most websites use cookies to give you the best browsing experience, and sometimes you need to store cookies in Octoparse while browsing these websites before you extract data. Storing cookies is also necessary if you use our cloud service (Cloud Extraction) to scrape data from specific web pages.

 

This web scraping tutorial will teach you how to store cookies when scraping a site. The example website we use is Twitter. You can follow the steps below to make a scraping task (see "What is an OTD file") that scrapes information from Twitter.

 

Step 1. Set up basic information.

 

Click "Quick Start" ➜ Choose "New Task (Advanced Mode)" ➜ Complete the basic information ➜ Click "Next".

 

Step 2. Enter www.twitter.com in the built-in browser so that you can log into Twitter first ➜ Click the "Go" icon to open the web page.

 

 

Step 3. Enter login information

Click the "Log in" button ➜ Choose "Click an item". A "Click Item" action will be created in the workflow.

 

Step 3-1. Enter authorization information such as username and password.

 

Click the input field for "Phone, email or username" on the web page ➜ Choose "Enter text value" ➜ Enter your email, phone number or username in the "Enter text" textbox under "Customize Current Action" ➜ Click "Save". The text you entered then appears on the web page. Enter the password in the same way.

 

Note: If you want to uncheck the "Remember me" option, click the option and choose "Click an item" to uncheck it.

 

Step 4. Click the Login button.

 

Click the "Login" button ➜ Choose "Click an item" ➜ Click "Save".

 

Step 5. After you have logged into your Twitter account, go directly to the "Go To Web Page" action and check the option "Use specified Cookie" under "Cache Settings" so that the login cookie is stored and loaded with the page. You can then delete all the actions created except the "Go To Web Page" action, or just keep them unchanged. Don't forget to click the "Save" button to save the configuration.
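For readers who want to see the idea behind this step outside the Octoparse interface, here is a minimal Python sketch of storing a login cookie and loading it again later. It is not how Octoparse implements "Use specified Cookie"; it only uses the standard library's http.cookiejar module to illustrate the concept. The cookie name "session_token", its value, the domain ".example.com" and the helper names are all hypothetical placeholders.

```python
import http.cookiejar
import os
import tempfile
import time


def make_cookie(name, value, domain, lifetime_seconds=3600):
    """Build a persistent cookie. All values passed in are placeholders."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=True,
        expires=int(time.time()) + lifetime_seconds,
        discard=False,
        comment=None, comment_url=None,
        rest={},
    )


def save_cookies(jar, path):
    """Write every cookie in the jar to a Netscape-format cookies file."""
    jar.save(path, ignore_discard=True)


def load_cookies(path):
    """Read a cookies file back into a fresh jar."""
    jar = http.cookiejar.MozillaCookieJar()
    jar.load(path, ignore_discard=True)
    return jar


# Pretend a login has just produced this cookie (hypothetical name and value).
cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")
jar = http.cookiejar.MozillaCookieJar()
jar.set_cookie(make_cookie("session_token", "placeholder-value", ".example.com"))

# Store the cookie, then load it again as a later run would.
save_cookies(jar, cookie_file)
restored = load_cookies(cookie_file)
restored_names = [cookie.name for cookie in restored]
```

The point of the round trip is the same as in Step 5: once the cookie is saved, the login actions no longer have to be repeated for as long as the cookie remains valid.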

 

 

And we are done! Now you can begin to extract the information from the web page and optimize your scraping task to get the data smoothly. 
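To illustrate why a stored cookie lets the extraction proceed as a logged-in user, the sketch below shows a cookie being attached to an outgoing request. This is an assumption-laden illustration with Python's standard library, not Octoparse's internals: the cookie name, value and the example.com / example.org URLs are hypothetical, and no request is actually sent.

```python
import http.cookiejar
import time
import urllib.request

# A jar holding one hypothetical login cookie scoped to .example.com.
jar = http.cookiejar.CookieJar()
jar.set_cookie(http.cookiejar.Cookie(
    version=0, name="session_token", value="placeholder-value",
    port=None, port_specified=False,
    domain=".example.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=True,
    expires=int(time.time()) + 3600,
    discard=False,
    comment=None, comment_url=None,
    rest={},
))

# A request to a page on the cookie's domain gets a Cookie header added.
request = urllib.request.Request("https://www.example.com/profile")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")

# A request to an unrelated domain gets no Cookie header at all.
other_request = urllib.request.Request("https://www.example.org/")
jar.add_cookie_header(other_request)
other_header = other_request.get_header("Cookie")

print(cookie_header)
print(other_header)
```

The cookie is only sent to the site that issued it, which is why the cookie has to be stored from a login on the same site you intend to scrape.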

 

Author: The Octoparse Team 

 


 


 

 
