Web Scraping - Scraping Facebook That Requires Login with Octoparse

Thursday, January 12, 2017 10:29 PM

 

You often need to log in to a website with a username and password before you can scrape data from it or perform further actions. Websites such as Facebook, Twitter, and LinkedIn require users to log in to their accounts before they can visit the site or view more content.

 

In this web scraping tutorial, we show how to scrape a site that requires login with Octoparse. The example websites are Facebook, Twitter, and LinkedIn.

 

 

Before scraping a website that requires login with Octoparse, create a task for each website and set up its basic information.

Click "Quick Start" ➜ Choose "New Task (Advanced Mode)" ➜ Complete basic information ➜ Click "Next".

 

 

Facebook.  (Download the scraping task HERE)

 

Step 1. Enter the URL of the Facebook website in the built-in browser ➜ Click the "Go" icon to open the web page ➜ Click "Save".

(URL of the example: https://www.facebook.com/?stype=lo&jlou=AfdpcqxUre_1gbgZ5SOb0-KvZp9Ex5BwenJg2fO4Dz2MHw0jKnROZkAAbC_TaFcGAe6kiA2X2fcQuFmf5dSeBgviyGdb47hVYm0a_0SfogqCQw&smuh=54539&lh=Ac_oyCjRcPZfNeXe)

 

Step 2. Enter authorization information such as username and password.

Click the input field for "Email or Phone" on the web page ➜ Choose "Enter text value" ➜ Enter your email or phone number in the "Enter text" textbox under "Customize Current Action" ➜ Click "Save".

 

Click the input field for "Password" on the web page ➜ Choose "Enter text value" ➜ Enter your password in the "Enter text" textbox under "Customize Current Action" ➜ Click "Save". You will then see the password entered on the web page.

 

Step 3. Click the Login button.

Click the "Login" button ➜ Choose "Click an item" ➜ Click "Save".

 

Step 4. Enter the URL of the Facebook website you want to scrape data from.

Drag a "Go To Web Page" action into the workflow and enter the target URL in the "Page URL" textbox. Then click "Save".

 

Twitter.  (Download the scraping task HERE)

Step 1. Enter the URL of the Twitter website in the built-in browser ➜ Click the "Go" icon to open the web page ➜ Click "Save".

(URL of the example: https://twitter.com/)

 

Step 2. Click the "Log in" button ➜ Choose "Click an item". A "Click Item" action will be created in the workflow.

Because the web page uses AJAX when the "Log in" button is clicked, we need to set an AJAX timeout for the "Click Item" action.

Tick the "AJAX Load" checkbox under "Advanced Options" ➜ Set an AJAX timeout of 2 seconds ➜ Click "Save".
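To illustrate the idea behind a timeout like this, the Python sketch below waits for a condition for a bounded amount of time and then moves on either way. It is only a conceptual illustration, not how Octoparse implements "AJAX Load"; the names `wait_for`, `is_ready`, `poll_interval`, and the simulated 0.3-second delay are all assumptions made for this example.

```python
import time
from typing import Callable


def wait_for(is_ready: Callable[[], bool], timeout: float = 2.0,
             poll_interval: float = 0.1) -> bool:
    """Poll is_ready() until it returns True or `timeout` seconds have passed.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Simulate a login form that becomes visible 0.3 seconds after the click.
clicked_at = time.monotonic()


def form_visible() -> bool:
    return time.monotonic() - clicked_at >= 0.3


# The form shows up well within the 2-second budget.
appeared = wait_for(form_visible, timeout=2.0)

# A condition that never becomes true gives up once the timeout expires.
never = wait_for(lambda: False, timeout=0.2)

print("form appeared:", appeared)
print("gave up waiting:", not never)
```

A timeout that is too short risks moving on before the login form exists; one that is too long only slows the task down, so a value of a few seconds is a reasonable starting point to tune from.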

 

Step 3. Enter authorization information such as username and password.

Click the input field for "Phone, email or username" on the web page ➜ Choose "Enter text value" ➜ Enter your email, phone number, or username in the "Enter text" textbox under "Customize Current Action" ➜ Click "Save".

 

Click the input field for "Password" on the web page ➜ Choose "Enter text value" ➜ Enter your password in the "Enter text" textbox under "Customize Current Action" ➜ Click "Save". You will then see the password entered on the web page.

Note: If you want to uncheck the "Remember me" option, click the option and choose "Click an item" to uncheck it.

 

Step 4. Click the Login button.

Click the "Login" button ➜ Choose "Click an item" ➜ Click "Save".

 

Step 5. Enter the URL of the Twitter website you want to scrape data from.

Drag a "Go To Web Page" action into the workflow and enter the target URL in the "Page URL" textbox. Then click "Save".

 

LinkedIn.  (Download the scraping task HERE)

Step 1. Enter the URL of the LinkedIn website in the built-in browser ➜ Click the "Go" icon to open the web page ➜ Click "Save".

(URL of the example: https://www.linkedin.com/uas/login?goback=&trk=hb_signin)

 

Step 2. Enter authorization information such as username and password.

Click the input field for "Email address" on the web page ➜ Choose "Enter text value" ➜ Enter your email address in the "Enter text" textbox under "Customize Current Action" ➜ Click "Save".

 

Click the input field for "Password" on the web page ➜ Choose "Enter text value" ➜ Enter your password in the "Enter text" textbox under "Customize Current Action" ➜ Click "Save". You will then see the password entered on the web page.

 

Step 3. Click the Login button.

Click the "Login" button ➜ Choose "Click an item" ➜ Click "Save".

 

Step 4. Enter the URL of the LinkedIn website you want to scrape data from.

Drag a "Go To Web Page" action into the workflow and enter the target URL in the "Page URL" textbox. Then click "Save".
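All three tasks follow the same order: open the login page, optionally click a button that reveals the form (as on Twitter), enter the username, enter the password, click "Login", and only then go to the page to scrape. The Python sketch below writes that order down as data. It is not Octoparse's task file format; `Action`, `build_login_workflow`, the placeholder credentials, and the target URLs are hypothetical and exist only to make the sequence explicit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    kind: str        # "go_to_page", "enter_text", or "click"
    target: str      # URL for go_to_page, field or button label otherwise
    value: str = ""  # text to type (used by enter_text only)


def build_login_workflow(login_url: str, user_field: str, username: str,
                         password: str, target_url: str,
                         pre_login_click: Optional[str] = None) -> List[Action]:
    """Return the login steps in the order the tutorial builds them."""
    steps = [Action("go_to_page", login_url)]
    if pre_login_click:
        # Twitter needs an extra click to reveal the login form.
        steps.append(Action("click", pre_login_click))
    steps += [
        Action("enter_text", user_field, username),
        Action("enter_text", "Password", password),
        Action("click", "Login"),
        Action("go_to_page", target_url),  # the page to scrape, after login
    ]
    return steps


# Placeholder credentials and hypothetical target URLs, for illustration only.
twitter = build_login_workflow(
    login_url="https://twitter.com/",
    user_field="Phone, email or username",
    username="user@example.com",
    password="not-a-real-password",
    target_url="https://twitter.com/example_account",
    pre_login_click="Log in",
)
linkedin = build_login_workflow(
    login_url="https://www.linkedin.com/uas/login?goback=&trk=hb_signin",
    user_field="Email address",
    username="user@example.com",
    password="not-a-real-password",
    target_url="https://www.linkedin.com/in/example-profile",
)

for number, step in enumerate(twitter, start=1):
    print(number, step.kind, step.target)
```

Keeping the final "go_to_page" step last reflects the point of the tutorial: the target page is only requested after the login click has been performed.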

 

 

 

Author: The Octoparse Team

 

 

 
