
How to Extract Data from Facebook

Thursday, March 31, 2016 10:23 PM


 

 

Facebook hosts a huge amount of user-generated content, and there is a lot you can do with that data.
You can use it to better understand your audience for business or political purposes, or collect posts from users and groups, along with their comments, to carry out sentiment analysis.

With Octoparse templates, you can get post information from Facebook without configuring a scraping task yourself.
Just enter the keywords or URLs and wait for the data to be scraped. For further details, see: Task Templates

 

You may want to use this URL as an example:
https://www.facebook.com/cnn/

Here are the 5 main steps in this tutorial [Download task file here]

1. Go to Web Page - to open the target website

2. Log into Facebook

3. Auto-detect web page - to create a workflow

4. Modify the XPath of the "Loop Item"

5. Run your task - to get the data you want

 

1. Go to Web Page - to open the target website

 

  • Enter the URL on the home page and click "Start"

 

Octoparse automatically loads the page in its built-in browser, where a login page appears.

 

2. Log into Facebook

 

  • Toggle on Octoparse's Browse mode
  • Fill out the log-in page with your user name and password and click "Log In"
  • Toggle off the Browse mode

 

Tip: If you need to log in to see more information, or find that the login steps must be part of the workflow for the task to run successfully, follow this tutorial on how to log in to a website in Octoparse:

Scrape data behind a login

 

3. Auto-detect web page - to create a workflow

 

  • Click "Edit" under "Add a page scroll"
  • Set it to scroll to the bottom of the page, repeating 20 times with a wait time of 5 seconds
  • Rename or delete fields in the Data preview if needed
  • Click on "Create workflow"
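
To make the scroll settings concrete, here is a minimal Python sketch of what "scroll to the bottom, repeat 20 times, wait 5s" amounts to. It is not Octoparse code: scroll_page and its scroll_to_bottom callback are hypothetical names used only for illustration, and the actual scrolling is handled by Octoparse's built-in browser.

```python
import time
from typing import Callable


def scroll_page(
    scroll_to_bottom: Callable[[], None],
    repeats: int = 20,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Scroll to the bottom `repeats` times, pausing after each scroll.

    Returns the total time spent waiting, in seconds.
    """
    total_wait = 0.0
    for _ in range(repeats):
        scroll_to_bottom()      # hypothetical callback that scrolls the page
        sleep(wait_seconds)     # give newly loaded posts time to render
        total_wait += wait_seconds
    return total_wait
```

With the settings from this step, the waits alone add up to 20 × 5 = 100 seconds, so expect the page-loading phase to take at least that long. A higher repeat count should load more posts at the cost of a longer run.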

 

4. Modify the XPath of the "Loop Item"

 

  • Click on the "Loop Item" action
  • Make sure the loop mode is set to "Variable List"
  • Enter the XPath //div[@role="article"][not(@aria-label="Comment")]/../..
  • Click "Apply" to save the settings
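
To see what this XPath selects, the sketch below walks through it in Python on a tiny, made-up page. The markup, the id values, and the select_loop_items helper are assumptions for illustration only; real Facebook markup is far more complex and changes often. Python's standard xml.etree.ElementTree does not support the XPath not() function, so the sketch emulates each part of the expression by hand rather than evaluating the XPath string itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup (not real Facebook HTML).
SAMPLE = """
<html><body>
  <div id="feed">
    <div id="wrapper-1"><div id="inner-1">
      <div role="article" aria-label="Post">
        <span>First post</span>
        <div role="article" aria-label="Comment"><span>A comment</span></div>
      </div>
    </div></div>
    <div id="wrapper-2"><div id="inner-2">
      <div role="article"><span>Second post</span></div>
    </div></div>
  </div>
</body></html>
"""


def select_loop_items(root):
    """Emulate //div[@role="article"][not(@aria-label="Comment")]/../.."""
    # ElementTree elements do not know their parent, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    items = []
    for div in root.iter("div"):                  # //div
        if div.get("role") != "article":          # [@role="article"]
            continue
        if div.get("aria-label") == "Comment":    # [not(@aria-label="Comment")]
            continue
        # /../.. : step up two levels to the container around the post
        grandparent = parent_of.get(parent_of.get(div))
        if grandparent is not None and grandparent not in items:
            items.append(grandparent)
    return items


root = ET.fromstring(SAMPLE.strip())
loop_items = select_loop_items(root)
print([item.get("id") for item in loop_items])
```

On this sample, the two post containers (wrapper-1 and wrapper-2) are selected, while the nested element labelled "Comment" is skipped. Each post becomes one loop item, and because the expression steps up two levels, the loop item is the block surrounding the post rather than the article element alone.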

 

Tip: XPath plays an important role in locating the correct elements in Octoparse. You can check the tutorial below to learn more about it:

What is XPath and how to use it in Octoparse

 

5. Run your task - to get the data you want

 

  • Click "Save" to save the task first
  • Then, click "Run" on the upper left side
  • Select "Run task on your device" to run the task on your computer, or select "Run task in the Cloud" to run it in the Cloud (premium users only)

 

 

Here is the sample output.

[Image: facebook_data_output]

 

Happy Data Hunting!

Author: The Octoparse Team


 
