
Facebook Data Mining

Friday, August 02, 2019

Mining data from Facebook has become popular and useful over the past few years. Crawled or scraped data is valuable for prediction and analysis in commercial, scientific, and many other fields, especially once it has been processed further through steps such as data cleansing and machine learning. Data mining, which serves as the base tier of the whole data process, is therefore of paramount importance.

Because data enthusiasts have shown such intense interest in its data, Facebook provides a developer website that gives developers access to it. The site offers simple, easy-to-grasp methods with detailed guidelines that help users learn about and access its resources.

The Facebook API, known as the Graph API, is a REST (Representational State Transfer) interface, an architectural style for networked applications. This means that clients call Facebook functions remotely by sending HTTP requests with methods such as GET and POST, and the REST service sends back a response.

Take the Facebook page of Coca-Cola Corp. as an example. To retrieve the posts on its wall, users simply need to enter:

https://graph.facebook.com/cocacola/feed

The system then returns the results in JSON. JSON (JavaScript Object Notation) is a data exchange format that is easy for people to handle and easy for machines to parse and generate. The data fields include the message ID, the details of the message, the author, the author ID, and other information. The same URL structure works for all other Facebook objects as well, not only the wall.
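
The request and the JSON handling described above can be sketched in Python roughly as follows. This is only an illustration: the helper names `build_graph_url` and `summarize_feed` are hypothetical, and the sample payload is invented to mirror the fields listed above (message ID, message, author, author ID), not copied from a real response.

```python
import json
from urllib.parse import quote

GRAPH_ROOT = "https://graph.facebook.com"


def build_graph_url(object_id, connection=None):
    """Build a Graph API URL for an object and, optionally, one of its connections."""
    parts = [GRAPH_ROOT, quote(object_id, safe="")]
    if connection:
        parts.append(quote(connection, safe=""))
    return "/".join(parts)


# Hypothetical response body, invented for illustration only.
SAMPLE_FEED = """
{"data": [
  {"id": "111_222", "message": "Hello from the wall",
   "from": {"name": "Example Author", "id": "111"}},
  {"id": "111_333",
   "from": {"name": "Another Author", "id": "444"}}
]}
"""


def summarize_feed(raw_json):
    """Return (post id, author name, author id, message) tuples from a feed payload."""
    payload = json.loads(raw_json)
    rows = []
    for post in payload.get("data", []):
        author = post.get("from", {})
        rows.append((
            post.get("id"),
            author.get("name"),
            author.get("id"),
            post.get("message", ""),  # some posts carry no message text
        ))
    return rows


if __name__ == "__main__":
    print(build_graph_url("cocacola", "feed"))
    for row in summarize_feed(SAMPLE_FEED):
        print(row)
```

Using `.get()` with defaults keeps the loop from failing on posts that lack a field, which is a reasonable precaution when the exact shape of a response is not guaranteed.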

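When the API cannot resolve part of the requested path, for example when the placeholder CONNECTION_TYPE is sent literally, it returns an error object instead of data:

    {
        "error": {
            "message": "Unknown path components: /CONNECTION_TYPE",
            "type": "OAuthException",
            "code": 2500,
            "fbtrace_id": "AU3Q0qQUX1/"
        }
    }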

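An error payload of this shape can be told apart from a normal result by checking for the top-level "error" key. The sketch below assumes the error fields shown above ("type", "code", "message"); the function name `extract_error` is hypothetical.

```python
import json


def extract_error(raw_json):
    """Return (type, code, message) if the payload is an error object, else None."""
    payload = json.loads(raw_json)
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    return (error.get("type"), error.get("code"), error.get("message"))


# Error body with the values quoted in the article.
ERROR_BODY = """
{"error": {"message": "Unknown path components: /CONNECTION_TYPE",
           "type": "OAuthException",
           "code": 2500}}
"""

if __name__ == "__main__":
    print(extract_error(ERROR_BODY))
    print(extract_error('{"data": []}'))
```

Checking for the error object before reading any data fields avoids treating a failed request as an empty result.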

Note that data can be accessed this way only when the objects are public; if the objects are private, an access token must be provided.
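
One common way to supply the token is as an `access_token` query parameter on the request URL. The sketch below shows that with the standard library only; the helper name `with_access_token` and the placeholder token value are hypothetical, and how a token is obtained is outside the scope of this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_access_token(url, token):
    """Return the URL with an access_token query parameter added or replaced."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Keep existing parameters, but drop any stale access_token.
    params = [
        (key, value)
        for key, value in parse_qsl(query, keep_blank_values=True)
        if key != "access_token"
    ]
    params.append(("access_token", token))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


if __name__ == "__main__":
    # "YOUR_TOKEN" is a placeholder, not a real credential.
    print(with_access_token("https://graph.facebook.com/cocacola/feed", "YOUR_TOKEN"))
```

Building the query string with `urlencode` rather than string concatenation ensures that special characters in the token are escaped correctly.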

R users will be happy to hear that there is an R package, known as the Rfacebook package, that provides an interface to the Facebook API. Its functions allow R to access Facebook's API to get information about posts, comments, likes, groups that mention specific keywords, and much more. Specific commands can then be used to search pages.

Apart from R, many people are used to Python, so here are some tips for them as well. First of all, check out the documentation on Facebook's Graph API: https://developers.facebook.com/docs/reference/api/. If you are not familiar with JSON, do read a tutorial on it (for instance http://secretgeek.net/json_3mins.asp). Once you grasp the concepts, start using the API. For Python, there are several alternatives:

  • facebook/python SDK
  • pyFaceGraph 
  • It is also semi-trivial to write a simple HTTP client that uses the Graph API

Users are advised to check out the Python libraries, try the examples from their documentation, and check whether they already do what is needed. Compared with R, Python can simplify data processing by saving time on managing code, output, and note files, while R is better suited to graph visualization, for example visualizing a user's friends on Facebook.

There are also data extraction tools, such as Octoparse and Visual Scraper, that let people without any programming skills scrape or crawl data from Facebook.



Octoparse:

A web scraping tool is another great option for extracting data from Facebook. Note that you can only extract public posts that do not require a login. This is due to our web scraping ethics (see https://www.octoparse.com/blog/is-web-crawling-legal-well-it-depends). Octoparse is a powerful web scraping tool that can scrape both static and dynamic websites that use AJAX, JavaScript, cookies, and so on. First, download the client, then start your scraping tasks. The software requires no programming skills, but you should learn the rules it sets to help users extract data. In addition, it provides a cloud service and proxy server settings to prevent IP blocking and speed up the extraction process.

Recently, Octoparse launched a new feature: web scraping templates. You can use its Facebook scraping templates to extract posts with ease.



If you want to know more, please visit http://www.octoparse.com/


Visual Scraper:

Visual Scraper is another great free web scraper with a simple point-and-click interface that can be used to collect data from the web. You can get real-time data from several web pages and export the extracted data as CSV, XML, JSON, or SQL files.

The freeware, which is available for Windows, lets a single user scrape data from up to 50,000 web pages. Besides the SaaS, VisualScraper offers web scraping services such as data delivery and the creation of software extractors.




If you want to know more, please visit http://www.visualscraper.com/pricing


Author: The Octoparse Team

Edit: Ashley Weldon

