Scrape Emails from Facebook Pages
Wednesday, September 28, 2016 9:05 AM
Facebook is one of the largest data "treasure troves" on the internet. However, a single page can show so much information that it is hard to find the exact detail you want. Say we want to scrape email addresses from Facebook pages. How can we do that effectively with Octoparse? In this tutorial, we will show you how to quickly extract email addresses using the built-in RegEx tool.
Step 1. Create a task with a sample URL
Every workflow in Octoparse starts with the web page you want to begin from.
- Enter the target webpage URL into the search bar on the home screen, e.g. https://www.facebook.com/Octoparse/about/?entry_point=page_nav_about_item&tab=page_info
- Click Start to create a task
Step 2. Extract email addresses
Once the target webpage loads, you will see the email address on the page. You can extract the text directly by clicking on it and selecting Extract text of the selected link. But if the page holds too much information for you to find the detail you want quickly, you can use the RegEx tool instead. Follow the steps below.
- Click the Go to Web Page action in the workflow
- Hover over the Data Preview section and click the icon to add a custom field
- Select Page-level data and then HTML source code
- Click on the three dots of the source code data field and select Clean data
- Click +Add step and select Match with Regular Expression
Tip! If you know how to write regular expressions, you can write one to match the email address directly. Check out this article to learn more.
- If you are not sure how to write a regular expression, you can try the built-in RegEx tool
- The email address we need starts with mailto: and ends with " role
- Click Generate > Match > Apply to save the settings
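For readers who want to see what the start and end markers are doing, here is a minimal Python sketch of the same idea outside Octoparse. It is not the expression the RegEx tool generates; the sample HTML, the address contact@example.com, and the function name extract_between_markers are all made up for illustration, and real Facebook markup may differ.

```python
import re
from typing import List

# Hypothetical fragment of page source; real Facebook markup may differ.
sample_html = '<a href="mailto:contact@example.com" role="link">Email us</a>'

# Lazily capture everything between the start marker 'mailto:'
# and the end marker '" role'.
MARKER_RE = re.compile(r'mailto:(.*?)" role')

def extract_between_markers(html: str) -> List[str]:
    """Return every substring found between 'mailto:' and '" role'."""
    return MARKER_RE.findall(html)

print(extract_between_markers(sample_html))
```

The lazy quantifier `.*?` stops at the first `" role` after each `mailto:`, so several addresses in the same source are captured separately instead of as one long match.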
To locate the email address in the source code yourself, copy the source code, paste it into a text editor, and search for “@”.
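If you have saved the source code to a file or string, the search for “@” can also be done with a direct regular expression, as the tip above suggests. The sketch below uses a deliberately simplified email pattern (it does not cover every valid address form), and the sample text and the name find_emails are assumptions for illustration only.

```python
import re
from typing import List

# Simplified email pattern: local part, '@', domain, then a dot and
# a top-level domain of two or more letters. Not a full validator.
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

def find_emails(source: str) -> List[str]:
    """Return unique email-like strings in order of first appearance."""
    seen: List[str] = []
    for match in EMAIL_RE.findall(source):
        if match not in seen:
            seen.append(match)
    return seen

# Hypothetical pasted source code.
sample = ('<a href="mailto:contact@example.com" role="link">'
          'contact@example.com</a> or sales@example.org.')
print(find_emails(sample))
```

Because the same address often appears both in the link target and in the visible text, the function drops duplicates while keeping the order in which addresses first occur.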
Step 3. Run the task to get the data
Run the task either on your local machine or in the cloud.
Happy Data Hunting!
Author: The Octoparse Team