Scrape Emails from Facebook Pages

Wednesday, September 28, 2016 9:05 AM

(Download the extraction task for this tutorial on scraping emails from Facebook Pages HERE in case you need it.)

Scraping data such as emails from Facebook can be confusing: a single page shows so much information that it is hard to spot the exact item you want. This tutorial uses a Facebook page as an example to show how to scrape email addresses effectively with regular expressions.


Step 1. Set up basic information.

Choose “Advanced Mode” ➜ Click “Start” ➜ Complete the basic information ➜ Click “Next”.

Step 2. Go to the website.

Enter the target URL in the built-in browser ➜ Click the “Go” icon to open the webpage. The following interface will then appear.

(URL of the example: https://www.facebook.com/Octoparse/about/?entry_point=page_nav_about_item&tab=page_info)

Step 3. Extract email addresses.

You can find the email address on the page. There are two ways to extract it. The first is to extract the text directly.


But sometimes a page holds so much information that you cannot quickly find the detail you want. In that case, you can use our RegEx Tool.

Choose the website ➜ Click the highlighted link ➜ Click “Extract Inner HTML, including the page source code, text with format and images”.

Then customize your data: Click “Customize Field” ➜ Click “Re-format extracted data” ➜ Click “Add step” ➜ Click “Match with Regular Expression”.


If you know how to write a regular expression, you can write one that matches the email address directly (click Here to learn more about regular expressions).

Paste the regular expression into the text box ➜ Click “Match all” ➜ Click “Calculate”.

The email address will be shown in the “Output” browser ➜ Click “OK”.
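If you want to check a regular expression before pasting it into the text box, you can try it outside Octoparse first. The following is only a minimal Python sketch, not the expression used in the downloadable task: the sample HTML, the address `support@example.com`, and the name `find_emails` are all made up for illustration, and the pattern is a common simplified one that does not cover every valid email address.

```python
import re

# Assumed sample of extracted inner HTML; the address is made up.
inner_html = (
    '<div class="_50f4">Email</div></div>'
    '<div class="_2cpb"><div style="" class="_50f4">'
    "support@example.com</div></div></div>"
)

# A simplified email pattern (assumption: good enough for typical
# addresses, not a full implementation of the email address standard).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(html):
    """Return every substring of html that looks like an email address."""
    return EMAIL_PATTERN.findall(html)


emails = find_emails(inner_html)
print(emails)
```

Returning every match rather than only the first mirrors the “Match all” option, which is useful when a page lists more than one address.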



If you don’t know how to write a regular expression, you can use the “Try RegEx Tool” option. Please follow the steps shown in the GIF below.

Click “Try RegEx Tool”.

In the “Source Text”, you can see that the email address begins with “Email</div></div><div class="_2cpb"><div style="" class="_50f4">” and ends with “</div></div></div>”.

Click “Start With” and paste “Email</div></div><div class="_2cpb"><div style="" class="_50f4">” ➜ Click “End With” and paste “</div></div></div>” ➜ Click “Generate” ➜ Click “Match All” ➜ Click “Match” ➜ Click “Apply”.

The email address will be shown in the “Output” browser ➜ Click “OK”.
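The “Start With” / “End With” idea can also be written as a regular expression by hand. The sketch below is one way to do it in Python and is not necessarily the expression that the RegEx Tool generates: the helper name `between` and the sample address are hypothetical, while the two delimiter strings are the ones quoted above.

```python
import re

# Delimiters taken from the "Source Text" described in this tutorial.
START = 'Email</div></div><div class="_2cpb"><div style="" class="_50f4">'
END = "</div></div></div>"


def between(source, start, end):
    """Return every piece of text found between start and end in source."""
    # re.escape makes the HTML delimiters match literally;
    # (.*?) captures as little text as possible between them.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, source, flags=re.DOTALL)


# Made-up source text for illustration only.
source_text = "<div><div>" + START + "support@example.com" + END
print(between(source_text, START, END))
```

The non-greedy `(.*?)` matters here: because the end marker “</div></div></div>” is likely to appear many times in a page, a greedy match could run on to a later occurrence and return far more than the email address.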

(Tip: You can copy the source text into Word or a text editor such as Notepad, and then search for “@”. This way you can quickly find the text that comes just before and just after the email address.)
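If the source text is long, the same search for “@” can be done with a few lines of Python. This is only an illustrative sketch: the function name `context_around_at` and the sample string are assumptions, and the snippet simply prints the characters surrounding each “@” so that you can read off the start and end markers.

```python
def context_around_at(source, width=40):
    """Return a snippet of up to `width` characters on each side of every '@'."""
    snippets = []
    pos = source.find("@")
    while pos != -1:
        # Clamp the left edge at 0 so the slice never wraps around.
        snippets.append(source[max(0, pos - width): pos + width + 1])
        pos = source.find("@", pos + 1)
    return snippets


# Made-up source text for illustration only.
sample = '<div class="_50f4">support@example.com</div></div></div>'
for snippet in context_around_at(sample, width=25):
    print(snippet)
```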


Step 4. Extract the customized results.

Click “Done” ➜ Click the “Field Name” to rename the field ➜ Click “Next” ➜ Click “Next” ➜ Click “Local Extraction” ➜ Click “OK” to run the task on your computer. Octoparse will automatically extract all the selected data.


Step 5. Export data.

The extracted data will be shown in the “Data Extracted” pane. Click the “Export” button to export the results to an Excel file, a database, or another format, and save the file to your computer.


Author: The Octoparse Team







