Extract Phone Numbers

Thursday, September 29, 2016 7:42 AM

(Download my extraction task for this tutorial HERE in case you need it.)

 

 

In this tutorial, I will use who.org as an example to show you how to extract phone numbers from a web page.

Step 1. Choose “Advanced Mode”. ➜ Complete basic information. ➜ Click “Next”.

 

 

Step 2. Enter the target URL in the built-in browser. ➜ Click the “Go” icon to open the webpage.

(URL of the example: http://www.indeed.com/jobs?q=customer+service&l=Houston%2C+TX )

 

Step 3. Because the contact information is in plain text format rather than list format, the RegEx Tool is the most effective way to extract the phone number.

 

Choose the section you want to extract. ➜ Click the highlighted link ➜ Click “Extract Inner HTML, including the page source code, text with format and images”.

Then customize the extracted data. ➜ Click “Customize Field” ➜ Click “Re-format extracted data” ➜ Click “Add step” ➜ Click “Match with Regular Expression”.

 

Step 4. If you don’t know how to write a regular expression, you can use the “Try RegEx Tool” option to generate one.

 

Click “Try RegEx Tool”. In the “Source Text”, you can see that the phone number begins with “<br/>Tel.:” and ends with “<br>”. ➜ Click “Start With” and paste “<br/>Tel.:” ➜ Click “End With” and paste “<br>” ➜ Click “Generate” ➜ Click “Match All” ➜ Click “Match” ➜ Click “Apply”.

The phone number will be shown in the “Output” browser. ➜ Click "OK" ➜ Click “Done”.
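
For readers who want to see what a "Start With" / "End With" match amounts to, here is a minimal Python sketch. It is not the expression the RegEx Tool generates. The helper name `between` and the sample `source_text` are assumptions made up for illustration; only the two markers come from the tutorial.

```python
import re

def between(start: str, end: str) -> re.Pattern:
    """Build a pattern that captures the text between two literal markers."""
    # re.escape keeps characters such as "." and "/" literal.
    # (.*?) is non-greedy, so the match stops at the first end marker.
    return re.compile(re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL)

# Hypothetical inner HTML, invented for this example.
source_text = "Main office<br/>Tel.: +00 11 222 3333<br>Fax: +00 11 222 4444<br>"

phone_pattern = between("<br/>Tel.:", "<br>")

# findall returns every captured group, similar in spirit to "Match All".
phones = [m.strip() for m in phone_pattern.findall(source_text)]
print(phones)
```

The non-greedy group matters here: a greedy `(.*)` would run on to the last “<br>” in the text and swallow the fax number as well.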

 

Step 5. You will find that the fax number is not extracted, because it does not match the regular expression used for the phone number. Use a second regular expression to extract the fax number.

Drag an "Extract Data" action into the Workflow Designer. ➜ Choose the section you want. ➜ Click the highlighted link ➜ Click “Extract Inner HTML, including the page source code, text with format and images”. The remaining steps are the same as Step 3 and Step 4 (see above).
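
The reason a second expression is needed can be sketched in Python as follows. The sample text and the “<br>Fax:” start marker are assumptions for illustration; check the real marker in the “Source Text” of your own page.

```python
import re

# Hypothetical inner HTML, invented for this example.
source_text = "<br/>Tel.: +00 11 222 3333<br>Fax: +00 11 222 4444<br>"

# Pattern built from the markers used in Step 4.
phone_re = re.compile(r"<br/>Tel\.:(.*?)<br>", re.DOTALL)
# Assumed marker for the fax field; it is not given in the tutorial.
fax_re = re.compile(r"<br>Fax:(.*?)<br>", re.DOTALL)

def first_match(pattern, text):
    """Return the first captured value, stripped, or None if nothing matches."""
    found = pattern.search(text)
    return found.group(1).strip() if found else None

record = {
    "phone": first_match(phone_re, source_text),
    "fax": first_match(fax_re, source_text),
}
print(record)

# The phone pattern alone never captures the fax number.
assert "4444" not in "".join(phone_re.findall(source_text))
```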

 

Step 6. Click the “Field Name” to modify the name ➜ Click “Next” ➜ Click “Next” ➜ Click “Local Extraction” ➜ Click “OK” to run the task on your computer. Octoparse will automatically extract all the selected data.

 

The extracted data will be shown in the “Data Extracted” pane. Click the “Export” button to export the results to an Excel file, a database, or another format, and save the file to your computer.
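
If you post-process extracted values in a script instead, a rough Python equivalent of the export step is writing the rows to CSV, which Excel can open. The field names `phone` and `fax` and the sample row are hypothetical, and this does not reproduce Octoparse's own export format.

```python
import csv
import io

# Hypothetical extracted rows; the field names are assumptions.
rows = [
    {"phone": "+00 11 222 3333", "fax": "+00 11 222 4444"},
]

# Write to an in-memory buffer; swap in open("contacts.csv", "w", newline="")
# to save the same content to a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["phone", "fax"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```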

Author: The Octoparse Team

 

 

 
