RegEx: How to Extract All Phone Numbers from Strings

5/6/2016 4:56:15 AM

If you want to extract phone numbers with Regular Expressions but don’t know how to write one, this article may help.

A single large string can contain multiple phone numbers, and those numbers can come in a variety of formats. Here is an example of file format:

  • (021)1234567
  • (123) 456 7899
  • (123).456.7899
  • (123)-456-7899
  • 123-456-7899
  • 123 456 7899
  • 1234567899
  • 0511-4405222
  • 021-87888822
  • +8613012345678
  • ...

What is the easiest way to extract phone numbers like these? Regular Expressions are hard to learn if you don’t have any programming knowledge. In this article, I’ll introduce a Regular Expression tool that generates Regular Expressions for you and matches all the phone numbers quickly.

Regular Expression to Match Phone Numbers from Strings

First, try your best to find the common characters that each phone number starts with and ends with. For example, for the target text above, I looked at its source code, shown below.

 

<p>Here is an example of file format </p>
<ul>
  <li>(021)1234567 </li>
  <li>(123) 456 7899 </li>
  <li>(123).456.7899 </li>
  <li>(123)-456-7899 </li>
  <li>123-456-7899 </li>
  <li>123 456 7899 </li>
  <li>1234567899 </li>
  <li>0511-4405222 </li>
  <li>021-87888822 </li>
  <li>+8613012345678 </li>
  <li>... </li>
</ul>

 

We can see that each phone number starts with <li> and ends with </li>, so we can use the RegEx Tool in Octoparse to quickly extract all the phone numbers.

       1. Run Octoparse and open the RegEx Tool.

       2. Copy and paste the source code into the “Source Text” box.

           Then select the “Start With” option and enter “<li>”.

       3. Next, select the “End With” option and enter “</li>”.

           Don’t forget to select the “Match All” option.

       4. Select the “Generate” and “Match” options one by one.

It’s done. All the matched phone numbers are listed in the green box.
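
For readers who want to try the same "Start With / End With / Match All" idea outside Octoparse, here is a minimal Python sketch. It is only an illustration and not necessarily the expression the RegEx Tool generates: the function name `extract_between` and the lazy pattern built from the two delimiters are assumptions made for this example.

```python
import re

def extract_between(source, start, end):
    """Return every piece of text found between `start` and `end`."""
    # re.escape treats the delimiters as literal text; (.*?) is a lazy
    # group, so each match stops at the nearest closing delimiter.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # findall plays the role of "Match All"; strip() removes the
    # space that sits before </li> in the source code.
    return [m.strip() for m in re.findall(pattern, source, flags=re.DOTALL)]

html = """
<ul>
  <li>(021)1234567 </li>
  <li>123-456-7899 </li>
  <li>+8613012345678 </li>
</ul>
"""

print(extract_between(html, "<li>", "</li>"))
# ['(021)1234567', '123-456-7899', '+8613012345678']
```

Like the tool, this approach returns whatever sits between the two delimiters, so it works only when every <li> item is a phone number.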

 

Note that if you can’t find common characters that every phone number starts with and ends with, you cannot extract all the phone numbers at once. In that case, you need a separate Regular Expression for each phone number format.

Here are two additional Regular Expressions, each written for one phone number format.

 

Regular Expression:

\d{3}-\d{8}|\d{4}-\d{7}

Match: 0511-4405222 | 021-87888822
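
As a quick check, the expression above can be tried in Python's standard `re` module. This is a sketch under assumptions: the sample sentence and the name `AREA_CODE_DASH` are made up for illustration.

```python
import re

# Pattern from the article: 3 digits, a dash, 8 digits,
# or 4 digits, a dash, 7 digits.
AREA_CODE_DASH = re.compile(r"\d{3}-\d{8}|\d{4}-\d{7}")

text = "Call 0511-4405222 or 021-87888822; 123-456-7899 is a different format."
matches = AREA_CODE_DASH.findall(text)
print(matches)
# ['0511-4405222', '021-87888822']
```

The pattern is not anchored, so it can also match inside a longer run of digits; whether that matters depends on your data.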

 

Regular Expression:

\(\d{2,4}\)\d{6,7}

Match: (021)1234567 | (0411)123456 | (000)000000 | (123)1234567
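
The same kind of check works for the parenthesized format. In this Python sketch, the name `PAREN_AREA_CODE` and the last sample (a format this expression is not meant to cover) are assumptions added for illustration.

```python
import re

# Pattern from the article: 2 to 4 digits in parentheses,
# followed directly by 6 or 7 digits.
PAREN_AREA_CODE = re.compile(r"\(\d{2,4}\)\d{6,7}")

samples = ["(021)1234567", "(0411)123456", "(000)000000",
           "(123)1234567", "(123) 456 7899"]

# fullmatch requires the whole string to fit the pattern.
results = {s: bool(PAREN_AREA_CODE.fullmatch(s)) for s in samples}
for number, ok in results.items():
    print(number, ok)
```

The first four samples match, while "(123) 456 7899" does not because of the spaces, which is why each format needs its own expression.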

 

 

 

 

 

 

 

Author: The Octoparse Team

 

 

 
