RegEx: How to Extract All Phone Numbers from Strings

Tuesday, August 10, 2021

A RegEx tool can spare you from wading through confusing learning materials and make writing regular expressions much easier. This is a quick guide for beginners on extracting phone numbers from strings.

 

Table of Contents

What's RegEx

How to write a regular expression

Examples of phone extraction using Regex

 

What's RegEx?

RegEx stands for Regular Expression, a sequence of characters that describes the pattern of a string. Because a computer can interpret this expression, we can use it to locate the data that matches the pattern and retrieve the information we want.

 

"A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern."

 

(Quoted from Wikipedia)

 

How does a Regular Expression help us pull phone numbers out of a long text?

For example, suppose you want to extract all phone numbers from a text at once, and the text contains many phone numbers scattered throughout. You are probably familiar with the "Ctrl + F" shortcut, which is built into most applications to help users find and highlight a certain string.

If you can write a Regular Expression that describes the pattern these phone numbers share, you can enter it into the "find" function of a text editor with built-in regex support, and the editor will locate the data you are looking for.
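
The same idea can be tried in code. The following is a minimal sketch in Python (chosen here for illustration; the article itself does not prescribe a language), using the standard `re` module. The sample text and its numbers are made up, and the pattern only covers the `123-456-7899` style.

```python
import re

# Hypothetical sample text; the numbers are invented for illustration.
text = "Call 123-456-7899 before noon, or 987-654-3210 after lunch."

# Three digits, a hyphen, three digits, a hyphen, four digits.
pattern = re.compile(r"\d{3}-\d{3}-\d{4}")

# findall returns every non-overlapping match, like "find all" in an editor.
matches = pattern.findall(text)
print(matches)  # ['123-456-7899', '987-654-3210']
```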

 

How to Write a Regular Expression?

If you want to extract phone numbers using Regular Expressions but don't know how to write one, this article may help.

 

#Learn Some Basics of RegEx

Learning RegEx from scratch takes some time, but if you will use it frequently in your daily work and it significantly improves your productivity, it may be worth the effort.

 

A good place to start is the JS RegEx tutorial on W3Schools. There you will learn the basic syntax of RegEx, including modifiers and quantifiers.

 

Because this is rather hard for complete beginners to grasp, we will not dive into it in this article. If you want an easy way to take advantage of RegEx right away, a RegEx tool will fit your immediate need.

 

#Use RegEx Tool Built in Octoparse

Some ready-to-use tools make writing RegEx easier. Octoparse has a built-in tool to do the job.

 

[Image: Octoparse RegEx Tool box]

 

With this intuitive tool at hand, the only thing you need to do is find the pattern of the phone numbers you are looking for in the text.

 

Examples of Phone Extraction Using Regex

A single large string may contain multiple phone numbers, and these phone numbers may come in a variety of formats. Here is an example:

  • (021)1234567
  • (123) 456 7899
  • (123).456.7899
  • (123)-456-7899
  • 123-456-7899
  • 123 456 7899
  • 1234567899
  • 0511-4405222
  • 021-87888822
  • +8613012345678
  • ...

 

What is the easiest way to extract phone numbers like these? Now we are going to use the Regular Expression tool to generate Regular Expressions and match all the phone numbers quickly.

 

First, find the common characters that each phone number starts with and ends with. For example, the source code of the target text above is shown below.

 

<p>Here is an example of file format </p>

<ul>

  <li>(021)1234567 </li>

  <li>(123) 456 7899 </li>

  <li>(123).456.7899 </li>

  <li>(123)-456-7899 </li>

  <li>123-456-7899 </li>

  <li>123 456 7899 </li>

  <li>1234567899 </li>

  <li>0511-4405222 </li>

  <li>021-87888822 </li>

  <li>+8613012345678 </li>

  <li>... </li>

</ul>

 

Each phone number starts with <li> and ends with </li>, so we can use the RegEx Tool in Octoparse to quickly extract all of them.

       1. Run Octoparse and open the RegEx Tool.

       2. Copy and paste the source code into the "Source Text" box.

           Then select the "Start With" option and enter "<li>".

       3. Next, select the "End With" option and enter "</li>".

           Don't forget to select the "Match All" option.

       4. Click "Match".

 

[Image: example of using RegEx to extract phone numbers]

When it's done, all the matched phone numbers are listed in the box on the left-hand side.
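
For readers who prefer code, the "Start With" / "End With" idea can be approximated as follows. This is a sketch in Python under the assumption that the tool captures everything between the two markers; it is not necessarily the expression Octoparse generates, and the `html` sample is a shortened copy of the source code above.

```python
import re

# Shortened copy of the source code shown earlier in the article.
html = """<ul>
  <li>(021)1234567 </li>
  <li>(123) 456 7899 </li>
  <li>123-456-7899 </li>
  <li>+8613012345678 </li>
  <li>... </li>
</ul>"""

# Capture, non-greedily, whatever sits between "<li>" and "</li>".
li_pattern = re.compile(r"<li>(.*?)</li>")

# Strip the trailing space that precedes each closing tag.
items = [m.strip() for m in li_pattern.findall(html)]
print(items)
# ['(021)1234567', '(123) 456 7899', '123-456-7899', '+8613012345678', '...']
```

Note that this approach returns anything between the markers, including the "..." placeholder, because it relies on the surrounding tags rather than on the shape of a phone number.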

 

However, if you can't find common characters that each phone number starts with and ends with, the tool won't be able to generate a RegEx. You will need to learn more RegEx syntax and write a dedicated Regular Expression for each pattern.

 

I wrote down two additional Regular Expressions for two formats of phone numbers.

 

  • Regular Expression:

Code: \d{3}-\d{8}|\d{4}-\d{7}

Match: 0511-4405222 | 021-87888822
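
A quick way to check this expression is to run it against a sample string. The sketch below uses Python's `re` module; the `sample` sentence is made up, and only the pattern comes from the article.

```python
import re

# Pattern from the article: 3 digits + 8 digits, or 4 digits + 7 digits.
area_code_pattern = re.compile(r"\d{3}-\d{8}|\d{4}-\d{7}")

# Hypothetical sample; the last number is in a different format on purpose.
sample = "Office: 0511-4405222, branch: 021-87888822, mobile: 123-456-7899"

area_code_found = area_code_pattern.findall(sample)
print(area_code_found)  # ['0511-4405222', '021-87888822']
```

The `123-456-7899` number is skipped because neither alternative allows a second hyphen.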

 

 

 

  • Regular Expression:

Code: \(\d{2,4}\)\d{6,7}

Match: (021)1234567 | (0411)123456 | (000)000000 | (123)1234567
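
The same kind of check works for this second expression. Again, this is only a sketch in Python with a made-up `sample` string; the pattern is the one given above.

```python
import re

# Pattern from the article: a 2-4 digit area code in parentheses,
# immediately followed by 6 or 7 digits.
paren_pattern = re.compile(r"\(\d{2,4}\)\d{6,7}")

# Hypothetical sample mixing matching and non-matching formats.
sample = "(021)1234567 (0411)123456 (123) 456 7899 (123)-456-7899"

paren_found = paren_pattern.findall(sample)
print(paren_found)  # ['(021)1234567', '(0411)123456']
```

The last two numbers are not matched because a space or hyphen follows the closing parenthesis, which this pattern does not allow.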

 

Finding the pattern of the phone numbers in the text and coming up with a RegEx that describes the pattern is the key to this task.
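
To illustrate writing one expression per format, the sketch below combines the two expressions from this article with two additional patterns and runs them over the sample list. The two extra patterns (`three_three_four` and `plus_prefixed`) are assumptions written for this example only; they are not from the article and would need adjusting for real data.

```python
import re

# The two expressions given in the article.
paren_area = r"\(\d{2,4}\)\d{6,7}"
hyphen_area = r"\d{3}-\d{8}|\d{4}-\d{7}"

# Hypothetical additions for the remaining sample formats.
# 3-3-4 digits with optional parentheses and a space, dot, or hyphen separator.
three_three_four = r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}"
# A plus sign followed by 11 to 13 digits.
plus_prefixed = r"\+\d{11,13}"

# Alternatives are tried left to right at each position in the text.
combined = re.compile(
    "|".join([paren_area, hyphen_area, three_three_four, plus_prefixed])
)

samples = [
    "(021)1234567", "(123) 456 7899", "(123).456.7899", "(123)-456-7899",
    "123-456-7899", "123 456 7899", "1234567899",
    "0511-4405222", "021-87888822", "+8613012345678",
]

all_found = combined.findall("\n".join(samples))
print(all_found == samples)  # True
```

Loose patterns like these can also match digit runs that are not phone numbers, so it is worth testing them against text that should not match as well.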

 

Besides extracting data, the Octoparse Regular Expression Tool is helpful for data cleaning as well.
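
As a rough idea of what regex-based cleaning can look like, here is a small Python sketch that normalizes several formats to digits only. It illustrates the general technique and does not show how the Octoparse tool works; the `raw_numbers` list is taken from the sample formats above.

```python
import re

raw_numbers = ["(123)-456-7899", "123 456 7899", "(123).456.7899"]

# \D matches any character that is not a digit; replace each with nothing.
cleaned = [re.sub(r"\D", "", number) for number in raw_numbers]
print(cleaned)  # ['1234567899', '1234567899', '1234567899']
```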

 


 

 

Author: The Octoparse Team 
