RegEx: How to Extract All Phone Numbers from Strings
Tuesday, August 10, 2021
A RegEx tool can spare you from wading through perplexing learning materials and make writing RegEx much easier. This is a quick guide for beginners on extracting phone numbers from strings.
RegEx stands for Regular Expression, which is an object that describes the pattern of a string. With this expression understandable to the computer, we are able to locate the data that matches this pattern and retrieve the information we want.
(Quoted from Wikipedia.com)
How Does a Regular Expression Help Us Pull Phone Numbers Out of a Long Text?
Suppose you want to extract all the phone numbers from a text at once, and the numbers are scattered randomly throughout it. You are probably familiar with the "Ctrl + F" shortcut, which is built into most applications to help users find and highlight a certain string.
If you can write a Regular Expression that describes the pattern these phone numbers share, you can enter it into the "find" function of a text editor with built-in regex support, and every piece of data matching the pattern will be located.
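As a rough illustration of the same idea outside a text editor, the sketch below uses Python's built-in `re` module to find every substring that follows one pattern. The sample text and the pattern (three digits, a space, three digits, a space, four digits) are assumptions made up for this example.

```python
import re

# Hypothetical sample text; the numbers are placeholders.
text = "Call 123 456 7899 before noon, or try 987 654 3210 after 5 pm."

# \d{3} means "exactly three digits"; \d{4} means "exactly four digits".
pattern = r"\d{3} \d{3} \d{4}"

# findall returns every non-overlapping match, much like "find all" in an editor.
numbers = re.findall(pattern, text)
print(numbers)
```

A pattern this strict only finds numbers written in exactly this layout; other layouts need their own pattern.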
How to Write a Regular Expression?
If you want to extract phone numbers with Regular Expressions but don’t know how to write one, this article may help.
#Learn Some Basics of RegEx
Learning RegEx from scratch takes some time, but if you will use it frequently in your daily work, the productivity gain may make it worth the effort.
A good place to start is the JavaScript RegEx tutorial on W3Schools. There you will learn the basic syntax of a regular expression and how modifiers and quantifiers work.
As this is rather hard for total newbies to grasp, we will not dive into it in this article. If you want an easy way to take advantage of RegEx right away, a RegEx tool will fit your immediate need.
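To give a feel for what quantifiers and modifiers do, here is a small sketch. It is written with Python's `re` module rather than JavaScript (where modifiers are called flags), and the sample strings are invented for illustration.

```python
import re

sample = "ext 42, room 1307"

# Quantifiers say how many times the preceding item may repeat.
one_or_more = re.findall(r"\d+", sample)       # + : one or more digits
exactly_four = re.findall(r"\d{4}", sample)    # {4} : exactly four digits
optional_u = re.findall(r"colou?r", "color and colour")  # ? : zero or one "u"

# A modifier (flag) changes how the whole pattern is applied.
any_case = re.findall(r"tel", "Tel: 1, TEL: 2, tel: 3", flags=re.IGNORECASE)

print(one_or_more, exactly_four, optional_u, any_case)
```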
#Use RegEx Tool Built in Octoparse
There are some ready-to-use tools that make writing RegEx much easier. Octoparse has a built-in tool to do the job.
With this intuitive tool at hand, the only thing you need to do is find the pattern that the phone numbers you are looking for share throughout the text.
Examples of Phone Extraction Using Regex
A single large string may contain multiple phone numbers, and these phone numbers may come in a variety of formats. Here is an example of the file format:
- (123) 456 7899
- 123 456 7899
What is the easiest way to extract phone numbers like these? Now we are going to use the Regular Expression tool to generate Regular Expressions and match all the phone numbers quickly.
First, find the common string that each phone number starts with and ends with. For example, the source code of the targeted text above is shown below.
<p>Here is an example of file format </p>
<li>(123) 456 7899 </li>
<li>123 456 7899 </li>
Each phone number starts with <li> and ends with </li>, so we can use the RegEx Tool in Octoparse to quickly extract all the phone numbers.
1. Run Octoparse and open the RegEx Tool.
2. Copy and paste the source code into the "Source Text" box.
Then select the "Start With" option and enter "<li>".
3. Next, select the "End With" option and enter "</li>".
Don’t forget to select the "Match All" option.
4. Click "Match".
When it's done, all the matched phone numbers are listed in the box on the left-hand side.
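For readers who want to see the "Start With" / "End With" idea as an actual expression, the steps above can be sketched in Python. This is only one plausible equivalent, not necessarily the expression Octoparse generates; the lazy capture group `(.*?)` and the whitespace trimming are assumptions of this sketch.

```python
import re

# The source code from the example above.
source = """<p>Here is an example of file format </p>
<li>(123) 456 7899 </li>
<li>123 456 7899 </li>"""

# <li> and </li> are the fixed start and end strings.
# (.*?) lazily captures whatever sits between them.
pattern = r"<li>(.*?)</li>"

# findall plays the role of "Match All"; strip() drops the trailing space.
phones = [match.strip() for match in re.findall(pattern, source)]
print(phones)
```

The expression never looks at the digits themselves, which is why this approach only works when every phone number sits between the same pair of markers.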
However, if the phone numbers do not share a common starting and ending string, the tool won't be sufficient to generate a RegEx. You may need to learn more RegEx syntax and write a dedicated Regular Expression for each pattern.
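As a minimal sketch of such a hand-written expression, the Python snippet below matches the two sample formats from earlier, "(123) 456 7899" and "123 456 7899", in plain text with no surrounding tags. The sample sentence is invented, and the pattern is an assumption that fits only these two layouts.

```python
import re

# Hypothetical plain text with no tags around the numbers.
text = "Office: (123) 456 7899. Mobile: 123 456 7899. Order no. 12 345."

# Area code either wrapped in parentheses or bare, then 3 digits and 4 digits,
# each separated by a single space. (?: ... ) groups without capturing, so
# findall returns the whole match.
pattern = r"(?:\(\d{3}\)|\d{3}) \d{3} \d{4}"

found = re.findall(pattern, text)
print(found)
```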
I wrote down two additional Regular Expressions for two formats of phone numbers.
- Regular Expression:
Match: 0511-4405222 | 021-87888822
- Regular Expression:
Match: (021)1234567 | (0411)123456 | (000)000000 | (123)1234567
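The expressions themselves are not shown above. The Python sketch below is not the author's original pair; it is only one possible set of patterns, inferred from the listed matches, with the digit counts (3 to 4 digits for the area code, 6 to 8 for the rest) taken as assumptions from those samples.

```python
import re

# Assumed pattern for the dashed format, e.g. 0511-4405222 or 021-87888822.
dash_pattern = r"\d{3,4}-\d{7,8}"

# Assumed pattern for the parenthesised format, e.g. (021)1234567.
paren_pattern = r"\(\d{3,4}\)\d{6,7}"

dash_samples = ["0511-4405222", "021-87888822"]
paren_samples = ["(021)1234567", "(0411)123456", "(000)000000", "(123)1234567"]

# fullmatch succeeds only if the entire string fits the pattern.
dash_ok = all(re.fullmatch(dash_pattern, s) for s in dash_samples)
paren_ok = all(re.fullmatch(paren_pattern, s) for s in paren_samples)
print(dash_ok, paren_ok)
```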
The key to this task is finding the pattern the phone numbers follow in the text and coming up with a RegEx that describes that pattern.
Besides extracting data, the Octoparse Regular Expression Tool is helpful for data cleaning as well.
Author: The Octoparse Team