undefined
Blog > Data Collection > Post

RegEx: How to Extract All Email Addresses from TXT Files or Strings

Thursday, August 5, 2021

Now you have a text file mixed with email addresses and text strings, and you want to extract email addresses. In this article, I will show you how to extract all email addresses from TXT Files or Strings using Regular Expression

 

Table of Contents

Option 1 Excel formula

Option 2 Excel VBA Code

Option 3 Regular Expression

 

Option#1: Excel formula  

=TRIM(RIGHT(SUBSTITUTE(LEFT(A1,FIND (" ",A1&" ",FIND("@",A1))-1)," ", REPT(" ",LEN(A1))),LEN(A1))).

 

Using this formula to extract email addresses is frustrating if you have a hard time using Excel. Excel has strict rules on the order. It is often the case that you copy and paste a complex formula, but Excel won't accept it unless you type the expression into the cell. Another problem associated with the Excel formula is that you have to spend a certain amount of time debugging the expression, especially a long one.

 

Option#2: Excel VBA Code

 

 

Step 1: Press the "ALT+F11" keys, and it would bring you to the Microsoft Visual Basic for the Application window

Step 2: Click Insert > Module, copy and paste the following into the Module window:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
Sub ExtractEmail()
'Update 20130829
Dim WorkRng As Range
Dim arr As Variant
Dim CharList As String
On Error Resume Next
xTitleId = "KutoolsforExcel"
Set WorkRng = Application.Selection
Set WorkRng = Application.InputBox("Range", xTitleId, WorkRng.Address, Type:=8)
arr = WorkRng.Value
CheckStr = "[A-Za-z0-9._-]"
For i = 1 To UBound(arr, 1)
    For j = 1 To UBound(arr, 2)
        extractStr = arr(i, j)
        outStr = ""
        Index = 1
        Do While True
            Index1 = VBA.InStr(Index, extractStr, "@")
            getStr = ""
            If Index1 > 0 Then
                For p = Index1 - 1 To 1 Step -1
                    If Mid(extractStr, p, 1) Like CheckStr Then
                        getStr = Mid(extractStr, p, 1) & getStr
                    Else
                        Exit For
                    End If
                Next
                getStr = getStr & "@"
                For p = Index1 + 1 To Len(extractStr)
                    If Mid(extractStr, p, 1) Like CheckStr Then
                        getStr = getStr & Mid(extractStr, p, 1)
                    Else
                        Exit For
                    End If
                Next
                Index = Index1 + 1
                If outStr = "" Then
                    outStr = getStr
                Else
                    outStr = outStr & Chr(10) & getStr
                End If
            Else
                Exit Do
            End If
        Loop
        arr(i, j) = outStr
    Next
Next
WorkRng.Value = arr
End Sub

Step 3: Press "Ok" to proceed with the process

Step 4: Select the range you would like to apply to the above code. In this case range A1: A4

 

 

 

Option#3: Regular Expression

The regular expression is very hard to learn if you don’t have any programming knowledge. However, you can apply this simple expression to filter the email address and then extract the data. 

[a-zA-Z0-9-_]{1,}@[a-zA-Z0-9-_]{1,}.[a-zA-Z]{1,}

 

Octoparse has a built-in RegEx Tool, which is very convenient for people to clean the extracted data. Especially for non-IT professionals, it is an extra bonus that you don't have to spend time learning python. With the Octoparse web scraping tool, it is now possible to have data extraction, cleaning, and export all-in-one.

Read Octoparse customer story: Turn web pages into sales leads

 

 

Step 1: Open Octoparse application

Step 2Copy the text string at Source Text. In this case, the text string is:

This email address is valid: web@email.net and this email address is not valid web@email. Same as what_ever@public.com is a valid email address and address test@test. is not valid! OCTOPARSE@test.com is also valid

Step 3Copy and paste the expression in the "Regular Expression" box.

Step 4Choose the "Match All" option at the bottom, and click "Match".

 

Author: Ashley Ng

Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing in a way that empowers companies and businesses with actionable insights. Read her blog here to discover practical tips and applications on web data extraction

Si desea ver el contenido en español, por favor haga clic en:  RegEx: Cómo Extraer Todas Las Direcciones de Email de Cadenas o Archivos TXT   También puede leer artículos de web scraping en el sitio web oficial

Related resources

 

How to Generate Sales Leads Using Web Scraping

Web Scraping for Lead Generation

RegEx: How to Extract All Phone Numbers from Strings

B2B Lead Generation: Top 10 tools for Digital Marketing in 2021

Email Extractor: Gathering Sales Leads in Minutes

 

Download Octoparse to start web scraping or contact us for any
question about web scraping!

Contact Us Download
We use cookies to enhance your browsing experience. Read about how we use cookies and how you can control them by clicking cookie settings. If you continue to use this site, you consent to our use of cookies.
Accept decline