undefined
Blog > Post

RegEx: How to Extract All Email Addresses from TXT Files or Strings

Tuesday, July 02, 2019

Now you have a text file mixed with email addresses and text strings, and you want to extract email addresses. In this article, I will show you how to extract all email addresses from TXT Files or Strings using Regular Expression

 

Option#1: Excel formula  

=TRIM(RIGHT(SUBSTITUTE(LEFT(A1,FIND (" ",A1&" ",FIND("@",A1))-1)," ", REPT(" ",LEN(A1))),LEN(A1))).

 

This formula is frustrating if you have a hard time using Excel. Excel has strict rules on the order. It is often the case that you copy and paste a complexed formula, but Excel won't accept it unless you type the expression into the cell. Another problem associated with the Excel formula is that you have to spend a certain amount of time to debug the expression, especially a long one.

 

Option#2: Excel VBA code

 

 

Step 1: Press "ALT+F11" keys, and it would bring you to the Microsoft Visual Basic for the Application window

Step 2: Click Insert > Module, copy and paste the following into the Module window:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
Sub ExtractEmail()
'Update 20130829
Dim WorkRng As Range
Dim arr As Variant
Dim CharList As String
On Error Resume Next
xTitleId = "KutoolsforExcel"
Set WorkRng = Application.Selection
Set WorkRng = Application.InputBox("Range", xTitleId, WorkRng.Address, Type:=8)
arr = WorkRng.Value
CheckStr = "[A-Za-z0-9._-]"
For i = 1 To UBound(arr, 1)
    For j = 1 To UBound(arr, 2)
        extractStr = arr(i, j)
        outStr = ""
        Index = 1
        Do While True
            Index1 = VBA.InStr(Index, extractStr, "@")
            getStr = ""
            If Index1 > 0 Then
                For p = Index1 - 1 To 1 Step -1
                    If Mid(extractStr, p, 1) Like CheckStr Then
                        getStr = Mid(extractStr, p, 1) & getStr
                    Else
                        Exit For
                    End If
                Next
                getStr = getStr & "@"
                For p = Index1 + 1 To Len(extractStr)
                    If Mid(extractStr, p, 1) Like CheckStr Then
                        getStr = getStr & Mid(extractStr, p, 1)
                    Else
                        Exit For
                    End If
                Next
                Index = Index1 + 1
                If outStr = "" Then
                    outStr = getStr
                Else
                    outStr = outStr & Chr(10) & getStr
                End If
            Else
                Exit Do
            End If
        Loop
        arr(i, j) = outStr
    Next
Next
WorkRng.Value = arr
End Sub

Step 3: Press "Ok" to proceed with the process

Step 4: Select the range you would like to apply to the above code. In this case range A1: A4

 

 

 

Option#3: Regular Expression

 

The regular expression is very hard to learn if you don’t have any programming knowledge. However, you can apply this simple expression to filter the email address. 

[a-zA-Z0-9-_]{1,}@[a-zA-Z0-9-_]{1,}.[a-zA-Z]{1,}

 

Octoparse has built-in RegEx Tool, which is very convenient for people to clean the extracted data. Especially for non-IT professionals, it is an extra bonus that you don't have to spend time to learn python. With the Octoparse web scraping tool, it is now possible to have data extraction, cleaning, and export all-in-one.

 

 

Step 1: Open Octoparse application

Step 2Copy the text string at Source Text. In this case, the text string is:

This email address is valid: web@email.net and this email address is not valid web@email. Same as what_ever@public.com is a valid email address and address test@test. is not valid! OCTOPARSE@test.com is also valid

Step 3Copy and paste the expression in the "Regular Expression" box.

Step 4Choose the "Match All" option at the bottom, and click "Match".

 

Author: Ashley Ng

Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing in a way that empowers companies and businesses with actionable insights. Read her blog here to discover practical tips and applications on web data extraction

Si desea ver el contenido en español, por favor haga clic en:  RegEx: Cómo Extraer Todas Las Direcciones de Email de Cadenas o Archivos TXT   También puede leer artículos de web scraping en el sitio web oficial

Download Octoparse to start web scraping or contact us for any
question about web scraping!

Contact Us Download