Scrape leads from Yellowpages

Wednesday, August 29, 2018

In this tutorial, we will show you how to scrape, extract and mine data from yellowpages.com. With Octoparse, you can easily extract the data you need, such as business names, addresses, phone numbers and emails, to generate local leads for your business and boost your sales. Any data you see on the web page can be extracted with our free software, with no coding needed. Just enter the URL, configure a few settings, and get a list of thousands of potential leads within minutes!

After configuration, simply run the task and get the data in a structured format such as CSV or JSON, or have it delivered directly to your database. (You can also connect the Octoparse Data API with your own system.)


By mining data from yellowpages, you can:

· Create your own business directory websites

· Get large volumes of phone numbers for cold calling

· Offer scraping services for businesses

· Sell the leads generated to your customers

· Scrape data for email marketing

...

As an example, we will scrape information about anesthesiologists in New York from yellowpages.com. (To follow along, you may want to use this link.)

[Download example task file]

1) "Go To Web Page" - to open the target web page

2) Create a pagination - to scrape all data from multiple pages

3) Build a "Loop Item" - to loop click into each item on each list

4) Extract data - to select data you need to scrape

5) "Customize data field" - to reformat star-rating data (Optional)

6) Run extraction - to run your task and get data

1) "Go To Web Page" - to open the target page

      · Create the task with "Advanced Mode".

      · Paste the URL into the "Extraction URL" box and click "Save" to move on.

Tips!

      · We strongly suggest turning on "Workflow" mode to get a better overview of what your task is doing, in case you make a mistake in the steps.

      · "Advanced Mode" is highly recommended since it can handle almost all complex extraction cases, such as keyword searches, scraping behind a login, and opening dropdowns.

2) Create a pagination - to scrape all data from multiple pages

      · Scroll down to the bottom of the target web page and click the "Next" button.

      · Select "Loop click next page" in the "Action Tips" panel.
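Conceptually, "Loop click next page" repeats two actions until no "Next" button is left: extract what is on the current page, then click "Next". The Python sketch models only that control flow over a hypothetical in-memory set of pages; `PAGES` and `paginate` are illustrative names, not Octoparse internals, and the `max_pages` cap is an assumed safeguard against a "Next" link that never disappears.

```python
# Hypothetical in-memory stand-in for paginated search results.
# Each page lists some items and may point to a "next" page (None = no Next button).
PAGES = {
    "page-1": {"items": ["Listing A", "Listing B"], "next": "page-2"},
    "page-2": {"items": ["Listing C"], "next": "page-3"},
    "page-3": {"items": ["Listing D", "Listing E"], "next": None},
}


def paginate(start, max_pages=100):
    """Collect items page by page, following "next" links until none is left.

    max_pages is an assumed safety cap against a Next link that never disappears.
    """
    collected = []
    page_id = start
    pages_visited = 0
    while page_id is not None and pages_visited < max_pages:
        page = PAGES[page_id]
        collected.extend(page["items"])  # extract everything on the current page
        page_id = page["next"]           # "click" Next, or stop if there is none
        pages_visited += 1
    return collected


print(paginate("page-1"))
```

Starting from the first page, the loop ends on its own once a page has no "next" link, which is the same stopping condition Octoparse uses when the "Next" button can no longer be found.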

3) Build a "Loop Item" - to loop click into each item on each list

      · Click the first two business titles one by one to create a "Loop Item" that clicks through each item on the list.

        (Make sure you select the area that contains the URL to the item page.)

      · Click "Select all" and then "Loop click each element" in the "Action Tips" panel.
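Because the "Loop Item" sits inside the pagination loop from step 2, the workflow amounts to a loop inside a loop: for every list page, open each listing's item page and collect its fields. This is a minimal Python sketch of that idea under stated assumptions: the list pages, item URLs and fields are hypothetical in-memory stand-ins (the phone numbers are fictional 555 numbers), and `LIST_PAGES`, `DETAIL_PAGES` and `scrape_all` are made-up names. Octoparse builds the equivalent loop for you from the clicks above.

```python
# Hypothetical stand-ins: each list page holds links to item pages,
# and each item page holds the fields to extract.
LIST_PAGES = [
    ["/biz/a", "/biz/b"],  # list page 1
    ["/biz/c"],            # list page 2
]
DETAIL_PAGES = {
    "/biz/a": {"Name": "Example Clinic A", "TEL": "555-0101"},
    "/biz/b": {"Name": "Example Clinic B", "TEL": "555-0102"},
    "/biz/c": {"Name": "Example Clinic C", "TEL": "555-0103"},
}


def scrape_all(list_pages, detail_pages):
    """Visit every item on every list page and return one row per item."""
    rows = []
    for links in list_pages:             # outer loop: pagination
        for link in links:               # inner loop: "Loop click each element"
            fields = detail_pages[link]  # "open" the item page
            rows.append({"URL": link, **fields})
    return rows


for row in scrape_all(LIST_PAGES, DETAIL_PAGES):
    print(row)
```

This is also why the tutorial asks you to select the area containing the item's URL: the inner loop needs a link to follow for each listing.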

4) Extract data - to select data you need to scrape

      · On the item page, select the data you need to scrape, such as Name, Address, Opening hours, TEL, etc.

      · Select "Extract data" and rename the fields in the "Field name" column if necessary.

5) "Customize data field" - to reformat star-rating data (Optional)

In some cases, the data you need is buried in the HTML along with extra strings that you don't need. For example, the star rating cannot be extracted just by clicking on it. In this case, we need to extract the HTML first and then refine the extracted data to trim away the unwanted strings. To do this:

      · Click the star-rating area and select "Extract outer HTML of the selected element".

      · Select the "star-rating" row, click the "Customize data field" icon, select the "Refine extracted data" option, and click the "Add step" button.

      · Click "Match with Regular Expression" and enter the regular expression (?<=title=")(.+?)(?= star) into the "Regular Expression" box. This keeps only the text between title=" and " star", which leaves just the rating.

      · Click the "OK" button.
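To see what the expression does, here is a small Python check against a hypothetical outer-HTML snippet (the real yellowpages.com markup may differ). The lookbehind `(?<=title=")` and the lookahead `(?= star)` anchor the match without including those strings, so only the lazily captured `(.+?)` part is returned. Python's `re` module is used purely for illustration; we assume Octoparse's regex engine treats this pattern the same way, since lookarounds of this form are widely supported.

```python
import re

# Hypothetical outer HTML of a star-rating element; real markup may differ.
outer_html = '<div class="result-rating four half" title="4.5 star rating"></div>'

# Same pattern as in the step above:
#   (?<=title=")  must be preceded by title="   (not included in the match)
#   (.+?)         capture as few characters as possible
#   (?= star)     must be followed by " star"   (not included in the match)
pattern = re.compile(r'(?<=title=")(.+?)(?= star)')

match = pattern.search(outer_html)
rating = match.group(1) if match else None  # None if the element has no rating
print(rating)  # 4.5
```

The lazy `+?` matters: it stops at the first " star" rather than running on to a later one in the same HTML.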

Tips!

      · In Octoparse, you can use regular expressions to further process or clean the data you extract.

        Read more about 8 data re-format options

6) Run extraction - to run your task and get data

      · Click "Start Extraction" and then "Local Extraction".

      · Click the "Export" button to export the data once the extraction is complete.
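For reference, this is roughly what the same records look like in the two structured formats mentioned at the start, CSV and JSON. The Python sketch uses only the standard library and hypothetical placeholder rows named after the fields from steps 4 and 5; it illustrates the formats, not Octoparse's export code, and the actual columns in your export depend on how you named your fields.

```python
import csv
import io
import json

# Hypothetical placeholder rows shaped like the fields extracted earlier.
rows = [
    {"Name": "Example Clinic A", "Address": "1 Example St, New York, NY",
     "TEL": "555-0101", "star-rating": "4.5"},
    {"Name": "Example Clinic B", "Address": "2 Example Ave, New York, NY",
     "TEL": "555-0102", "star-rating": "3"},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize a list of dicts to pretty-printed JSON text."""
    return json.dumps(records, indent=2)


csv_text = to_csv(rows)
json_text = to_json(rows)
print(csv_text)
print(json_text)
```

`csv.DictWriter` quotes the Address values automatically because they contain commas, so the file still splits into the right columns when opened in a spreadsheet.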

Tips!

      · Run your tasks with Octoparse Cloud Extraction for much better performance. A task run with "Cloud Extraction" runs in the cloud on multiple servers using our IPs, so you can shut down the app or your computer while it is running, without worrying about hardware limitations. The extracted data is saved in the cloud and can be accessed at any time.

Related Articles:

Advanced Mode 

Pagination - Capture data from multiple pages 

Re-format data extracted 

Octoparse Cloud Extraction 

Download Octoparse to start web scraping, or contact us with any questions about web scraping!