Scrape data in Google Maps
Monday, December 27, 2021
Psst! You are reading a tutorial for Octoparse version 7.3, which is slowly on its way out. We strongly recommend that you update to the latest version of Octoparse because it is more compatible with websites like Google Maps. It can also scrape more data fields from Google Maps with a newly added partial scroll function. Upgrade right now and check the new tutorial here!
As the king of navigation apps, Google Maps not only offers an easy way to get directions from one place to another, but also excels at finding nearby museums, new restaurants, and popular bars and clubs. You can also find ratings and descriptions of these places in Google Maps.
In this tutorial, we are going to show you how to scrape restaurant information in Google Maps. We will scrape details including restaurant name, rating, category, location, description, and hours.
To follow along, you may want to use the URL below:
https://www.google.com/maps/search/restaurants/@33.7726566,-117.8522727,13z/data=!3m1!4b1
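If you want to point the task at a different area or search term, it helps to know what the parts of this URL mean. The sketch below pulls them apart in Python. It assumes the commonly seen layout in which the path segment after /maps/search/ is the search term and the "@" segment holds latitude, longitude, and zoom level; the tutorial itself does not document this layout, and the trailing "data=" segment is left uninterpreted. The names MapView, parse_map_view, and search_term are made up for this example.

```python
import re
from typing import NamedTuple
from urllib.parse import unquote_plus, urlparse

TUTORIAL_URL = (
    "https://www.google.com/maps/search/restaurants/"
    "@33.7726566,-117.8522727,13z/data=!3m1!4b1"
)

# Matches an "@<latitude>,<longitude>,<zoom>z" segment (assumed layout).
_VIEW_PATTERN = re.compile(
    r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(\d+(?:\.\d+)?)z"
)


class MapView(NamedTuple):
    latitude: float
    longitude: float
    zoom: float


def parse_map_view(url: str) -> MapView:
    """Return the map center and zoom level encoded in the URL."""
    match = _VIEW_PATTERN.search(url)
    if match is None:
        raise ValueError("no '@lat,lng,zoomz' segment found in URL")
    latitude, longitude, zoom = (float(part) for part in match.groups())
    return MapView(latitude, longitude, zoom)


def search_term(url: str) -> str:
    """Return the path segment that follows '/maps/search/'."""
    parts = urlparse(url).path.split("/")
    if "search" not in parts or parts.index("search") + 1 >= len(parts):
        raise ValueError("URL has no '/search/<term>' segment")
    return unquote_plus(parts[parts.index("search") + 1])


view = parse_map_view(TUTORIAL_URL)
print(search_term(TUTORIAL_URL), view.latitude, view.longitude, view.zoom)
```

Replacing "restaurants" or the coordinates in the URL and pasting the result into Octoparse should, under the same assumption, open the same kind of result list for another search or location.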
This tutorial will also teach you:
· How to deal with AJAX for pagination
Here are the main steps in this tutorial: [Download task file here]
1) Go To Web Page - to open the target web page
2) Create a pagination loop - to scrape all the results from multiple pages
3) Create a "Loop Item" - to scrape all the item details on the current page
4) Extract data - to select the data for extraction
5) Start extraction - to run the task and get data
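As a rough mental model, the steps above amount to two nested loops: an outer loop over result pages and an inner loop over the items on each page, with the selected fields extracted from every item. The Python sketch below illustrates only that structure. It is not how Octoparse is implemented, and SAMPLE_PAGES is hypothetical in-memory data standing in for real result pages.

```python
from typing import Dict, List

# The six fields this tutorial keeps.
FIELDS = ["name", "rating", "category", "location", "description", "hours"]

# Hypothetical stand-in for two result pages; not real Google Maps data.
SAMPLE_PAGES: List[List[Dict[str, str]]] = [
    [
        {"name": "Example Diner", "rating": "4.5", "category": "American",
         "location": "123 Main St", "description": "Classic comfort food",
         "hours": "Closes 9 PM"},
        {"name": "Sample Sushi", "rating": "4.2", "category": "Japanese",
         "location": "456 Oak Ave"},
    ],
    [
        {"name": "Test Taqueria", "rating": "4.7", "category": "Mexican",
         "location": "789 Pine Rd", "hours": "Closes 10 PM"},
    ],
]


def extract_item(item: Dict[str, str]) -> Dict[str, str]:
    """Step 4: keep only the chosen fields; a missing field becomes blank."""
    return {field: item.get(field, "") for field in FIELDS}


def run_task(pages: List[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Step 5: run the whole workflow and collect one row per item."""
    rows = []
    for page in pages:        # step 2: pagination loop
        for item in page:     # step 3: "Loop Item"
            rows.append(extract_item(item))
    return rows


rows = run_task(SAMPLE_PAGES)
for row in rows:
    print(row["name"], "|", row["rating"], "|", row["hours"] or "(blank)")
```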
1) Go To Web Page - to open the target web page
· Click "+ Task" to start a task using Advanced Mode
Advanced Mode is a highly flexible and powerful web scraping mode. For websites with complex structures, like Google Maps, we strongly recommend using Advanced Mode for your data extraction project.
· Paste the URL into the "Extraction URL" box and click "Save URL" to move on
The default built-in browser is incompatible with Google Maps. Hence, we need to switch to a compatible browser.
· Click "Save" to save your task
Please note that in version 7.02, you need to save the task before you can modify Settings.
· Click "Settings", change the browser to "Firefox 45.0", and click "Save"
This step is necessary because the default browser (user agent) is not compatible with the Google Maps website, so the "Browser" setting has to be changed manually.
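For readers unfamiliar with the term, a user agent is the string a browser sends with each request to identify itself, and a site can respond differently depending on it. The Python sketch below shows what setting one looks like in code. The Firefox 45 string follows the usual Firefox format but is an assumption; the exact string Octoparse sends is not given in this tutorial. The request is only built here, not sent.

```python
from urllib.request import Request

# Assumed user-agent string in the usual Firefox format (illustrative only).
FIREFOX_45_UA = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) "
    "Gecko/20100101 Firefox/45.0"
)


def build_request(url: str, user_agent: str = FIREFOX_45_UA) -> Request:
    """Build (but do not send) a request that identifies as the given browser."""
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("https://www.google.com/maps/search/restaurants/")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```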
2) Create a pagination loop - to scrape all the results from multiple pages
· Turn on the "Workflow Mode" by toggling the "Workflow" button in the top-right corner of Octoparse
We strongly suggest turning on the "Workflow Mode": it gives you a clearer picture of the task you are building and makes it easier to spot a mistake in the steps.
· Click the next page button ">"
· Click "Loop click Single Button" on the "Action Tips" panel
· Set up AJAX Load for the "Click to paginate" action
Google Maps uses AJAX for the pagination button: clicking it loads the next set of results into the current page instead of opening a new page. Therefore, we need to set up AJAX Load in the "Click to paginate" step.
· Uncheck the box for "Retry when page remains unchanged (use discreetly for AJAX loading)"
· Check the box for "Load the page with AJAX" and set the AJAX Timeout to 15 seconds
· Click "OK" to save
Tips! If you want to learn more about AJAX, here is a related tutorial you might need:
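To see why a timeout is needed at all, consider what a scraper has to do after clicking an AJAX button: no new page loads, so it can only keep checking the page content until it changes or the time runs out. The Python sketch below shows that idea with a generic polling loop. It is a minimal illustration under that assumption, not Octoparse's actual mechanism, and FakePage and wait_for_change are hypothetical names.

```python
import time
from typing import Callable


def wait_for_change(
    snapshot: Callable[[], str],
    before: str,
    timeout: float = 15.0,
    interval: float = 0.5,
) -> bool:
    """Poll snapshot() until it differs from `before` or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if snapshot() != before:
            return True   # new content arrived
        if time.monotonic() >= deadline:
            return False  # gave up: content never changed
        time.sleep(interval)


class FakePage:
    """Hypothetical page whose content changes after a few checks."""

    def __init__(self, changes_after: int) -> None:
        self.calls = 0
        self.changes_after = changes_after

    def content(self) -> str:
        self.calls += 1
        return "page 2" if self.calls > self.changes_after else "page 1"


page = FakePage(changes_after=3)
changed = wait_for_change(page.content, "page 1", timeout=1.0, interval=0.01)
print("content changed:", changed, "after", page.calls, "checks")
```

A timeout that is too short returns before the new results appear, while one that is too long slows down every page turn; the 15 seconds used in this tutorial is on the generous side.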
3) Create a "Loop Item" - to scrape all the item details on the current page
· Click "Go To Web Page" to go back to the first page
When extracting data across multiple pages, you should always begin building your task on the first page.
· Select the first and second sections containing restaurant information on the current page
· Click "Extract data in the loop" on the "Action Tips" panel
Octoparse will automatically select all the sections on the current page. The manually selected sections will be highlighted in green with all the sub-elements highlighted in red.
Tips! Hover the mouse over the section until the whole desired section is highlighted.
4) Extract data - to select the data for extraction
· Delete the unwanted data fields
We will keep the restaurant name, rating, category, location, description, and hours.
· Click "OK" to save
· Rename the fields by selecting from the pre-defined list or entering your own names
5) Start extraction - to run the task and get data
· Click "Start Extraction" on the upper left side
· Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is the sample output. You can see some blank fields in the "Description" and "Hours" columns. This is because some restaurants do not list a description and/or hours of operation.
Tips! By default, if Octoparse cannot find the element of the defined pattern on the page, the field will be left blank. However, Octoparse may fail to find the element of the defined pattern even if the element needed is shown on the website. If you encounter this problem, here is a related tutorial you might need: What to do with those blank fields I got in the extracted result?
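Once the data is exported, a quick count of blank values per column can help tell the two cases apart: a few blanks in "Description" or "Hours" are expected, while a column that is blank in every row suggests the element was not matched. The Python sketch below assumes the result was exported as CSV. SAMPLE_EXPORT is hypothetical data, and the column names are placeholders for whatever field names you chose.

```python
import csv
import io
from collections import Counter
from typing import Dict

# Hypothetical export; real column names depend on how you renamed the fields.
SAMPLE_EXPORT = """name,rating,category,location,description,hours
Example Diner,4.5,American,123 Main St,Classic comfort food,Closes 9 PM
Sample Sushi,4.2,Japanese,456 Oak Ave,,
Test Taqueria,4.7,Mexican,789 Pine Rd,Street-style tacos,
"""


def count_blank_fields(csv_text: str) -> Dict[str, int]:
    """Count empty or whitespace-only values per column."""
    blanks: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if not (value or "").strip():
                blanks[column] += 1
    return dict(blanks)


print(count_blank_fields(SAMPLE_EXPORT))
```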
If you would like to read the content in Spanish, please click: Cómo extraer las coordenadas de Google Maps