Step-by-step tutorials for you to get started with web scraping


Scrape data from Google Search

Thursday, November 01, 2018

 

In this tutorial, we will show you how to scrape data from Google search results.

To follow along, you can use this URL:

https://www.google.com/

 

Here are the main steps in this tutorial: [Download demo task file here]

1) "Go To Web Page" - to open the targeted web page

2) "Enter Text" - to enter one or more keywords to search for

3) Create a pagination loop - to scrape multiple listing pages

4) Extract data - to scrape all the items on each page

5) Save and start extraction - to run the task and get data

 

 

 

 

 

1) "Go To Web Page" - to open the targeted web page

· Click "+ Task" to start a task using Advanced Mode

Advanced Mode is a highly flexible and powerful web scraping mode. For websites with complex structures, like Google, we strongly recommend starting your data extraction project in Advanced Mode.

· Paste the URL into the "Extraction URL" box and click "Save URL" to move on

 

 

 

2) "Enter Text" - to enter one or more keywords to search for

· Click the search box on the web page

· Click "Enter text" in "Action Tips"

· Enter the keyword(s) you want to search for

When you enter multiple keywords, Octoparse generates a loop and automatically enters each keyword into the search box, one at a time.

· Click "OK"

· Click the "Search" button

· Click "Click button" in "Action Tips"
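
For readers who think in code, the keyword loop can be pictured as the short Python sketch below. It is only an analogy and not how Octoparse works internally: the function name `build_search_urls` is hypothetical, and building a `/search?q=` URL stands in for typing a keyword into the search box and clicking "Search".

```python
from urllib.parse import urlencode


def build_search_urls(keywords, base="https://www.google.com/search"):
    """Return one search URL per keyword, in the order given.

    Hypothetical helper for illustration: each URL stands in for one
    pass of the keyword loop (enter text, then click "Search").
    """
    urls = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            # Skip blank lines in the keyword list.
            continue
        urls.append(f"{base}?{urlencode({'q': keyword})}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls(["web scraping", "octoparse"]):
        print(url)
```

Each keyword is handled on its own pass, which mirrors the "one at a time" behavior described above.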

 

Tips!

If the default built-in browser is incompatible with the result page, you can change the browser setting.

· Click "Setting"

If you are using Octoparse 7.0.2, save the task before modifying the settings.

· Switch the default built-in browser to Firefox 45.0

· Click "Save" to apply the modified setting

For more about entering text and keywords, please refer to "Text/keyword input".

 

 

 

3) Create a pagination loop - to scrape multiple listing pages

· Scroll down and click the "Next Page" button on the webpage

· Click "Loop click next page" in "Action Tips"
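
The idea behind a pagination loop can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Octoparse's implementation: the `paginate` function, the `get_next` callback, and the page names are all hypothetical, and the `max_pages` cap is our own addition to keep the loop from running forever.

```python
def paginate(first_page, get_next, max_pages=10):
    """Visit pages by repeatedly following "Next Page" until there is none.

    get_next(page) returns the following page, or None on the last page.
    max_pages is a safety cap (an assumption of this sketch).
    """
    visited = []
    page = first_page
    while page is not None and len(visited) < max_pages:
        visited.append(page)
        page = get_next(page)  # the "click next page" step
    return visited


if __name__ == "__main__":
    # A made-up three-page result list; the last page has no "Next Page".
    next_links = {"page1": "page2", "page2": "page3", "page3": None}
    print(paginate("page1", next_links.get))
```

The loop ends either when no "Next Page" button is found or when the page cap is reached.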

 

 

 

 

4) Extract data - to scrape all the items on each page

We are now on the second result page. Before moving on, we should go back to the first page.

· Click "Go To Web Page" in the workflow

· Click "Enter text" and "Click item" in sequence

By clicking through each step in the workflow, you can easily see how Octoparse is interacting with the website.

· Select the pagination loop in the workflow

By doing this, we can help Octoparse decide the execution order and generate the Loop Item at the appropriate position in the workflow.
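
To see why the position of the Loop Item matters, it can help to picture the workflow as nested loops. The Python sketch below is our own reading of the workflow, with the Loop Item assumed to sit inside the pagination loop; `run_workflow`, `pages_for`, and `items_on` are hypothetical names, and the data is made up.

```python
def run_workflow(keywords, pages_for, items_on):
    """Model the workflow as three nested loops (an illustrative assumption)."""
    rows = []
    for keyword in keywords:                # keyword loop (step 2)
        for page in pages_for(keyword):     # pagination loop (step 3)
            for item in items_on(page):     # Loop Item (step 4)
                rows.append({"keyword": keyword, "page": page, "item": item})
    return rows


if __name__ == "__main__":
    rows = run_workflow(
        ["a", "b"],
        pages_for=lambda keyword: [1, 2],                     # two pages per keyword
        items_on=lambda page: [f"r{page}a", f"r{page}b"],     # two results per page
    )
    print(len(rows))
```

If the innermost loop were placed outside the pagination loop, the items would be collected only once instead of once per page.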

 

Now, let's extract the search results.

· Click any two result sections, one after the other

Hover the mouse over a result section until the whole section you want is highlighted.

The selected sections should be highlighted in green, with all the sub-elements, like the title and description, highlighted in red.

· Click "Select all sub-elements"

· Click "Select all"

· Click "Extract data"

· Delete the unwanted or useless data fields

· Rename the fields by selecting from the pre-defined list or entering your own names
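
The extraction steps above can be pictured in code as: find every result block, pull out its sub-elements, then drop and rename fields. The Python sketch below uses only the standard library and runs on made-up HTML. The `result` class name, the tag layout, and the helper names are assumptions for illustration and do not reflect Google's real markup or Octoparse's internals.

```python
from html.parser import HTMLParser

# Made-up markup: one <div class="result"> per search result.
SAMPLE_HTML = """
<div class="result">
  <a href="https://example.com/a"><h3>Title A</h3></a>
  <p>Description A</p>
</div>
<div class="result">
  <a href="https://example.com/b"><h3>Title B</h3></a>
  <p>Description B</p>
</div>
"""


class ResultParser(HTMLParser):
    """Collect title, url, and description from each result block.

    Assumes result blocks contain no nested <div> elements.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # the result block currently being read
        self._field = None  # the text field currently being filled

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "result":
            self._row = {"title": "", "url": "", "description": ""}
        elif self._row is not None:
            if tag == "a":
                self._row["url"] = attrs.get("href", "")
            elif tag == "h3":
                self._field = "title"
            elif tag == "p":
                self._field = "description"

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] += data

    def handle_endtag(self, tag):
        if tag in ("h3", "p"):
            self._field = None
        elif tag == "div" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_results(html):
    """Return one dict per result block ("Select all" + "Extract data")."""
    parser = ResultParser()
    parser.feed(html)
    return parser.rows


def select_fields(rows, mapping):
    """Keep only the fields in mapping and rename them (old name -> new name)."""
    return [{new: row[old] for old, new in mapping.items()} for row in rows]


if __name__ == "__main__":
    rows = parse_results(SAMPLE_HTML)
    for row in select_fields(rows, {"title": "Title", "url": "URL"}):
        print(row)
```

Here `select_fields` plays the role of the last two bullets: fields left out of the mapping are deleted, and the rest are renamed.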

 

 

 

 

 

5) Save and start extraction - to run the task and get data

· Click "Start Extraction"

· Select "Local Extraction" to run the task on your computer

 

Below is the sample output.

 

 

Is this article helpful? Contact us any time if you need our help!
