A newer version of this tutorial is available. Check it for the most up-to-date steps.
In this tutorial, we will show you how to scrape data from Google search results.
To follow along, you may want to use this URL in the tutorial:
Here are the main steps in this tutorial: [Download demo task file here]
1) "Go To Web Page" - to open the targeted web page
· Click "+ Task" to start a task using Advanced Mode
Advanced Mode is a highly flexible and powerful web scraping mode. For websites with complex structures, like Google, we strongly recommend starting your data extraction project in Advanced Mode.
· Paste the URL into the "Extraction URL" box and click "Save URL" to move on
2) "Enter Text" - to enter one or more keywords to search for
· Click "Search box"
· Click "Enter text" on the "Action Tips"
· Enter the keyword(s) you want
When you enter multiple keywords, Octoparse generates a loop and automatically enters each keyword into the search box, one keyword at a time.
· Click "OK"
· Click the "Search" button
· Click "Click button" on the "Action Tips"
If the default built-in browser is incompatible with the result page, you can change the browser setting.
· Click “Setting”
If you use Octoparse 7.0.2, save the task before modifying the settings.
· Switch the default built-in browser to Firefox 45.0.
· Click “Save” to apply the modified setting
For more about entering text and keywords, please refer to Text/keyword input.
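For readers who want to see the idea behind the keyword loop in code, here is a minimal Python sketch. It is not how Octoparse works internally. It only illustrates "one search per keyword, in order". The endpoint `https://example.com/search`, the `q` parameter, and the function name `build_search_urls` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint, used only for illustration.
BASE_URL = "https://example.com/search"


def build_search_urls(keywords):
    """Return one search URL per non-empty keyword, in input order."""
    urls = []
    for keyword in keywords:
        keyword = keyword.strip()
        if not keyword:
            continue  # skip blank entries in the keyword list
        # urlencode escapes spaces and special characters in the keyword
        urls.append(f"{BASE_URL}?{urlencode({'q': keyword})}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls(["web scraping", "data extraction"]):
        print(url)
```

Each keyword becomes its own search, which mirrors how the loop enters the keywords into the search box one at a time.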
3) Create a pagination loop - to scrape multiple listing pages
· Scroll down and click the "Next Page" button on the webpage
· Click "Loop click next page" on "Action Tips"
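Conceptually, a pagination loop keeps collecting results and following the "Next Page" link until there is no next page. The sketch below illustrates that logic in Python with a hypothetical in-memory set of pages (`PAGES`) instead of a real website. The names and data are assumptions for illustration, not part of Octoparse.

```python
# Hypothetical pages: each one lists its results and the key of the next page.
PAGES = {
    "page1": {"results": ["A", "B"], "next": "page2"},
    "page2": {"results": ["C"], "next": "page3"},
    "page3": {"results": ["D", "E"], "next": None},  # last page: no "Next Page"
}


def scrape_all_pages(pages, start, max_pages=100):
    """Collect results page by page until there is no next page."""
    collected = []
    current = start
    visited = 0
    # max_pages is a safety cap so a broken "next" chain cannot loop forever
    while current is not None and visited < max_pages:
        page = pages[current]
        collected.extend(page["results"])
        current = page["next"]  # follow the "Next Page" link
        visited += 1
    return collected


if __name__ == "__main__":
    print(scrape_all_pages(PAGES, "page1"))
```

The `max_pages` cap plays the same role as limiting how many times the loop may click the "Next Page" button.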
4) Extract data - to scrape all the items on each page
We are now on the second result page. Before moving on, we should go back to the first page.
· Click "Go To Web Page" in the workflow.
· Click "Enter text" and "Click item" in sequence
By clicking through each step in the workflow, you can easily see how Octoparse is interacting with the website.
· Select the pagination loop in the workflow
Selecting the loop first helps Octoparse decide the execution order and generate the Loop Item at the appropriate position in the workflow.
Now, let’s extract the search results.
· Click any 2 result sections consecutively.
Hover the mouse over a result section until the whole desired section is highlighted.
The selected sections should be highlighted in green with all the sub-elements like the title and description highlighted in red.
· Click “Select all sub-elements”
· Click “Select all”
· Click “Extract data”
· Delete any unwanted data fields
· Rename the fields by selecting from the pre-defined list or entering your own names
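The extraction steps above amount to "for every result section, read its sub-elements into named fields". The Python sketch below shows the same idea with the standard library's `html.parser`. The sample HTML, the class names `result` and `desc`, and the field names `title` and `description` are made up for illustration and do not reflect Google's actual markup or Octoparse's internals.

```python
from html.parser import HTMLParser

# Made-up markup standing in for a page of search results.
SAMPLE_HTML = """
<div class="result"><h3>First title</h3><p class="desc">First description</p></div>
<div class="result"><h3>Second title</h3><p class="desc">Second description</p></div>
"""


class ResultParser(HTMLParser):
    """Collect one row per result section, with a title and a description."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "result" in classes:
            # a new result section starts a new row
            self.rows.append({"title": "", "description": ""})
        elif self.rows and tag == "h3":
            # simplification: any <h3> is attributed to the latest section
            self._field = "title"
        elif self.rows and tag == "p" and "desc" in classes:
            self._field = "description"

    def handle_endtag(self, tag):
        if tag in ("h3", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] += data


def extract_results(html):
    """Return a list of {'title', 'description'} dicts, one per section."""
    parser = ResultParser()
    parser.feed(html)
    return [{k: v.strip() for k, v in row.items()} for row in parser.rows]


if __name__ == "__main__":
    for row in extract_results(SAMPLE_HTML):
        print(row)
```

Keeping only `title` and `description` corresponds to deleting unwanted fields and renaming the remaining ones.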
5) Save and start extraction - to run the task and get data
· Click "Start Extraction"
· Select "Local Extraction" to run the task on your computer
Below is the sample output.