The latest version of this tutorial is available here. Take a look!
In this tutorial, we will show you how to scrape search results from Google Scholar. A ready-to-use Google Scholar template is also included in our latest version; you can find it here: Task Templates.
If you would like to build the scraper from scratch, you might want to use the URL in this tutorial:
Here are the main steps in this tutorial: [Download task file here]
1) "Go To Web Page" - to open the target web page
2) Create a "Loop Item" - to enter search keywords in a loop
We can customize our "text list" to create a loop search action. Octoparse will automatically enter every keyword in the list into the search box, one line at a time.
When you click the input field in the built-in browser, Octoparse detects that you have selected a search box, and the "Enter text" action automatically appears in "Action Tips".
Go to "Loop Text", select "Use the text in loop item to fill in the text box", and click "OK" to save.
Set up "wait before execution".
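For readers who want to see the logic behind this step, here is a minimal Python sketch of a keyword loop. It is not how Octoparse works internally; it only illustrates the idea of reading a text list one line at a time and waiting before each search. The names `parse_text_list`, `run_keyword_loop`, and `fake_search` are hypothetical and made up for this example.

```python
import time
from typing import Callable, List


def parse_text_list(text_list: str) -> List[str]:
    """Split a multi-line text list into keywords, skipping blank lines."""
    return [line.strip() for line in text_list.splitlines() if line.strip()]


def run_keyword_loop(
    text_list: str,
    search: Callable[[str], object],
    wait_seconds: float = 0.0,
) -> list:
    """Call `search` once per keyword, pausing before each call."""
    results = []
    for keyword in parse_text_list(text_list):
        # Stands in for the "wait before execution" setting (an assumption).
        time.sleep(wait_seconds)
        results.append(search(keyword))
    return results


if __name__ == "__main__":
    # Hypothetical keywords; `fake_search` stands in for typing into the search box.
    keywords = "web scraping\n\ndata mining\n"

    def fake_search(keyword: str) -> str:
        return f"searched: {keyword}"

    print(run_keyword_loop(keywords, fake_search))
```

Each non-empty line of the list becomes one search, which mirrors the "one line at a time" behavior described above.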
3) Create a pagination loop - to scrape data from multiple listing pages
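As a rough illustration of what a pagination loop does, the sketch below keeps moving to the next page until there is none left. It runs on a small in-memory stand-in for a website; the `PAGES` data and the `scrape_all_pages` function are hypothetical and are not part of Octoparse.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory "site": each page lists its items and names the next page.
PAGES: Dict[str, dict] = {
    "page1": {"items": ["result A", "result B"], "next": "page2"},
    "page2": {"items": ["result C"], "next": "page3"},
    "page3": {"items": ["result D"], "next": None},
}


def scrape_all_pages(
    start: str,
    load_page: Callable[[str], dict],
    max_pages: int = 100,
) -> List[str]:
    """Collect items page by page until there is no next page."""
    collected: List[str] = []
    current: Optional[str] = start
    visited = 0
    while current is not None and visited < max_pages:
        page = load_page(current)
        collected.extend(page["items"])
        current = page["next"]  # comparable to clicking the "next page" button
        visited += 1
    return collected


if __name__ == "__main__":
    print(scrape_all_pages("page1", PAGES.__getitem__))
```

The `max_pages` limit is a safety guard we added for the sketch so that the loop always ends, even if the pages link to each other in a circle.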
4) Create a "Loop Item" - to extract each item in a loop
We are now on the second page. When creating a "Loop Item", we should always start with the first item on the first page. Thus, we'd better go back to the first page.
By doing this, we can help Octoparse decide the execution order and generate the Loop Item at the appropriate position in the workflow.
We need to make sure the whole block of the first item is highlighted in blue when we hover the mouse over it. Only then will the whole item block be highlighted in green after clicking, covering all of its information, such as title, author, and date.
Octoparse will automatically recognize the other items and highlight them in green.
Normally we can just click "Select all sub-elements" on the "Action Tips" panel, but under certain circumstances (as in this case), Octoparse fails to do that. Thus, we'll create the loop first, and then manually select the data to extract from each block in the next step.
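The "loop first, pick the fields afterwards" approach can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample markup, the class names (`result`, `title`, `author`, `date`), and the `extract_items` function are all hypothetical, and the standard-library XML parser used here only accepts well-formed markup, which real web pages often are not.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, well-formed sample markup with two item blocks.
SAMPLE = """
<div>
  <div class="result">
    <h3 class="title">First paper</h3>
    <span class="author">A. Author</span>
    <span class="date">2015</span>
  </div>
  <div class="result">
    <h3 class="title">Second paper</h3>
    <span class="author">B. Author</span>
  </div>
</div>
"""

# Field name -> path relative to one item block (assumed class names).
FIELDS: Dict[str, str] = {
    "title": ".//*[@class='title']",
    "author": ".//*[@class='author']",
    "date": ".//*[@class='date']",
}


def extract_items(markup: str, item_path: str = ".//div[@class='result']") -> List[dict]:
    """Loop over item blocks first, then read each field inside the block."""
    root = ET.fromstring(markup.strip())
    rows = []
    for block in root.findall(item_path):  # the loop over item blocks
        row = {}
        for name, path in FIELDS.items():  # the fields selected within a block
            node = block.find(path)
            # A missing field becomes an empty string instead of raising an error.
            row[name] = node.text.strip() if node is not None and node.text else ""
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract_items(SAMPLE):
        print(row)
```

Looking up each field relative to its own block keeps the title, author, and date of one item together, even when an item is missing a field.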
5) Extract data - to select the data you need to scrape
Rename the fields by selecting names from the pre-defined list or entering your own.
Below is the output sample: