Web Crawling Case Study | Scraping ASTA with Pagination (2) - No "Next Button" Found

Wednesday, April 05, 2017 2:44 AM

 

Brief Intro

In the tutorial Scraping from multi-pages: pagination without "Next" button, we learned how to turn pages on a site that has no "Next" button.

In this tutorial, I will use the ASTA website as an example and show you, step by step, how to scrape data from a website whose pagination has no "Next" button.

 

Features covered

  • Pagination
  • Building a list
  • Modify XPath

 

Now, let's get started!

 

Step 1. Set up basic information and navigate to the target website

 

Step 2. Find the pages to scrape

 On this website, search results are not displayed until you trigger a search yourself.

  • Click on the “Find” button
  • Click “Click an item”

 

Step 3.  Set up Pagination 

  • Drop a “Loop” item into the Workflow Designer.
  • Choose a “Loop Mode” under “Advanced Options”.
  • Select “Single Element” option.

 

Step 4. Modify XPath to locate next page

  • Make sure you start from the first page so that data is collected from every page.
  • Then paste the following XPath into the “Single Element” text box: //div/table/tbody/tr[@class='cssPager']/td/table/tbody/tr/td/span/../following-sibling::td[1]/a[1]
  • Click “Save”.
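
To see why this XPath lands on the next page, it helps to read it in pieces: it finds the <span> inside the pager row (on pagers of this kind the current page number is typically plain text rather than a link), steps up to its table cell, moves to the following cell, and takes the first link there. The sketch below reproduces that walk in Python using only the standard library. The pager markup, the function name next_page_link, and the ?page= links are assumptions made up for illustration; they are not the actual ASTA page source or anything Octoparse runs internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pager markup: page 2 is the current page,
# so it is rendered as a <span> instead of a link.
PAGER_MARKUP = """
<table>
  <tbody>
    <tr class="cssPager">
      <td>
        <table>
          <tbody>
            <tr>
              <td><a href="?page=1">1</a></td>
              <td><span>2</span></td>
              <td><a href="?page=3">3</a></td>
              <td><a href="?page=4">4</a></td>
            </tr>
          </tbody>
        </table>
      </td>
    </tr>
  </tbody>
</table>
"""


def next_page_link(markup):
    """Return the <a> element for the page after the current one, or None."""
    root = ET.fromstring(markup)
    # tr[@class='cssPager']/td/table/tbody/tr : the row holding the page numbers
    for row in root.iterfind(".//tr[@class='cssPager']/td/table/tbody/tr"):
        cells = list(row)  # child elements in document order
        for index, cell in enumerate(cells):
            # td/span : the cell that shows the current page
            if cell.tag == "td" and cell.find("span") is not None:
                # ../following-sibling::td[1] : the next cell in the same row
                following = [c for c in cells[index + 1:] if c.tag == "td"]
                if not following:
                    return None  # current page is the last one
                # a[1] : the first link inside that cell
                return following[0].find("a")
    return None


link = next_page_link(PAGER_MARKUP)
if link is not None:
    print(link.text, link.get("href"))
```

Because the expression is anchored on the current page rather than on a fixed page number, the same selector keeps pointing at "the page after this one" as the loop advances, and it matches nothing once the last page is reached.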

 

Step 5. Click Items in the loop to scrape data with pagination

  •  Drop a “Click Item” into the “Loop item”
  •  Choose “Click Loop items” under “Advanced Option”
  •  Click “Save”.

Now you’ve configured pagination scraping. 

 

Step 6. Create a list of items

 Move your cursor over the sections with a similar layout, from which you want to extract content.

  • Click anywhere in the first section of the web page.
  • If the selection was not identified properly at first, click “Expand the selection area” until the outlined box includes all the content you want to scrape.

  • When prompted, click “Create a list of items” (sections with similar layout)
  • Click “Add current item to the list”

 Now that the first item has been added to the list, we need to add the remaining items.

  • Click “Continue to edit the list”
  • Click a second section with similar layout
  • Click “Add current item to the list” again

 Now all the sections with a similar layout have been added to the list.

  • Click “Finish Creating List”
  • Click “loop”. This tells Octoparse to click each section in the list and extract the selected data.

 

Step 7. Select the data to be extracted and rename data fields

  • Click the data field “Full Name”.
  • Select “Extract text”
  • Follow the same steps to extract the other data.
  • Rename any fields if necessary.
  • Click "Save"

 

Step 8. Re-order workflow

Notice that the loop action for data extraction is positioned outside the pagination loop. That is not what we want, because each page has to be extracted before we move on to the next one. So we need to manually drag the data extraction loop inside the pagination loop and position it right before the “Click to paginate” action in the Workflow Designer.

Now look at the workflow we created: extract, turn the page, then loop back to extract and turn the page again. That is exactly what we want.
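
The ordering can be pictured as two nested loops: an outer loop that walks the pages and an inner loop that extracts every item on the current page before the page is turned. The following is a minimal Python sketch over a made-up three-page data set. The PAGES dictionary, the crawl function, and the names in it are hypothetical; they only mirror the shape of the workflow and do not describe how Octoparse is implemented.

```python
# Hypothetical in-memory stand-in for a paginated site: each page lists
# some rows and points to the next page (None on the last page).
PAGES = {
    1: {"rows": ["Agent A", "Agent B"], "next": 2},
    2: {"rows": ["Agent C"], "next": 3},
    3: {"rows": ["Agent D", "Agent E"], "next": None},
}


def crawl(pages, start=1):
    """Extract every row from every page, turning the page after each one."""
    results = []
    current = start
    while current is not None:              # outer loop: pagination
        page = pages[current]
        for row in page["rows"]:            # inner loop: data extraction
            results.append({"page": current, "Full Name": row})
        current = page["next"]              # "Click to paginate" comes last
    return results


records = crawl(PAGES)
for record in records:
    print(record["page"], record["Full Name"])
```

If the extraction loop sat outside the pagination loop instead, the pages would all be turned first and only the rows on the final page would be collected, which is why the order in the workflow matters.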

 

Step 9. Start running your task

  • After saving your extraction configuration, click “Next”
  • Select “Local Extraction”
  • Click “OK” to run the task on your computer.

Octoparse will automatically extract all the selected data. Check the "Data Extracted" pane for the extraction progress.

 

Step 10. Check the data and export

  • Check the data extracted
  • Click the "Export" button to export the results to an Excel file, a database, or another format, and save the file to your computer

 

Now you've learned how to flip through pages to scrape data without a "Next" button.


 

 

Author: The Octoparse Team

 

 

 
