Scrape Data from Websites with Pagination (Query Strings) (1)

Wednesday, September 28, 2016 7:54 AM

 

What is it?

The pagination action is used when the content we want to scrape spans multiple pages of a website. Octoparse mimics human browsing behavior: just as you would click through to the next page while browsing a website, Octoparse does the same when you use the pagination feature.

Query string pagination is one of the most common ways to flip through pages. Each page is a plain URL with a query string parameter that identifies the page.
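To make the idea concrete, here is a minimal Python sketch of how such URLs differ from page to page. The address `http://example.com/articles` and the parameter name `page` are assumptions chosen for illustration; real sites use their own parameter names (for example `p` or `offset`).

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_url(base_url: str, page: int, param: str = "page") -> str:
    """Return base_url with its `param` query parameter set to `page`."""
    parts = urlsplit(base_url)
    query = parse_qs(parts.query)   # existing parameters, if any
    query[param] = [str(page)]      # set or overwrite the page number
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical listing URL: only the query string changes between pages.
urls = [page_url("http://example.com/articles?topic=news", n) for n in range(1, 4)]
for url in urls:
    print(url)
```

Only the value of the page parameter changes between the three printed URLs; the path and the other parameters stay the same.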

                                                                                            

When should you use it?

If you need to extract data from more than one page, use pagination to enable page flipping.

There are mainly two kinds of query string pagination.

The first kind, shown below, has a “Next page” button, while the second kind does not.
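Outside Octoparse, the difference between the two kinds can be checked by looking for a link labelled “Next” in the page's HTML. The sketch below uses only Python's standard `html.parser`. The helper name `find_next_link` and the sample pager markup are hypothetical, and real sites may label the link differently.

```python
from html.parser import HTMLParser
from typing import Optional


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> whose visible text starts with "next"."""

    def __init__(self) -> None:
        super().__init__()
        self._href: Optional[str] = None   # href of the <a> currently open
        self._text = []                    # text pieces seen inside that <a>
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip().lower()
            if self.next_href is None and label.startswith("next"):
                self.next_href = self._href
            self._href = None


def find_next_link(html: str) -> Optional[str]:
    """Return the href of the "Next" link, or None if the pager has none."""
    finder = NextLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.next_href


# Hypothetical pagers: one with a "Next" link, one with page numbers only.
with_next = '<a href="?page=1">1</a> <a href="?page=3">3</a> <a href="?page=3">Next</a>'
numbers_only = '<a href="?page=1">1</a> <a href="?page=3">3</a>'
print(find_next_link(with_next))
print(find_next_link(numbers_only))
```

When no “Next” link is found, a script would have to fall back to the page-number links or to incrementing the query string parameter itself.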

 

              

 

 

In this tutorial, I will use securitysystemsnews.com as an example to show you how to scrape data from websites that paginate with a “Next” button.

 

(You can download my extraction task for this tutorial HERE in case you need it.)

 

Step 1. Start your task and set up basic information.

  • Click “Quick Start”
  • Choose “New Task (Advanced Mode)”
  • Complete the “Basic Information”
  • Click “Next”

 

 

 

Step 2. Navigate to your target webpage

  • Enter the target URL in the built-in browser

       (URL of the example: http://www.securitysystemsnews.com/topic/Commercial-and-Systems-Integrators)

  • Click the “Go” icon to open the webpage

 

Step 3. Set up pagination

To extract data from websites with query string pagination, you need to add a page navigation action.

  • Click “Next” to the right of the page numbers
  • Choose “Loop Click Next Page”

        (This tells Octoparse to click through to each page in turn so that the extraction actions can run on every page.)
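Conceptually, a “Loop Click Next Page” action keeps following the “Next” link until there is none left. The Python sketch below illustrates that loop; it is not Octoparse's implementation. The `fetch` callable, the `class="next"` pager markup and the in-memory `site` dictionary are all assumptions made so the example runs without network access.

```python
import re
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical pager markup: <a class="next" href="...">Next</a>
NEXT_LINK = re.compile(r'<a class="next" href="([^"]+)"')


def cycle_pages(
    start_url: str, fetch: Callable[[str], str], max_pages: int = 100
) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) for each page, following "Next" links until none is left."""
    url: Optional[str] = start_url
    seen = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)                      # guard against pagers that loop back
        html = fetch(url)
        yield url, html
        match = NEXT_LINK.search(html)     # the "click Next" step
        url = match.group(1) if match else None


# In-memory stand-in for a three-page website.
site = {
    "/news?page=1": '<p>one</p><a class="next" href="/news?page=2">Next</a>',
    "/news?page=2": '<p>two</p><a class="next" href="/news?page=3">Next</a>',
    "/news?page=3": "<p>three</p>",
}
visited = [url for url, _ in cycle_pages("/news?page=1", site.__getitem__)]
print(visited)
```

Against a real website, `fetch` would download the page, and relative `href` values would need to be resolved against the current URL (for example with `urllib.parse.urljoin`).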

 

 

 

 

Step 4. Create a list of items

       Move your cursor over the articles that share a similar layout and whose content you want to extract.

  • Click anywhere on the first section of the web page

         (Make sure the outlined box contains the data to be extracted)

  • When prompted, click “Create a list of items” (sections with similar layout)
  • Click “Add current item to the list”

         (The first item has now been added to the list; next, we need to add the remaining items.)

  • Click “Continue to edit the list”
  • Click a second section with similar layout
  • Click “Add current item to the list” again

         (All the sections with a similar layout have now been added to the list.)

  • Click “Finish Creating List”
  • Click “loop”. This tells Octoparse to click on each section in the list to extract the selected data
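In script form, “a list of items with similar layout” corresponds to selecting every element that matches the same pattern. The sketch below assumes a small, well-formed sample page in which each article sits in a `<div class="article">`; both the markup and the class name are hypothetical. Real-world HTML is often not well-formed XML, in which case a dedicated HTML parser would be needed instead of `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: two article sections with the same layout, plus a sidebar.
PAGE = """
<div id="content">
  <div class="article"><h2>First headline</h2><p>Summary one.</p></div>
  <div class="article"><h2>Second headline</h2><p>Summary two.</p></div>
  <div class="sidebar"><h2>Not an article</h2></div>
</div>
"""

root = ET.fromstring(PAGE)

# One pattern selects every section that shares the layout; the sidebar is skipped.
items = root.findall(".//div[@class='article']")

for item in items:                      # the "loop" over the list of items
    print(item.find("h2").text)
```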

 

 

 

Step 5. Select the data to be extracted

  • Select the data to be extracted
  • Right-click on the title of the first section
  • Select “Extract text”
  • Follow the same steps to extract other data fields
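As a rough Python analogue of “Extract text”, the sketch below pulls the visible text out of two elements of one item and ignores the tags inside them. The sample markup and the placeholder field names `Field1` and `Field2` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical item with nested markup inside the title.
item = ET.fromstring(
    '<div class="article"><h2>First <em>headline</em></h2><p>Summary one.</p></div>'
)


def extract_text(element: ET.Element) -> str:
    """Join all text inside the element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


# One data field per selected element.
row = {
    "Field1": extract_text(item.find("h2")),
    "Field2": extract_text(item.find("p")),
}
print(row)
```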

 

 

Step 6. Rename the data fields

        All the selected content will be listed under Data Fields.

  • Click the “Field Name” to modify it.
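If the same renaming were done in a script, it would amount to mapping placeholder names to meaningful ones. A minimal sketch, in which the names `Field1`, `Field2`, `title` and `summary` are made up for illustration:

```python
# Rows as they might look before renaming (hypothetical placeholder names).
rows = [{"Field1": "First headline", "Field2": "Summary one."}]

# Old name -> new name; fields not listed here keep their current name.
new_names = {"Field1": "title", "Field2": "summary"}

renamed = [
    {new_names.get(key, key): value for key, value in row.items()}
    for row in rows
]
print(renamed)
```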

 

 

 

Step 7. Adjust the relative loop sequence

  • Drag the second “Loop Item” box so that it sits before the “Click to paginate” action inside the “Cycle Pages” box in the Workflow Designer.

      (This way, we can grab the elements of every section across multiple pages.)

  • Click “Save”
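The reason the order matters can be seen in a small Python simulation. The `pages` data is hypothetical, and the misplaced variant only illustrates the general pitfall of extracting after all the paging is done; it is not a statement about what Octoparse does with a misordered workflow.

```python
# Hypothetical items found on each of three pages.
pages = [["a1", "a2"], ["b1", "b2"], ["c1"]]


def run(extract_before_paginating: bool) -> list:
    """Simulate the page loop with the item loop inside it or after it."""
    collected = []
    page_index = 0
    while True:
        if extract_before_paginating:
            collected.extend(pages[page_index])   # item loop on the current page
        if page_index + 1 >= len(pages):          # no "Next" button left
            break
        page_index += 1                           # "Click to paginate"
    if not extract_before_paginating:
        collected.extend(pages[page_index])       # item loop runs once, at the end
    return collected


print(run(extract_before_paginating=True))
print(run(extract_before_paginating=False))
```

With the item loop inside the page loop and ahead of the pagination click, every page contributes its items; with the item loop after the page loop, only the last page does.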

 

 

 

Step 8. Start running your task

  • After saving your extraction configuration, click “Next”
  • Select “Local Extraction”
  • Click “OK” to run the task on your computer.

      (Octoparse will automatically extract all the selected data. Check the “Data Extracted” pane for the extraction progress.)

 

 

 

Step 9. Check the data and export

  • Check the extracted data

      (The data extracted with pagination will be shown in the “Data Extracted” pane.)

  • Click the “Export” button to export the results to an Excel file, a database, or another format, and save the file to your computer
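For readers who post-process the results in a script, the sketch below writes rows to CSV, a format that Excel can open, using Python's standard `csv` module. The sample rows and field names are hypothetical, and this is not the export code Octoparse itself uses.

```python
import csv
import io

# Hypothetical extracted rows.
rows = [
    {"title": "First headline", "summary": "Summary one."},
    {"title": "Second headline", "summary": "Summary, two."},
]


def to_csv(rows, fieldnames) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)      # values containing commas are quoted automatically
    return buffer.getvalue()


csv_text = to_csv(rows, ["title", "summary"])
print(csv_text)
```

To save a file instead of building a string, the same writer can be pointed at `open("results.csv", "w", newline="", encoding="utf-8")`.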

 

 

Now you've learned how to scrape data from websites with pagination. Let’s look into how pagination works with this example [link to the case study example].


Author: The Octoparse Team


- See more at: http://www.octoparse.com/tutorial/pagination-scrape-data-from-websites-with-query-strings-2/#sthash.gDCJJmOQ.dpuf