
Dealing with pagination (Infinite Scroll)

Wednesday, September 27, 2017 8:44 AM


 

Infinite scrolling, also known as "endless scrolling," is a technique, most often built with JavaScript or AJAX, that websites use to load additional content dynamically as users scroll toward the bottom of the page. When you drag the scrollbar straight to the bottom, you usually see a "loading" sign, and the new content is added to the page shortly afterward:

[Image: loading sign]

 

With the proper settings, Octoparse scrolls the page just as you would manually. All you need to do is tell Octoparse which page to scroll, how many times to scroll, and how long to wait between two scrolls.
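Octoparse handles this entirely through its settings panel, so no code is needed. Purely to illustrate the three settings, the Python sketch below simulates a page that appends a fixed batch of items each time it is scrolled to the bottom. `SimulatedInfinitePage`, `scroll_page`, and the batch sizes are hypothetical; this is not how Octoparse is implemented.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SimulatedInfinitePage:
    """Hypothetical page that shows one batch of items on load and appends
    another batch each time the viewport reaches the bottom."""
    total_items: int
    batch_size: int
    loaded: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        # The first batch is visible as soon as the page opens.
        self.loaded = min(self.batch_size, self.total_items)

    def scroll_to_bottom(self) -> None:
        # Reaching the bottom triggers the next (simulated) AJAX request.
        self.loaded = min(self.loaded + self.batch_size, self.total_items)


def scroll_page(page: SimulatedInfinitePage, repeats: int, wait_seconds: float) -> int:
    """Scroll `repeats` times, pausing `wait_seconds` after each scroll so
    new content has time to load; return how many items are now loaded."""
    for _ in range(repeats):
        page.scroll_to_bottom()
        time.sleep(wait_seconds)
    return page.loaded


if __name__ == "__main__":
    # Hypothetical page: 100 products in total, 24 loaded per request.
    print(scroll_page(SimulatedInfinitePage(100, 24), repeats=3, wait_seconds=0.1))  # 96
    print(scroll_page(SimulatedInfinitePage(100, 24), repeats=4, wait_seconds=0.1))  # 100
```

In this model, three scrolls leave the last 4 items unloaded, which is why it pays to check how many scrolls a page really needs before running the task.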

In this tutorial, we show you how to deal with infinite scrolling in Octoparse. You can use this URL to follow along:

https://biomarket.com.ar/product-category/almacen/desayuno/

*Note: this tutorial covers scrolling the whole page. If you only need to scroll a designated area of the page, see Scrolling within designated area of a web page.

 

1) Use the auto-detect algorithm to deal with it

2) Set up the infinite scroll manually

 

1) Use the auto-detect algorithm to deal with it

 

  • Select "Auto-detect web page data" on the Tips panel.
  • Modify the scroll settings

         1. Click "Edit" under "Add a page scroll" and set the scroll method, repeats, and wait time as needed.

         2. Click "Confirm" to save the settings. Make sure you set enough repeats and a suitable wait time between two scrolls.

 

[Image: set up scrolling]

 

Tips!

1. Scroll to the bottom of the page: Octoparse scrolls straight to the bottom of the page without stopping midway. Consider this option when the page only loads new elements once you reach the bottom.

2. Scroll for one screen: Octoparse scrolls down one screen at a time. Consider this option if the page loads content continuously as you scroll down.

("Scroll for one screen" can be used on all websites, while "Scroll to the bottom of the page" may fail on some specific websites.)

3. Repeats: enter the number of times to repeat the scroll (i.e., the number of scrolls). You may want to scroll the target page manually first to find out how many scrolls it takes to load all the information you need.

4. Wait time: select a suitable wait time between two scrolls. Pick a longer wait time for pages that take longer to load.
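The tips above can be illustrated with a simplified model. The Python below is a sketch only, not Octoparse's code; the lazy-loading model, function names, and numbers are all assumptions. `visible_sections` models a page whose sections render only once they enter the viewport: jumping straight to the bottom skips the middle sections, which is one plausible reason "Scroll to the bottom of the page" can fail where "Scroll for one screen" works. `scrolls_needed` shows the arithmetic behind choosing Repeats when each scroll loads a fixed number of items.

```python
import math


def visible_sections(section_heights, viewport, mode, repeats):
    """Return the indices of sections that were ever inside the viewport.

    Hypothetical lazy-loading model: a section renders only if part of it
    is visible at some scroll position. `mode` is "bottom" (jump straight
    to the end of the page) or "one_screen" (move down one viewport).
    """
    starts = [sum(section_heights[:i]) for i in range(len(section_heights))]
    max_top = max(sum(section_heights) - viewport, 0)
    seen = set()

    def mark(top):
        # A section is visible if it overlaps the range [top, top + viewport).
        for i, (start, height) in enumerate(zip(starts, section_heights)):
            if start < top + viewport and start + height > top:
                seen.add(i)

    top = 0
    mark(top)  # what is visible right after the page opens
    for _ in range(repeats):
        if mode == "bottom":
            top = max_top
        elif mode == "one_screen":
            top = min(top + viewport, max_top)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        mark(top)
    return sorted(seen)


def scrolls_needed(items_wanted, items_on_first_load, items_per_scroll):
    """Smallest number of scrolls that loads at least `items_wanted` items,
    assuming each scroll loads a fixed number of new items."""
    if items_wanted <= items_on_first_load:
        return 0
    return math.ceil((items_wanted - items_on_first_load) / items_per_scroll)


if __name__ == "__main__":
    sections = [500] * 6  # hypothetical 3000 px page viewed through a 1000 px window
    print(visible_sections(sections, 1000, "bottom", repeats=2))      # [0, 1, 4, 5]
    print(visible_sections(sections, 1000, "one_screen", repeats=2))  # [0, 1, 2, 3, 4, 5]
    print(scrolls_needed(200, 24, 24))  # 8
```

In this model, two one-screen scrolls reach every section, while jumping to the bottom leaves sections 2 and 3 unrendered.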

 

  • Create the workflow with these settings.
  • You should get a workflow like the one below.
  • Click "Scroll Page" to check or modify the scroll settings.

 

[Image: set scroll pages]

 

 

  • Check whether the created Loop Item locates all the elements

Open the Loop Item settings to see whether all the elements are located. Also, make sure that "Loop Mode" is set to "Variable List" with the correct XPath.

 

2) Set up the infinite scroll manually

 

You can add a scroll to the "Go to Web Page" or "Click Item" action, or add a new Loop Item that scrolls down.

  • Click a product item, click "Select all", then click "Loop click each URL".

The created Loop Item clicks each product URL to get the data.

  • Set up the scroll-down

 

a. Click the "Go to Web Page" action to open its settings, then find the "Scroll down the page after it is loaded" section under "Options".

b. Alternatively, add a Loop Item to the workflow and set its Loop Mode to "Scroll Page".

 

  • Modify the XPath to locate the right elements: //div[contains(@class,'product-grid-item')]/div/a
  1. Click the "Loop Item" action, then set Loop Mode to "Variable List".
  2. Paste the XPath into the "Element Xpath" field.
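To see what this XPath selects, here is a standard-library Python sketch that mimics it on hypothetical markup (the HTML below is invented for illustration and is not copied from the tutorial site). `contains(@class,'product-grid-item')` matches any `div` whose class attribute includes that substring, and `/div/a` then steps to a child `div` and its child `a` links. Python's `xml.etree.ElementTree` supports only a subset of XPath without `contains()`, so the predicate is emulated by hand in the hypothetical `product_links` function.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for a product grid (not the real site's HTML).
SAMPLE_HTML = """
<html><body>
  <div class="products">
    <div class="product-grid-item product">
      <div class="product-element-top"><a href="https://example.com/p/1">Item 1</a></div>
    </div>
    <div class="product-grid-item product">
      <div class="product-element-top"><a href="https://example.com/p/2">Item 2</a></div>
    </div>
    <div class="sidebar">
      <div><a href="https://example.com/about">About</a></div>
    </div>
  </div>
</body></html>
"""


def product_links(root):
    """Emulate //div[contains(@class,'product-grid-item')]/div/a."""
    links = []
    for div in root.iter("div"):  # //div : every div in the document
        # [contains(@class,'product-grid-item')] : substring test on the class attribute
        if "product-grid-item" in div.get("class", ""):
            for child_div in div.findall("div"):  # /div : direct div children only
                links.extend(child_div.findall("a"))  # /a : direct a children only
    return links


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_HTML.strip())
    for a in product_links(root):
        print(a.text, a.get("href"))
```

Only the two product links are returned; the sidebar link is skipped because its parent `div` lacks the class. Because `contains()` is a substring test, it would also match a class such as `product-grid-item-wrapper`. With lxml installed, `lxml.html.fromstring(html).xpath(...)` can evaluate the original expression directly.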

[Image: modify xpath]

 

Tips!

Find more details about the page scroll-down function at Page scroll-down.

Find more details about the Loop Item at Loop Item.

 

If you need any help with task configuration or data collection, submit a ticket to our support team. We'll get back to you soon.

 

Happy Data Hunting!

Author: The Octoparse Team
