Dealing with pagination (infinite scroll)
Updated over a week ago

Infinite scrolling, also known as "endless scrolling," is a technique in which a website uses JavaScript or AJAX to load additional content dynamically as users scroll toward the bottom of the page.

Usually, when you drag the scroll bar straight to the bottom, a "loading" sign appears and new content is added to the page shortly afterward:

45.png

With the proper settings, Octoparse scrolls the page the same way you would manually. All you need to do is tell Octoparse which page to scroll, how many times to scroll, and how long to wait between scrolls.

This tutorial shows how to deal with infinite scroll in Octoparse.

To follow through, you may want to use the URL below:

Note: This tutorial is for scrolling the whole page. If you only need to scroll a designated area of the page, please check Scrolling within designated area of a web page


1) Use the auto-detect algorithm to handle infinite scroll

  • Select Auto-detect web page data in the Tips panel.

1.png
  • Modify the scroll settings

Click Edit under Add a page scroll and set up the scroll mode, repeat times, and wait time. Then click Confirm to save the settings.

2.png

Tips:

1. Scroll to the bottom of the page: Octoparse scrolls straight to the bottom without stopping in the middle of the page. Consider using this option when the page loads new elements only once you reach the bottom.

2. Scroll for one screen: Octoparse scrolls down one screen at a time. Consider using this option if the page loads content continuously as you scroll down.

(Scroll for one screen works on all websites, whereas Scroll to the bottom of the page may fail on some specific websites.)

3. Enter a number for Repeats, which is the number of times to repeat the scroll-down (i.e., the number of scrolls). It helps to scroll the target web page manually beforehand to find out how many scrolls are needed to load all the required information.

4. Select a proper wait time between two consecutive scrolls. Pick a longer wait time for pages that take longer to load.
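
To make the interaction between scroll mode, Repeats, and wait time more concrete, here is a minimal Python sketch. It does not use any Octoparse API: `FakePage`, `scroll_page`, and all the numbers (screen height, batch size, item height) are hypothetical, chosen only to imitate a page that appends content when the viewport reaches the bottom.

```python
import time
from dataclasses import dataclass


@dataclass
class FakePage:
    """Hypothetical stand-in for an infinite-scroll page (not an Octoparse API)."""
    screen_height: int = 800   # pixels visible at once (assumed)
    item_height: int = 200     # pixels per item (assumed)
    batch_size: int = 10       # items appended per load (assumed)
    max_items: int = 50        # the site eventually runs out of content
    items: int = 10            # items present after the initial page load
    position: int = 0          # top of the current viewport, in pixels

    @property
    def height(self) -> int:
        return self.items * self.item_height

    def _load_if_at_bottom(self) -> None:
        # New content is appended only when the viewport touches the bottom.
        at_bottom = self.position + self.screen_height >= self.height
        if at_bottom and self.items < self.max_items:
            self.items = min(self.items + self.batch_size, self.max_items)

    def scroll_one_screen(self) -> None:
        lowest = self.height - self.screen_height
        self.position = min(self.position + self.screen_height, lowest)
        self._load_if_at_bottom()

    def scroll_to_bottom(self) -> None:
        self.position = self.height - self.screen_height
        self._load_if_at_bottom()


def scroll_page(page: FakePage, mode: str, repeats: int, wait_seconds: float) -> int:
    """Scroll `repeats` times, pausing between scrolls; return the items loaded."""
    for _ in range(repeats):
        if mode == "bottom":
            page.scroll_to_bottom()
        elif mode == "one_screen":
            page.scroll_one_screen()
        else:
            raise ValueError(f"unknown scroll mode: {mode}")
        time.sleep(wait_seconds)  # give the (simulated) page time to load
    return page.items


if __name__ == "__main__":
    print(scroll_page(FakePage(), "bottom", repeats=4, wait_seconds=0))
    print(scroll_page(FakePage(), "one_screen", repeats=4, wait_seconds=0))
```

On this simulated page, the same number of repeats loads fewer items in one-screen mode than in bottom mode, so a one-screen setup generally needs a larger Repeats value to reach the same content.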

  • Create the workflow with the settings

4.png

You will get a workflow as shown in the picture below:

114.png
  • You can click Scroll Page to check or modify the settings of the scrolling

scrol.png

Note: Make sure to set enough scroll repeats and a suitable interval between scrolls.
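
One rough way to pick a Repeats value is to count how many items appear per scroll when scrolling manually and work backward from the number of items you need. The helper below is only an illustrative sketch: `estimate_repeats`, its parameters, and the safety margin are assumptions, not part of Octoparse.

```python
import math


def estimate_repeats(target_items: int, initial_items: int,
                     items_per_scroll: int, safety_margin: int = 2) -> int:
    """Estimate how many scrolls are needed to load `target_items` items.

    All inputs come from observing the page manually; `safety_margin`
    adds a few extra scrolls in case some loads return fewer items.
    """
    if items_per_scroll <= 0:
        raise ValueError("items_per_scroll must be positive")
    missing = max(target_items - initial_items, 0)
    return math.ceil(missing / items_per_scroll) + safety_margin


if __name__ == "__main__":
    # Example with made-up numbers: 20 items on first load, 12 more per scroll.
    print(estimate_repeats(target_items=100, initial_items=20, items_per_scroll=12))
```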

  • Check if the Loop Item can locate all the elements correctly

You can go to the settings of the Loop Item to see if all the elements are located. Please ensure the Loop Mode is Variable List with the correct XPath.

6.png

2) Set up the infinite scroll manually

You can attach a scroll to a Go to Web Page or Click Item action, or add a new Loop Item that scrolls the page down.

  • Click on the item and click Select all, then click Loop click each URL.

The created loop item will click on each product URL to get the data.

1.gif
  • Set up the scroll-down on Go to Web Page (add a scroll combined with the Go to Web Page/Click Item action)

- Click Go to Web Page.

- Tick Scroll down the page after it is loaded in the Options.

scroll_settings.png
  • Alternatively, you can set up the scroll down with a Loop Item (as a separate step)

- Add a Loop Item to the workflow and set the Loop Mode to Scroll Page.

loop_mode.gif

Also, make sure the Loop Item locates the right elements with Variable List.

  • Click Loop Item, then choose Loop Mode as Variable List

  • Input the matching XPath as: //div[contains(@class,'product-grid-item')]/div/a

120.png
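
To see what an XPath such as //div[contains(@class,'product-grid-item')]/div/a selects, the sketch below reproduces its logic in plain Python. The sample markup and the `match_product_links` helper are hypothetical; the standard library's ElementTree does not support the XPath `contains()` function, so the substring test is emulated by hand.

```python
import xml.etree.ElementTree as ET

# Made-up markup resembling a product grid; real pages will differ.
SAMPLE = """
<html><body>
  <div class="product-grid-item col-4">
    <div><a href="/p/1">One</a></div>
  </div>
  <div class="product-grid-item featured">
    <div><a href="/p/2">Two</a></div>
  </div>
  <div class="sidebar">
    <div><a href="/about">About</a></div>
  </div>
</body></html>
"""


def match_product_links(markup: str) -> list:
    """Emulate //div[contains(@class,'product-grid-item')]/div/a."""
    root = ET.fromstring(markup.strip())
    hrefs = []
    for div in root.iter("div"):                          # //div
        if "product-grid-item" in div.get("class", ""):   # contains(@class, ...)
            for link in div.findall("./div/a"):           # /div/a
                hrefs.append(link.get("href"))
    return hrefs


if __name__ == "__main__":
    print(match_product_links(SAMPLE))
```

Because `contains()` is a substring test, it matches elements whose class attribute holds extra classes alongside product-grid-item, but it would also match a class such as product-grid-item-wide, so it is worth checking the located elements in the Loop Item settings.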

Note: Check out more on Page scroll-down & Loop Item
