Step-by-step tutorials for you to get started with web scraping


Dealing with Infinite Scrolling/Load More

Saturday, September 29, 2018

In many cases, pagination is not an option for loading content. Instead, you will need to either scroll down the page or click a "Load More" button to reveal more content.

 

 

This tutorial will show you how to configure a task in Octoparse to deal with these two situations, making sure all available data is extracted. 
 

 

 

 

1) Infinite Scrolling

 

 

Infinite scrolling, also known as "endless scrolling", is a technique used most often by websites that rely on JavaScript or AJAX to load additional content dynamically as the user scrolls down to the bottom of the webpage. Twitter is one well-known example that employs infinite scrolling. 
 
With the proper settings, Octoparse scrolls down the page the same way you would manually. All you need to do is tell Octoparse which page to scroll, how many times to scroll, and how long to wait between scrolls. 
1) Navigate to the webpage that needs to be scrolled
It should either be an "Open Webpage" action or "Click" action depending on how the page is connected to the previous action in the workflow. 
 
 
2) From "Advanced Option", locate the option for "Scroll Down"
3) Check "Scroll down to bottom of the page when finished loading"
4) Input the desired number for "Scroll times" and the number of seconds to wait between scrolls
5) From the dropdown menu, choose whether you would like to scroll down to the bottom of the page or scroll down for one screen.
6) Click "OK" to save the settings
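
The settings above (scroll times, interval, and scroll mode) can be pictured as a simple loop. The following is a minimal Python sketch of that idea, not Octoparse's actual implementation: `scroll_page`, `FakeFeed`, and all parameter names are hypothetical and exist only for illustration.

```python
import time
from typing import Callable


def scroll_page(
    scroll_once: Callable[[str], None],
    scroll_times: int,
    interval_seconds: float,
    mode: str = "bottom",
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll `scroll_times` times, pausing `interval_seconds` after each scroll.

    `scroll_once` is a hypothetical callback that performs a single scroll.
    `mode` is "bottom" (jump to the end of the page) or "screen" (one screen).
    Returns the number of scrolls performed.
    """
    if mode not in ("bottom", "screen"):
        raise ValueError("mode must be 'bottom' or 'screen'")
    if scroll_times < 0 or interval_seconds < 0:
        raise ValueError("scroll_times and interval_seconds must be non-negative")
    for _ in range(scroll_times):
        scroll_once(mode)
        sleep(interval_seconds)  # give the page time to load the new content
    return scroll_times


class FakeFeed:
    """Stand-in for an infinite-scroll page (assumed behavior, for illustration)."""

    def __init__(self, total_items: int, initial: int = 10):
        self.total_items = total_items
        self.visible = min(initial, total_items)

    def scroll(self, mode: str) -> None:
        # Assumption: a scroll to the bottom reveals 10 items, one screen reveals 5.
        step = 10 if mode == "bottom" else 5
        self.visible = min(self.visible + step, self.total_items)


if __name__ == "__main__":
    feed = FakeFeed(total_items=45)
    scroll_page(feed.scroll, scroll_times=4, interval_seconds=0, mode="bottom")
    print(feed.visible)
```

If the number of scrolls is too small, the loop ends before all items are visible, which is why the scroll count and interval are worth testing against the real page.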
 
 

Tips!

Infinite scrolling is easy to set up, but to find the most appropriate settings you might want to test run the task to see whether you have assigned enough scroll times and whether the scrolling is working at the right pace. 

 

 

 

 

2) Clicking the "Load More" button

 

In addition to infinite scrolling, some webpages require clicking a "Load More" button, which loads more content dynamically via AJAX.

To capture all available content from the page, I will configure Octoparse to first click the "Load More" button repeatedly until all the information needed is revealed, then capture all the information at once. 

Let’s see how it is done using Health.usnews.com as an example.

1) Navigate to the page if you are not already there. Notice more content gets loaded every time you click on the "Load More" button located at the bottom of the page. 
2) Hover over the "Load More" button and click on it (or right click if left click triggers the link). 


3) The Action Panel lists the possible next actions. Select "Loop click the selected link". This tells Octoparse to click the button repeatedly. 


4) Now, toggle the workflow switch at the top and you should see the workflow generated by Octoparse. Octoparse identifies the click as a pagination action, but a "Load More" click usually loads content via AJAX, so the click needs to be configured accordingly:
  • Click on the action "Click to paginate" from the workflow
  • From "Advanced Options", select "Load the page with AJAX" and set the timeout to as long as needed (e.g. 1 or 2 seconds is usually enough).


Tips: 

If you only wish to click the "Load More" button X times, select the Pagination Loop from the workflow, open the "When loop end" setting under "Advanced options", and set the execution times to X.
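
The click loop described in steps 3 and 4, together with the optional click limit from the tip, can be sketched in Python as follows. This is an illustrative model under stated assumptions, not Octoparse's internals: `FakeListing`, `click_load_more_then_extract`, `ajax_timeout`, and `max_clicks` are hypothetical names.

```python
import time
from typing import Callable, List, Optional


class FakeListing:
    """Stand-in for a page with a "Load More" button (hypothetical)."""

    def __init__(self, total_items: int, page_size: int = 5):
        self.total_items = total_items
        self.page_size = page_size
        self.visible = min(page_size, total_items)

    def has_load_more(self) -> bool:
        # The button is assumed to disappear once everything is shown.
        return self.visible < self.total_items

    def click_load_more(self) -> None:
        self.visible = min(self.visible + self.page_size, self.total_items)

    def items(self) -> List[int]:
        return list(range(self.visible))


def click_load_more_then_extract(
    page: FakeListing,
    ajax_timeout: float = 1.0,
    max_clicks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Click "Load More" until it is gone or `max_clicks` is reached, then extract once."""
    clicks = 0
    while page.has_load_more() and (max_clicks is None or clicks < max_clicks):
        page.click_load_more()
        sleep(ajax_timeout)  # wait for the AJAX content instead of a full page load
        clicks += 1
    return page.items()  # a single extraction pass over everything now visible


if __name__ == "__main__":
    everything = click_load_more_then_extract(FakeListing(23), ajax_timeout=0)
    limited = click_load_more_then_extract(FakeListing(23), ajax_timeout=0, max_clicks=2)
    print(len(everything), len(limited))
```

Leaving `max_clicks` unset corresponds to clicking until the button no longer appears. Setting it mirrors limiting the loop's execution times to X.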


5) Now, you can build a list of the sections to loop through (see lesson 4).


6) Then extract the specific data fields from each section (see lesson 4).


7) Test run the task with "Local Extraction". Every website works differently, so it is important to always test run the task and check that all steps in the workflow are executed correctly.

Tips:  

1. If the extraction loop has been built inside the pagination loop, drag it out manually, since the pagination loop should finish before the extraction loop runs.


2. If an action was added by mistake, use "Undo Action" to cancel it. 
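
One plausible reason for the first tip is that a "Load More" page keeps the earlier items on screen, so extracting inside the click loop re-reads them on every pass. The Python sketch below illustrates this with a simplified, hypothetical page model (`visible_after`, `extract_nested`, and `extract_sequential` are made-up names). It is an assumption about why the loop order matters, not a description of Octoparse's engine.

```python
from typing import List


def visible_after(clicks: int, page_size: int = 3, total: int = 9) -> List[str]:
    """Items visible on a hypothetical page after `clicks` clicks on "Load More"."""
    count = min(page_size * (clicks + 1), total)
    return [f"item-{i}" for i in range(count)]


def extract_nested(max_clicks: int) -> List[str]:
    """Extraction loop inside the click loop: every pass re-reads all visible items."""
    rows: List[str] = []
    for clicks in range(1, max_clicks + 1):
        rows.extend(visible_after(clicks))
    return rows


def extract_sequential(max_clicks: int) -> List[str]:
    """Click loop finishes first, then one extraction pass over everything."""
    return visible_after(max_clicks)


if __name__ == "__main__":
    nested = extract_nested(2)
    sequential = extract_sequential(2)
    print(len(nested), len(set(nested)), len(sequential))
```

In this model the nested version returns more rows than there are unique items, while the sequential version returns each item once.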

 

 


 
