
How to Scrape Data from A Website with A “Load More” Button (Example: Kickstarter)

Wednesday, May 4, 2016 7:01 AM


 

Lots of websites use a "Load More" or "Display more data" button to load content continuously, often via AJAX. Unlike pagination with a "Next" button, which replaces the content with the next page, a "Load More" button keeps appending content to one single web page, which makes it trickier to scrape. In this article, we will show you how to handle the "Load More" button in Octoparse.

 

We can either (1) use the Auto-detect function to handle the "Load More" button or (2) create a pagination action manually.

Use the following Kickstarter page as a sample link to follow along:

https://www.kickstarter.com/discover/popular?ref=discovery_overlay

 

After we create a task using the above URL in Octoparse, we can:

 

1. Use Auto-detect to deal with the "Load More" button

 

  • Start the Auto-detect process and wait for it to complete
  • Switch between the auto-detect results until you get the list data you want
  • Rename or delete data fields in the Data Preview section
  • Under the Click on a "Load More" button option, click Check to see whether the button has been located correctly
  • If not, click Edit to choose the right button
  • Click Edit to set the number of clicks, i.e. how many times Octoparse should click the "Load More" button
  • Click Create workflow to generate the workflow

 

Auto-detect result

 

With this workflow, Octoparse clicks the "Load More" button and extracts data as it goes. For example, if "Number of clicks" is set to 20 and each click loads 20 new items, Octoparse extracts the 20 newly loaded items after each click.
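For readers who want to see the logic behind this workflow, here is a minimal Python sketch of the same click-then-extract loop. It is not how Octoparse works internally: `FakeLoadMorePage` is a hypothetical stand-in for a browser page, and we assume the items visible before the first click are extracted as well. After each click, only the items beyond those already extracted are collected.

```python
from __future__ import annotations


class FakeLoadMorePage:
    """Hypothetical page: shows `initial` items, appends `batch` more per click."""

    def __init__(self, total: int, initial: int = 20, batch: int = 20) -> None:
        self._all = [f"project-{i}" for i in range(total)]
        self._visible = min(initial, total)
        self._batch = batch

    def items(self) -> list[str]:
        # Everything currently rendered on the single, growing page.
        return self._all[: self._visible]

    def has_load_more(self) -> bool:
        # The button is shown only while unloaded items remain.
        return self._visible < len(self._all)

    def click_load_more(self) -> None:
        self._visible = min(self._visible + self._batch, len(self._all))


def scrape_with_load_more(page: FakeLoadMorePage, number_of_clicks: int) -> list[str]:
    """Extract the initial items, then only the newly loaded items after each click."""
    results = list(page.items())
    seen = len(results)  # number of items already extracted
    for _ in range(number_of_clicks):
        if not page.has_load_more():
            break  # nothing left to load
        page.click_load_more()
        current = page.items()
        results.extend(current[seen:])  # keep only the items appended by this click
        seen = len(current)
    return results


page = FakeLoadMorePage(total=1000)
projects = scrape_with_load_more(page, number_of_clicks=20)
print(len(projects))  # 420: 20 initial items + 20 clicks x 20 new items
```

In this sketch, 20 clicks with 20 items per click yield the 20 initial items plus 400 newly loaded ones, and the loop stops early once no "Load More" button remains.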

 

2. Create a pagination action manually

 

  • Select the Load More button on the web page and choose Loop click single element
  • Set an appropriate AJAX timeout so that the newly loaded items have time to appear after each click
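The AJAX timeout is essentially how long to wait after each click for the new content to arrive. A hand-written scraper might implement that wait as a polling loop like the Python sketch below. The function name and the `ajax_timeout` and `poll_interval` parameters are hypothetical, and Octoparse's own setting may behave differently; the general idea is that too short a timeout moves on before the items appear, while too long a timeout slows down every click.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_for_new_items(
    count_items: Callable[[], int],
    previous_count: int,
    ajax_timeout: float = 3.0,
    poll_interval: float = 0.1,
) -> bool:
    """Poll until more than `previous_count` items exist or `ajax_timeout` seconds pass.

    Returns True if new items appeared in time, False if the wait timed out.
    """
    deadline = time.monotonic() + ajax_timeout
    while True:
        if count_items() > previous_count:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep briefly, but never past the deadline.
        time.sleep(min(poll_interval, remaining))
```

In a real scraper, `count_items` would count the list elements currently on the page after clicking "Load More". A `False` result means either nothing more was loaded or the timeout was too short for the site.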

 

Tips!

If you find that the task collects many duplicates during scraping, you can drag the Loop Item out of the Pagination loop so that Octoparse starts to scrape only after all the items have been loaded.
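One plausible cause of the duplicates, which we assume here rather than take from Octoparse's documentation, is that extracting inside the pagination loop re-reads the whole growing page after every click. The Python sketch below uses hypothetical page snapshots (`page_snapshots` and the other function names are illustrative only) to contrast that with extracting once after all items are loaded, and adds an order-preserving de-duplication step as a fallback cleanup.

```python
from __future__ import annotations


def page_snapshots(total: int, batch: int) -> list[list[str]]:
    """Hypothetical page states: all visible items after 0, 1, 2, ... clicks."""
    snapshots: list[list[str]] = []
    visible = min(batch, total)
    while True:
        snapshots.append([f"project-{i}" for i in range(visible)])
        if visible >= total:
            return snapshots
        visible = min(visible + batch, total)


def extract_inside_loop(snapshots: list[list[str]]) -> list[str]:
    # Extracting the whole page after every click re-reads earlier items.
    return [item for snap in snapshots for item in snap]


def extract_after_loop(snapshots: list[list[str]]) -> list[str]:
    # Loop Item moved out of the pagination loop: extract once, from the final page.
    return list(snapshots[-1])


def dedupe(items: list[str]) -> list[str]:
    # Order-preserving de-duplication (dicts keep insertion order).
    return list(dict.fromkeys(items))


snapshots = page_snapshots(total=60, batch=20)
print(len(extract_inside_loop(snapshots)))  # 120: items 0-19 read three times, 20-39 twice
print(len(extract_after_loop(snapshots)))  # 60
```

Both fixes end with the same 60 unique items. Moving the Loop Item out of the Pagination roughly corresponds to `extract_after_loop` here, at the cost of extracting nothing until all the loading has finished.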

 

Either way, we can handle the "Load More" button in Octoparse.

 

Happy Data Hunting!

Author: The Octoparse Team
