
Web Scraping - Modify XPath For "Load More" Button with Octoparse

Thursday, March 2, 2017 9:15 PM


 

Many websites use a "Load More" or "Show More" button to load additional content onto the same page instead of splitting it across several pages. It is a common way to give visitors a smoother browsing experience.


Unlike pagination with a "Next" button, the "Load More" button keeps adding more content onto one single web page, which makes it trickier to scrape. In this article, I will show you how to deal with the "Load More" button in Octoparse.

 

1. Use Auto-detect to deal with the "Load More" button

2. Create a pagination action manually

 

To follow along, you can use this example link:

https://www.capterra.com/search/category?search=CRM%20Software

 

1. Use Auto-detect to deal with the "Load More" button

  • Start the Auto-detect process. The Tips Panel will offer the option to click on a "Load More" button.


  • Click "Check" to see whether the "Load More" button has been located correctly. If not, click "Edit" to choose the right button.
  • Click "Edit" to set the number of clicks, that is, how many times Octoparse should click the "Load More" button.


  • Click "Create workflow" to generate the settings

The workflow should look like the following image:

[Image: the workflow generated by Auto-detect]

 

With this workflow, Octoparse extracts data as it clicks the "Load More" button. For example, if "Number of clicks" is set to 20 and each click loads 20 new items, Octoparse extracts the 20 newly loaded items after each click.
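
To make the click-and-extract loop concrete, here is a minimal Python simulation of it. This is only a sketch, not Octoparse code: `FakePage` and `scrape_incrementally` are hypothetical names, and the sketch assumes the first batch of items is already visible before any click.

```python
class FakePage:
    """Hypothetical stand-in for a web page with a "Load More" button."""

    def __init__(self, total_items, batch_size):
        self.total_items = total_items
        self.batch_size = batch_size
        # Assumption: the first batch is visible as soon as the page opens.
        self.visible = min(batch_size, total_items)

    def items(self):
        # Everything currently rendered on the single page.
        return [f"item-{i}" for i in range(1, self.visible + 1)]

    def has_load_more(self):
        return self.visible < self.total_items

    def click_load_more(self):
        self.visible = min(self.visible + self.batch_size, self.total_items)


def scrape_incrementally(page, number_of_clicks):
    """Extract what is visible, then only the new items after each click."""
    extracted = list(page.items())
    for _ in range(number_of_clicks):
        if not page.has_load_more():
            break  # the button is gone, nothing left to load
        already_seen = len(extracted)
        page.click_load_more()
        extracted.extend(page.items()[already_seen:])
    return extracted


page = FakePage(total_items=500, batch_size=20)
rows = scrape_incrementally(page, number_of_clicks=20)
print(len(rows))  # 20 initial items + 20 clicks * 20 items = 420
```

Skipping the items that were already extracted is what keeps each pass limited to the newly loaded batch.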

 

2. Create a pagination action manually

  • Select the "Load More" button on the web page and choose "Loop click single element"
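  • Set up a proper AJAX timeout. The button loads new content with AJAX, without reloading the page, so Octoparse needs time to wait for the new items after each click.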
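
The idea behind a timeout for AJAX-loaded content can be sketched in Python as a wait with an upper bound. This illustrates the general technique only and makes no claim about how Octoparse implements its setting; `wait_for_new_items`, `fake_count`, and all parameter names are hypothetical.

```python
import time


def wait_for_new_items(count_items, previous_count, timeout_s, poll_s=0.1):
    """Poll until more items are visible than before, or the timeout expires.

    count_items is any callable returning the number of items on the page.
    Returns True if new items appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if count_items() > previous_count:
            return True
        time.sleep(poll_s)
    # One last check in case the items arrived during the final sleep.
    return count_items() > previous_count


# Simulated page: the new batch only shows up on the third check.
calls = {"n": 0}


def fake_count():
    calls["n"] += 1
    return 20 if calls["n"] < 3 else 40


loaded = wait_for_new_items(fake_count, previous_count=20, timeout_s=2.0, poll_s=0.01)
print(loaded)
```

If the timeout is too short, the scraper moves on before the new items appear; if it is too long, every click costs more time than it needs to.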

 

Tip!

1. If you only wish to click the "Load More" button a fixed number of times, X, click the Pagination box, tick "Repeats" and set Repeats to X.


2. If the task collects many duplicates during scraping, drag the Loop Item out of the Pagination so that Octoparse starts scraping only after all the items have loaded.
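
One plausible way such duplicates arise is that the whole list gets re-read after every click. The Python sketch below contrasts extracting inside the click loop with extracting once after all clicks. It is a simplified model with hypothetical function names and made-up sample data, not a description of Octoparse internals.

```python
def extract_inside_loop(batches):
    """Extraction inside the pagination loop: the full list is re-read each time."""
    visible, rows = [], []
    for batch in batches:      # each batch appears after one load
        visible.extend(batch)
        rows.extend(visible)   # items from earlier batches are collected again
    return rows


def extract_after_loop(batches):
    """Extraction moved out of the loop: load everything first, then read once."""
    visible = []
    for batch in batches:
        visible.extend(batch)
    return list(visible)


batches = [["a", "b"], ["c", "d"], ["e", "f"]]
inside = extract_inside_loop(batches)
outside = extract_after_loop(batches)
print(len(inside), len(outside))  # 12 6
```

Both approaches see the same six items, but the first one records several of them more than once.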

 

If you have any questions, you are welcome to submit a request here. Our support team will get back to you within 24 hours. 

 

 

Happy Data Hunting!

Author: The Octoparse Team
