Web Scraping - Modify XPath For "Load More" Button with Octoparse
Thursday, March 2, 2017 9:15 PM
Many websites use a "Load More" or "Show More" button to load additional content on demand. The technique is widely used because it gives visitors a smoother user experience.
Unlike pagination with a "Next" button, a "Load More" button keeps appending content to a single web page, which makes the page trickier to scrape. In this article, I will show you how to deal with the "Load More" button in Octoparse.
1. Use Auto-detect to deal with the "Load More" button
2. Create a pagination action manually
To follow along, you can use this example link:
https://www.capterra.com/search/category?search=CRM%20Software
1. Use Auto-detect to deal with the "Load More" button
- Start the Auto-detect process. The Tips panel will offer the option to click on a "Load More" button.
- Click "Check" to see if the "Load More" button has been located correctly. If not, click "Edit" to choose the right button.
- Click "Edit" to set the number of clicks, that is, how many times you want Octoparse to click the "Load More" button.
- Click "Create workflow" to generate the settings
The workflow should look like the following image:
With this workflow, Octoparse extracts data as it clicks the "Load More" button. For example, if "Number of clicks" is set to 20 and each click loads 20 new items, Octoparse extracts the 20 newly loaded items after each click.
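The arithmetic behind this workflow can be sketched in a few lines of Python. This is only a simulation of the click-then-extract idea, not Octoparse's own code: the function name, the "item-N" labels, and the assumption that 20 items are already visible before the first click are all hypothetical.

```python
def simulate_click_and_extract(initial_items, items_per_click, number_of_clicks):
    """Simulate clicking "Load More" and extracting only the new items each time."""
    # Assumption: the page already shows `initial_items` rows before any click.
    page = [f"item-{i}" for i in range(1, initial_items + 1)]
    extracted = list(page)
    for _ in range(number_of_clicks):
        start = len(page)
        # One click appends a new batch to the same page.
        page.extend(f"item-{i}" for i in range(start + 1, start + items_per_click + 1))
        # Extract only the batch that this click loaded.
        extracted.extend(page[start:])
    return extracted


rows = simulate_click_and_extract(initial_items=20, items_per_click=20, number_of_clicks=20)
print(len(rows))  # 20 initial + 20 clicks * 20 items = 420
```

Under these assumptions, 20 clicks add 400 items to whatever was visible at the start, and no item is extracted twice.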
2. Create a pagination action manually
- Select the "Load More" button on the web page and choose "Loop click single element"
- Set up a proper AJAX timeout so that Octoparse waits long enough for the new content to load after each click
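To see why the timeout matters, here is a generic Python sketch of waiting up to a time limit for new items to appear after a click. It does not describe how Octoparse implements its AJAX timeout; wait_for_new_items and FakePage are hypothetical names used only for illustration.

```python
import time


def wait_for_new_items(get_item_count, previous_count, timeout_seconds, poll_seconds=0.01):
    """Return True if the item count grows before the timeout, else False."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if get_item_count() > previous_count:
            return True
        time.sleep(poll_seconds)
    return False


class FakePage:
    """Hypothetical page whose new items only appear after a few polls."""

    def __init__(self, polls_until_loaded):
        self.items = 20
        self._polls_left = polls_until_loaded

    def item_count(self):
        if self._polls_left > 0:
            self._polls_left -= 1
        else:
            self.items = 40  # the "Load More" batch has arrived
        return self.items


page = FakePage(polls_until_loaded=3)
loaded = wait_for_new_items(page.item_count, previous_count=20, timeout_seconds=1.0)
print(loaded)
```

If the timeout is too short for the site, the wait gives up before the new batch arrives and those items are missed, so a slow site calls for a longer timeout.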
Tip!
1. If you only want to click the "Load More" button X times, click the Pagination box, tick "Repeats", and set Repeats to X.
2. If the task collects many duplicates during scraping, drag the Loop Item out of the Pagination so that Octoparse starts scraping only after all the items have loaded.
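The effect of both tips can be illustrated with a small Python simulation. It assumes one plausible cause of duplicates, namely that the whole list is extracted again after every click; the function names and the batch size of 20 are hypothetical and this is not Octoparse's actual behavior.

```python
def load_more(page, batch_size):
    """Append one batch of items to the page, as a "Load More" click would."""
    start = len(page)
    page.extend(f"item-{i}" for i in range(start + 1, start + batch_size + 1))


def extract_inside_loop(repeats, batch_size=20):
    """Extract the whole page before every click (Loop Item inside Pagination)."""
    page, rows = [], []
    load_more(page, batch_size)  # items shown when the page first opens
    for _ in range(repeats):
        rows.extend(page)  # the full list is extracted again each round
        load_more(page, batch_size)
    rows.extend(page)
    return rows


def extract_after_loop(repeats, batch_size=20):
    """Click X times first, then extract once (Loop Item outside Pagination)."""
    page = []
    load_more(page, batch_size)
    for _ in range(repeats):
        load_more(page, batch_size)
    return list(page)


inside = extract_inside_loop(repeats=3)
outside = extract_after_loop(repeats=3)
print(len(inside), len(set(inside)), len(outside))  # 200 80 80
```

With Repeats set to 3, both variants end up with the same 80 unique items, but the first one collects 200 rows because earlier batches are extracted again on every round.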
If you have any questions, you are welcome to submit a request here. Our support team will get back to you within 24 hours.
Happy Data Hunting!
Author: The Octoparse Team