You have probably run into this problem when web scraping: some sites have a Load More button that you must click to paginate or load more content, and handling it is not easy. In this article, we show how to solve the Load More button problem with a no-code web scraping tool or with Python.
No-Coding Tool to Scrape Pages with Load More Button
If you’re not a coder, we recommend Octoparse as a web scraping tool for solving the Load More button problem. It is a free tool for both Windows and Mac that is easy to use and requires no coding skills. You can scrape almost all kinds of websites with its auto-detection function and preset templates. For the Load More button, Octoparse lets you set up pagination and infinite scroll with a loop item. Follow the simple methods below to give it a try.
1. Scraping Load More Button with Pagination
You can set up pagination with the Load More button if you’re scraping a multipage site; some sites label this button Next instead. Octoparse supports both auto-detection and manual setup. Read the detailed user guide, Dealing with pagination with a “Load More” button, or follow the simple steps below.
Step 1: Sign up for a free account and launch Octoparse. Copy and paste the target page link into the main panel, and auto-detection will start by default.
Step 2: Octoparse sets up the pagination after auto-detection finishes. Click the “Load More” button entry in the Tips panel to check whether the button has been located correctly. If not, click Edit to choose the right button. To set it up manually, select the “Load More” button on the web page and choose the Loop click single element option. You can also set a suitable AJAX timeout yourself.
Step 3: After checking all the data fields, run the workflow you just created. You’ll get the scraped data with the Load More button handled.
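For readers who want to see the logic behind these steps, here is a minimal Python sketch of the click loop with an AJAX timeout. It is not how Octoparse works internally. The callables click_button and count_items are hypothetical placeholders for whatever browser automation you use, and FakePage is only a stand-in so the loop can run without a browser.

```python
import time
from typing import Callable


def click_load_more_until_done(
    click_button: Callable[[], bool],
    count_items: Callable[[], int],
    ajax_timeout: float = 5.0,
    poll_interval: float = 0.2,
    max_clicks: int = 100,
) -> int:
    """Click a Load More button until it disappears or stops adding items.

    click_button: hypothetical callable; clicks the button and returns
        False when the button is no longer on the page.
    count_items: hypothetical callable; returns how many items are loaded.
    Returns the final item count.
    """
    for _ in range(max_clicks):
        before = count_items()
        if not click_button():
            break  # button is gone: everything is loaded
        # Wait up to ajax_timeout seconds for the new items to arrive.
        deadline = time.monotonic() + ajax_timeout
        while count_items() <= before and time.monotonic() < deadline:
            time.sleep(poll_interval)
        if count_items() <= before:
            break  # the click added nothing within the timeout
    return count_items()


class FakePage:
    """Stand-in for a real browser page, used only to exercise the loop."""

    def __init__(self, total: int, page_size: int) -> None:
        self.total, self.page_size = total, page_size
        self.loaded = min(page_size, total)

    def click(self) -> bool:
        if self.loaded >= self.total:
            return False  # no button left to click
        self.loaded = min(self.loaded + self.page_size, self.total)
        return True

    def count(self) -> int:
        return self.loaded


page = FakePage(total=65, page_size=29)
final_count = click_load_more_until_done(page.click, page.count, ajax_timeout=1.0)
print(final_count)
```

The max_clicks cap is a safety net for pages whose button never disappears. A timeout that is too short makes the loop stop early, which is one way a scrape ends with fewer items than expected.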
2. Infinite Scroll to Load More Data
On some pages, clicking the “Load More” button again and again loads more content onto the same page. In this situation, you can set up pagination with infinite scroll. It supports both automatic and manual setup, much like the Load More methods above.
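The infinite scroll idea can be sketched in Python as well: keep scrolling, wait a fixed interval, and stop once several scrolls in a row add nothing new. The names scroll_to_bottom, count_items, and patience are assumptions made for illustration, and FakeFeed only simulates a page.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll_to_bottom: Callable[[], None],
    count_items: Callable[[], int],
    wait_seconds: float = 1.0,
    max_scrolls: int = 50,
    patience: int = 2,
) -> int:
    """Scroll until `patience` scrolls in a row add no new items."""
    idle = 0
    scrolls = 0
    while scrolls < max_scrolls and idle < patience:
        before = count_items()
        scroll_to_bottom()
        time.sleep(wait_seconds)  # interval that lets the page load new items
        scrolls += 1
        idle = idle + 1 if count_items() <= before else 0
    return count_items()


class FakeFeed:
    """Simulated page that loads one batch of items per scroll."""

    def __init__(self, total: int, batch: int) -> None:
        self.total, self.batch = total, batch
        self.loaded = min(batch, total)

    def scroll(self) -> None:
        self.loaded = min(self.loaded + self.batch, self.total)

    def count(self) -> int:
        return self.loaded


feed = FakeFeed(total=50, batch=20)
loaded = scroll_until_stable(feed.scroll, feed.count, wait_seconds=0.0)
print(loaded)
```

Requiring more than one idle scroll before stopping guards against a single slow response being mistaken for the end of the list.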
3. Real Example to Solve Load More Problem with XPath
A real-life example of this kind of issue comes from one of our users, who couldn’t scrape all the data items from a website with a Load More button. Below is the situation.
He wrote us an email and said:
“I want help regarding scraping a website with a ‘show more product’ button.
Type Links to scrape: http://dir.indiamart.com/mumbai/industrial-machinery.html
Type of data: 08447563983, Machinery And Spares.
I want to scrape the complete page including the ‘load more product’ button.
I have created primary steps I have attached images in the attachment.
But this only fetched 29 data from page, I want you to tell me how to add load more feature in this process.
Also, tell me more about configuring the extraction rule.
Waiting for your response.”
From the email content we can summarize two key points of his issue:
1. Load More button. (Tutorial: Scrape websites with Load More button)
  We need to make sure all the items on the web pages are displayed after clicking the Load More button repeatedly.
2. Only 29 records were fetched.
  We need to check the extraction while the task is running with Local Extraction and figure out what the problem is.
So, our response is as follows:
About the Load More button
First of all, we need to make sure that, in your rule, all the items on this web page are displayed by scrolling to the bottom of the page and clicking the Load More button repeatedly.
By the way, sometimes the site keeps loading more items as you scroll down to the bottom, before the “Load more” button appears. In that case, we can set the scroll time and intervals to smooth the extraction.
About the data extracted
When only 29 data records were extracted, you need to find out the reasons why the extraction stops. I checked your task in Local Extraction and found out that:
1. Some pop-up windows appear during the extraction. In this case, you need to click the close button in the built-in browser manually and restart the task.
2. If the extraction completes without any pop-up windows, you need to find out where the extraction stops.
Firstly, open the web page you want to scrape in Firefox. Let’s locate the 28th data item on the web page; in Firefox we can see that it’s the item named “Mohnot Instruments”. We will use the FirePath tool to find its XPath.
(Learn more about FirePath tool: Getting started with XPath)
Secondly, go back to Octoparse and check the Loop Item (Extract Data). In the screenshot below, an item named DIV is extracted. Clearly something is wrong with the original XPath, and we need to edit it manually.
Let’s copy the original XPath and paste it into FireBug. You will find that the original XPath cannot match the items from the 29th onward. In this case, we need to modify the XPath used to extract all the items from the web page.
Thirdly, get the XPath of the section of the 29th item on the web page.
Fourth, the correct XPath should be .//*[contains(@id,'LST')]
After you modify the XPath and save it, you will find that more than 32 items are extracted in the loop.
Don’t forget to keep an eye on the built-in browser during the extraction to make sure the workflow is working well.
Through this example, we know how to scrape data from a website with the Load More button and modify the XPath that extracts all the data items from the web page.
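To see why a broader XPath such as .//*[contains(@id,'LST')] can pick up items that a narrower path misses, consider the small Python sketch below. The markup is hypothetical and is not taken from the site in the email. Python’s standard-library ElementTree does not support contains(), so the sketch emulates that predicate with a plain filter; a full XPath engine such as lxml should be able to evaluate the expression directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the first batch and the batch added by Load More
# sit in different containers, but every item id contains "LST".
HTML = """
<div id="page">
  <div id="initial">
    <div id="LST1">Item 1</div>
    <div id="LST2">Item 2</div>
  </div>
  <div id="loaded-later">
    <div id="LST3">Item 3</div>
    <div id="banner">Advertisement</div>
  </div>
</div>
"""

root = ET.fromstring(HTML)

# A position-based path only reaches items in the first container.
narrow = root.findall("./div[1]/div")

# Emulates .//*[contains(@id,'LST')]: keep every element whose id
# attribute contains "LST", wherever it sits in the tree.
broad = [el for el in root.iter() if "LST" in el.get("id", "")]

narrow_texts = [el.text for el in narrow]
broad_texts = [el.text for el in broad]
print(narrow_texts)
print(broad_texts)
```

The narrow path stops at the end of the first container, while the id-based filter also finds the item that was loaded later and still skips the unrelated banner element.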
Solve Web Scraping Load More Button with Python
“How to scrape the website if it has load more button to load more content on the page?”
You may have come across this question on Stack Overflow, even if you know something about coding, and you can find answers and discussions about it there. However, we still recommend trying Octoparse if you’re still unsure how to handle it.
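One common Python approach, offered here only as a sketch, assumes that the Load More button fetches each new batch from a paginated endpoint that returns JSON. If so, you can request the batches directly instead of clicking. The URL, the page parameter, and the function names below are all hypothetical; check the network requests of the real site to find the actual ones.

```python
import json
import urllib.request
from typing import Callable, Iterator


def fetch_page(page: int) -> list:
    """Fetch one batch of items. The URL and `page` parameter are hypothetical."""
    url = f"https://example.com/api/items?page={page}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def iter_all_items(fetch: Callable[[int], list], max_pages: int = 1000) -> Iterator:
    """Yield items page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):
        batch = fetch(page)
        if not batch:
            return  # an empty batch means there is nothing more to load
        yield from batch


def fake_fetch(page: int) -> list:
    """Offline stand-in for fetch_page so the loop can be tried without a network."""
    data = {1: ["a", "b"], 2: ["c"]}
    return data.get(page, [])


items = list(iter_all_items(fake_fetch))
print(items)
```

Against a real endpoint you would pass fetch_page instead of fake_fetch. If the site renders new items only through JavaScript and exposes no such endpoint, a browser automation library is the usual fallback.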