
Web Scraper 101: Tackle Pagination for Web Scraping

Saturday, January 16, 2021

Pagination is a widely used web design technique that splits content across multiple pages, presenting large datasets in a form that is much easier for web surfers to digest.

Web developers employ many pagination methods, such as numbered pagination and infinite scrolling. Although pagination is generally believed to improve user experience, the bad news is that it makes web scraping more difficult.

If you’re trying to scrape data from a website and are unsure how to tackle pagination, we have you covered. Octoparse, an automatic web scraping tool, supports websites with various pagination structures. Below we illustrate how to deal with different kinds of pagination in Octoparse, including:


1. Pagination with "Next" button

2. Numbered pagination without the "Next" button

3. Infinite scrolling

4. "Load More" button




1. Pagination with "Next" button


Clicking a "Next" button is perhaps the most common way to paginate, and it makes it easy for visitors to move through the pages of a website. This kind of pagination is simple to handle in Octoparse.

Whether the button is shown as the word "Next" or just a right arrow ">", you only need to build a pagination loop that keeps clicking the button once the current page has been scraped. (Check an example here.)
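Outside Octoparse, the same pagination loop can be sketched in a few lines of Python. This is only an illustration of the idea, not how Octoparse works internally: the in-memory `PAGES` site, the `fake_fetch` helper, and the `class="next"` marker on the link are all assumptions made up for the example, and the pages are kept well-formed so the standard-library XML parser can read them.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "site": URL -> well-formed page markup.
PAGES = {
    "/page1": '<html><body><ul><li>a</li><li>b</li></ul>'
              '<a class="next" href="/page2">Next</a></body></html>',
    "/page2": '<html><body><ul><li>c</li></ul>'
              '<a class="next" href="/page3">&gt;</a></body></html>',
    "/page3": '<html><body><ul><li>d</li></ul></body></html>',
}


def fake_fetch(url):
    """Stand-in for an HTTP request; returns the markup of one page."""
    return PAGES[url]


def scrape_with_next(start_url, fetch, max_pages=100):
    """Scrape each page, then follow the "next" link until there is none."""
    items = []
    url = start_url
    visited = 0
    while url is not None and visited < max_pages:
        root = ET.fromstring(fetch(url))
        # Step 1: scrape the current page.
        items.extend(li.text for li in root.iter("li"))
        # Step 2: look for the next button; stop when it is missing.
        next_link = root.find(".//a[@class='next']")
        url = next_link.get("href") if next_link is not None else None
        visited += 1
    return items


all_items = scrape_with_next("/page1", fake_fetch)
```

The `max_pages` cap is a safety net so that a site whose last page still links to itself cannot keep the loop running forever.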


2. Numbered pagination without the "Next" button

The approach for this kind of pagination is very similar to the one for the "Next" button: you build a pagination loop that keeps clicking the next page number in sequence. However, since you are no longer clicking a static element, locating the next page number precisely is critical.

Octoparse uses XPath (XML Path Language, which uses a path-like syntax to identify and navigate nodes in an XML document) to locate elements. The key point is to modify the XPath of the pagination loop so that it always locates the next page number once the current page has been fully scraped. (Check this tutorial for how to modify the XPath to accurately locate the next page number.)
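To make "locate the next page number" concrete, here is a minimal Python sketch. It assumes a hypothetical pager in which the current page's list item carries `class="active"`; real sites mark the current page in different ways. In a full XPath 1.0 engine, an expression such as `//li[@class='active']/following-sibling::li[1]/a` would express the same lookup. Python's standard-library ElementTree supports only a subset of XPath without the `following-sibling` axis, so the sketch walks the siblings by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical pager markup; the "active" class marks the current page.
PAGER_ON_PAGE_2 = (
    '<div><ul class="pagination">'
    '<li><a href="/p1">1</a></li>'
    '<li class="active"><a href="/p2">2</a></li>'
    '<li><a href="/p3">3</a></li>'
    '</ul></div>'
)
PAGER_ON_PAGE_3 = (
    '<div><ul class="pagination">'
    '<li><a href="/p1">1</a></li>'
    '<li><a href="/p2">2</a></li>'
    '<li class="active"><a href="/p3">3</a></li>'
    '</ul></div>'
)


def next_page_href(markup):
    """Return the link of the page number right after the current one."""
    root = ET.fromstring(markup)
    pager = root.find(".//ul[@class='pagination']")
    if pager is None:
        return None
    entries = list(pager)
    for index, entry in enumerate(entries):
        if entry.get("class") == "active":
            if index + 1 >= len(entries):
                return None  # the current page is the last one
            link = entries[index + 1].find("a")
            return link.get("href") if link is not None else None
    return None
```

Because the lookup is anchored on the current page rather than on a fixed position, it keeps pointing at the right number as the loop advances, which is the property the modified XPath needs to have.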



3. Infinite scrolling

Infinite scrolling, also known as "endless scrolling", is a technique, most often used by websites with JavaScript or AJAX, that loads additional content dynamically as users scroll toward the bottom of the page. Instead of "previous/next" pagination buttons, many websites are turning to infinite scrolling, which saves people from clicking through many pages. Infinite scrolling is typically used by websites with a large amount of data to display, such as social media platforms like Facebook and Twitter.

Octoparse deals with infinite scrolling by mimicking the scrolling behavior. Depending on the amount of content you want to load, set the appropriate scroll time and scroll way, and the page will be scrolled automatically. (Check an example here.)
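The idea of mimicking scrolling can be sketched in Python as a loop that scrolls, waits, and stops once nothing new appears or a cap is reached. This is an illustration under stated assumptions, not Octoparse's implementation: `FakeFeed` is a made-up stand-in for a browser page, and in practice its two methods would be backed by a browser automation tool.

```python
import time


class FakeFeed:
    """Hypothetical page that reveals batch_size more items per scroll."""

    def __init__(self, total, batch_size):
        self.total = total
        self.batch_size = batch_size
        self.loaded = min(batch_size, total)

    def scroll_to_bottom(self):
        self.loaded = min(self.loaded + self.batch_size, self.total)

    def item_count(self):
        return self.loaded


def scroll_until_stable(page, max_scrolls=50, wait_seconds=0.0):
    """Scroll until the item count stops growing or max_scrolls is hit.

    Returns (items_loaded, scrolls_performed).
    """
    scrolls = 0
    previous = page.item_count()
    while scrolls < max_scrolls:
        page.scroll_to_bottom()
        time.sleep(wait_seconds)  # give newly requested content time to load
        scrolls += 1
        current = page.item_count()
        if current == previous:
            break  # nothing new appeared, so the feed is exhausted
        previous = current
    return previous, scrolls


loaded, scrolls = scroll_until_stable(FakeFeed(total=25, batch_size=10))
```

Lowering `max_scrolls` is the way to load only part of a long feed; on a real page `wait_seconds` would need to be long enough for the new content to arrive, otherwise the loop may stop too early.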



4. "Load More" button

A "Load More" button is another popular alternative to infinite scrolling. In this case, a specific button, such as "Load More", triggers AJAX content loading when you reach the bottom of the page.

Octoparse handles the "Load More" button with a pagination loop, the same way it handles the "Next" button: by clicking a single button repeatedly. The difference is that with "Load More", the pagination loop must run until the button disappears before the task proceeds to the next step. Once all the desired content is loaded, scraping is as easy as scraping a single page. (Check more details here.)
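The "click until the button disappears, then scrape once" pattern looks roughly like this in Python. Again, this is a sketch rather than Octoparse's own logic: `FakeListing` and its methods are hypothetical stand-ins for a real page driven by a browser automation tool.

```python
class FakeListing:
    """Hypothetical page that shows batch_size more items per click."""

    def __init__(self, items, batch_size):
        self._items = list(items)
        self._batch_size = batch_size
        self._shown = min(batch_size, len(self._items))

    def has_load_more(self):
        # The button is only present while some items are still hidden.
        return self._shown < len(self._items)

    def click_load_more(self):
        self._shown = min(self._shown + self._batch_size, len(self._items))

    def visible_items(self):
        return self._items[:self._shown]


def load_all_then_scrape(page, max_clicks=1000):
    """Click "Load More" until it disappears, then scrape in one pass.

    Returns (items, clicks_performed).
    """
    clicks = 0
    while page.has_load_more() and clicks < max_clicks:
        page.click_load_more()
        clicks += 1
    # Everything is on one page now, so a single scrape is enough.
    return page.visible_items(), clicks


items, clicks = load_all_then_scrape(FakeListing("abcdefg", batch_size=3))
```

Unlike the "Next" button loop, scraping happens once, after the loop, because every click adds to the same page instead of replacing it.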



Pagination reduces page complexity and improves the readability of web content, but a scraper has to handle each pagination style with the approach that suits it best. Failing to deal with pagination properly results in missing data and wasted time. With a web scraping tool like Octoparse, you can avoid much of this complexity!



Article in Spanish: Web Scraping 101: Abordar La Paginación para Web Scraping
You can also read web scraping articles on the official website

More Resources


Troubleshooting: Why does Octoparse stop after clicking "next"?

Video: Click on the "next" button to paginate

Video: Deal with pagination without the "next" button

Web Scraping Templates Take Away

Locate Element with XPath

Octoparse Regular Expression Tool (RegEx)

Deal with AJAX

Cloud Extraction: Scrape at Large Scale

Connect Octoparse API Step by Step

