Start with the request, not the UI
Before writing pagination logic, open DevTools and watch what changes when you move to the next batch.
- Open the Network tab and filter to Fetch/XHR.
- Click the next page, scroll down, or press the load-more button.
- Inspect the request URL, query parameters, request body, and response.
- Decide whether the scraper should follow links, interact with the page, or call an API endpoint directly.
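Comparing the request captured for the first batch with the one captured for the second usually shows which parameter drives pagination. The helper below is a small standard-library sketch of that comparison; the function name and the example.com URLs in the usage note are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def changed_params(first_url: str, second_url: str) -> dict[str, tuple]:
    """Return the query parameters (and path) that differ between two captured request URLs."""
    a, b = urlsplit(first_url), urlsplit(second_url)
    changes: dict[str, tuple] = {}
    # Path-based pagination such as /page/2 shows up here instead of in the query string.
    if a.path != b.path:
        changes["<path>"] = (a.path, b.path)
    qa, qb = parse_qs(a.query), parse_qs(b.query)
    for key in sorted(set(qa) | set(qb)):
        # parse_qs returns lists because a key may repeat; None means the key was absent.
        if qa.get(key) != qb.get(key):
            changes[key] = (qa.get(key), qb.get(key))
    return changes
```

For example, `changed_params("https://example.com/api/search?q=shoes&offset=20&limit=20", "https://example.com/api/search?q=shoes&offset=40&limit=20")` returns `{'offset': (['20'], ['40'])}`, which points to offset pagination with a page size of 20.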
The visible control does not always match the request: a numbered page button might actually send offset=40, and a page link might hydrate results through JavaScript after the URL changes.
| What changes | What to try first |
|---|---|
| URL includes page=2, p=2, or /page/2 | Loop through numbered URLs |
| An <a> link points to the next page | Follow the href until it disappears or becomes disabled |
| Content appears after scrolling | Find the XHR request; use browser scrolling only if needed |
| Content appears after clicking a button | Reuse the API request or click the button in a browser session |
| JSON includes next_cursor, endCursor, has_more, or offset | Paginate through the API response |
Numbered pages
Numbered pagination is the simplest case because the next location is visible in the URL, so the scraper can loop through numbered URLs. Watch for numbering that starts at 0, parameter names such as p or start, and sites that return the first page again when the page number is out of range. A repeated first page is worse than an empty page because it can create duplicate data without obvious errors.
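A numbered-page loop can guard against both failure modes: stop on an empty page, and stop when a page's items exactly repeat a page already seen. This sketch assumes a hypothetical `fetch_items(page)` callable that downloads and parses one page (for example by formatting a URL such as https://example.com/products?page={n}) and returns a list of item identifiers; `max_pages` is an arbitrary safety cap.

```python
from typing import Callable, Iterator


def crawl_numbered(
    fetch_items: Callable[[int], list[str]],
    start: int = 1,
    max_pages: int = 500,
) -> Iterator[tuple[int, list[str]]]:
    """Yield (page_number, items) until an empty or repeated page appears."""
    seen_pages: set[tuple[str, ...]] = set()
    for page in range(start, start + max_pages):
        items = fetch_items(page)
        if not items:
            return  # empty page: we ran past the last page
        fingerprint = tuple(items)
        if fingerprint in seen_pages:
            return  # out-of-range number fell back to an earlier page (often page one)
        seen_pages.add(fingerprint)
        yield page, items
```

Pass `start=0` for sites that number pages from zero. Comparing whole pages is a blunt fingerprint; item IDs or canonical URLs work equally well.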
Next links
Some sites do not expose page numbers; they only expose a “Next” link or arrow. If the element is a normal anchor, treat pagination as link following: request the page, read the link's href, and repeat until the link disappears or becomes disabled. Keep a set of already-visited URLs (seen_urls) and stop when the next URL is already in it, because misconfigured sites sometimes point the final “Next” link back to the current page or to page one. Also check disabled states such as aria-disabled="true", disabled, or a disabled class before trusting the link.
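Link following can be sketched with the standard library's `html.parser` and `urllib.parse.urljoin`. The rule for spotting the next link (`rel="next"` or anchor text starting with "Next") and the `fetch_html(url)` callable are assumptions; real sites often need a site-specific selector instead.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first enabled <a> that looks like a "Next" link."""

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None
        self._attrs: Optional[dict] = None  # attributes of the <a> currently open
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.next_href is None:
            self._attrs = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "a" or self._attrs is None:
            return
        attrs, self._attrs = self._attrs, None
        text = "".join(self._text).strip().lower()
        rel = (attrs.get("rel") or "").lower().split()
        classes = (attrs.get("class") or "").split()
        # aria-disabled="true", a bare disabled attribute, or a "disabled" class all count.
        disabled = (
            attrs.get("aria-disabled") == "true"
            or "disabled" in attrs
            or "disabled" in classes
        )
        looks_next = "next" in rel or text.startswith("next")
        if looks_next and attrs.get("href") and not disabled:
            self.next_href = attrs["href"]


def follow_next_links(
    start_url: str,
    fetch_html: Callable[[str], str],
    max_pages: int = 500,
) -> Iterator[tuple[str, str]]:
    """Yield (url, html) until the next link is missing, disabled, or already visited."""
    seen_urls: set[str] = set()
    url: Optional[str] = start_url
    while url and url not in seen_urls and len(seen_urls) < max_pages:
        seen_urls.add(url)
        html = fetch_html(url)
        yield url, html
        finder = NextLinkFinder()
        finder.feed(html)
        finder.close()
        # Resolve relative hrefs such as "?page=3" against the current page URL.
        url = urljoin(url, finder.next_href) if finder.next_href else None
```

Keeping the fetch function separate makes it easy to swap in an HTTP client, a cache, or test fixtures.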
Infinite scroll
Infinite scroll looks like a browser-only problem, but it usually has an API underneath it. Scroll once with DevTools open and look for a request that fetches the next group of records. The useful parameters are often named offset, page, after, cursor, or limit.
When the endpoint is usable, call it directly instead of scrolling the page.
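A minimal offset loop, assuming a hypothetical `fetch_page(offset, limit)` wrapper around the request found in DevTools that returns JSON shaped like `{"items": [...], "has_more": ...}`. Those field names are illustrative; rename them to match the real response.

```python
from typing import Any, Callable, Iterator


def paginate_offset(
    fetch_page: Callable[[int, int], dict],
    limit: int = 20,
    max_requests: int = 1000,
) -> Iterator[Any]:
    """Yield records from an offset/limit endpoint until a stop signal appears."""
    offset = 0
    for _ in range(max_requests):
        data = fetch_page(offset, limit)
        items = data.get("items", [])
        yield from items
        # Stop on an empty or short batch, or when the API says there is nothing more.
        if len(items) < limit or data.get("has_more") is False:
            return
        offset += len(items)
```

Treating a short batch as the end is a common heuristic, but some APIs return short pages mid-stream; prefer an explicit flag such as has_more when the response provides one.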
Load-more buttons
A load-more button is controlled infinite scroll: the page waits for a click before requesting the next batch. That makes pacing easier because the scraper can wait, validate the new item count, and retry if the request fails. If the button calls a clean API, use that API. If not, click the button in a browser loop that waits for the item count to grow after each click and stops when the button disappears or becomes disabled.
Offset and cursor APIs
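Modern sites often paginate data at the API layer. Offset pagination asks for a numeric position, such as offset=40 with a fixed limit. Cursor pagination returns an opaque token, such as next_cursor or endCursor, that the next request must send back; stop when the token is missing, repeats, or has_more is false. For either style, honor rate limits and Retry-After, retry temporary failures with backoff, and store progress if the job is large enough that restarting from page one would be expensive.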
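Cursor pagination with retries might look like the sketch below. It assumes a hypothetical `fetch(cursor)` that returns `(status, headers, body)`, a JSON body with items, next_cursor, and has_more fields, and a Retry-After header given in seconds (the header can also be an HTTP date, which this sketch does not parse and instead falls back to exponential backoff). The error message includes the failing cursor so a long job can resume from it through `start_cursor`.

```python
import random
import time
from typing import Any, Callable, Iterator, Optional

# (HTTP status, response headers, parsed JSON body)
Response = tuple[int, dict, dict]
RETRYABLE = {429, 500, 502, 503, 504}


def paginate_cursor(
    fetch: Callable[[Optional[str]], Response],
    start_cursor: Optional[str] = None,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Any]:
    """Yield items from a cursor API, retrying temporary failures with backoff."""
    cursor = start_cursor
    seen_cursors: set[str] = set()
    while True:
        if cursor is not None:
            seen_cursors.add(cursor)
        for attempt in range(max_retries + 1):
            status, headers, body = fetch(cursor)
            if status == 200:
                break
            if status not in RETRYABLE or attempt == max_retries:
                # Include the cursor so the job can resume from here later.
                raise RuntimeError(f"HTTP {status} at cursor {cursor!r}")
            retry_after = headers.get("Retry-After", "")
            # Honor Retry-After in seconds; otherwise back off exponentially with jitter.
            if retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = 2 ** attempt + random.random()
            sleep(delay)
        yield from body.get("items", [])
        next_cursor = body.get("next_cursor")
        # Stop signals: no cursor, has_more is false, or a cursor we already used.
        if not next_cursor or body.get("has_more") is False or next_cursor in seen_cursors:
            return
        cursor = next_cursor
```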
Hybrid pagination
Real sites often combine patterns:
- A category has numbered pages, but each page lazy-loads more products after scrolling.
- A search page starts with a load-more button, then switches to numbered links.
- A tabbed interface has separate pagination for “New”, “Popular”, and “Sale”.
- A listing page paginates result URLs, then each detail page has its own paginated reviews or comments.
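One way to keep hybrid flows manageable is to write a single generic pagination helper and nest it: one call walks the listing, and a second call per detail page walks that page's reviews. Both fetch callables here are hypothetical and return `(records, next_token)`, with None meaning there is no next page; the token can be a page number, a next URL, or a cursor.

```python
from typing import Any, Callable, Iterator, Optional

# (records on this page, token for the next page or None when there is none)
Page = tuple[list, Optional[str]]


def paginate(fetch: Callable[[Optional[str]], Page], max_pages: int = 500) -> Iterator[Any]:
    """Generic driver: follows tokens until one is missing or repeats."""
    token: Optional[str] = None
    seen: set[str] = set()
    for _ in range(max_pages):
        records, next_token = fetch(token)
        yield from records
        if next_token is None or next_token in seen:
            return
        seen.add(next_token)
        token = next_token


def listing_with_reviews(
    fetch_listing: Callable[[Optional[str]], Page],
    fetch_reviews: Callable[[str, Optional[str]], Page],
) -> Iterator[tuple[str, list]]:
    """Walk the paginated listing, then fully paginate each detail page's reviews."""
    for detail_url in paginate(fetch_listing):
        reviews = list(paginate(lambda token, url=detail_url: fetch_reviews(url, token)))
        yield detail_url, reviews
```

Tabbed interfaces fit the same shape: call the helper once per tab with a fetch function bound to that tab.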
Practical safeguards
- Define a stop signal. Empty result sets, missing next links, disabled buttons, hasNextPage: false, repeated cursors, and max-iteration limits are all valid stop signals.
- Detect duplicates. Infinite scroll and cursor APIs can repeat records when data changes mid-run. Store stable IDs or canonical URLs.
- Throttle navigation. Add small randomized waits between batches. Browser automation should wait for content changes, not only fixed timeouts.
- Log failures. If one page fails after retries, record the URL or cursor and continue when possible.
- Prefer APIs when they are legitimate and stable. Direct API pagination is usually faster and easier to validate than driving a browser.
- Use a visual tool when speed matters more than custom code. In Octoparse, pagination can be configured visually for common next-page, load-more, and infinite-scroll flows, then run locally or in the cloud.
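Several of these safeguards fit in one small driver: retries with backoff, deduplication by stable ID, randomized waits between batches, and a list of failed targets to revisit. This sketch assumes a hypothetical `fetch_batch(target)` that returns a list of dicts with a stable "id" field and raises on network or parse errors; the wait range and retry count are placeholder values, not recommendations.

```python
import logging
import random
import time
from typing import Callable, Iterable

log = logging.getLogger("scraper")


def collect_unique(
    targets: Iterable[str],
    fetch_batch: Callable[[str], list[dict]],
    id_key: str = "id",
    retries: int = 2,
    wait_range: tuple[float, float] = (0.5, 1.5),
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[dict], list[str]]:
    """Fetch each target (a URL or cursor), drop duplicate records, and record failures.

    Returns (unique_records, failed_targets) so failed targets can be retried later.
    """
    seen_ids: set = set()
    records: list[dict] = []
    failed: list[str] = []
    for target in targets:
        for attempt in range(retries + 1):
            try:
                batch = fetch_batch(target)
                break
            except Exception as exc:  # network or parse errors raised by fetch_batch
                log.warning("attempt %d for %s failed: %s", attempt + 1, target, exc)
                if attempt < retries:
                    sleep(2 ** attempt)  # simple exponential backoff before retrying
        else:
            failed.append(target)  # give up on this target but keep the run going
            continue
        for record in batch:
            key = record[id_key]
            if key not in seen_ids:  # stable IDs catch repeats caused by shifting data
                seen_ids.add(key)
                records.append(record)
        sleep(random.uniform(*wait_range))  # randomized pause between batches
    return records, failed
```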