Web Scraping Feature Study | Scraping from multiple pages: pagination with "Next" button
Wednesday, September 28, 2016 7:54 AM
What is it?
The pagination action is used when the content you want to scrape spans multiple pages of a website. Octoparse mimics human browsing behavior: just as you would click through to the next page while browsing a website, Octoparse does the same when you use the pagination feature.
Pagination with a "Next" button is one of the most common ways to flip through pages.
When do you want to use it?
If you are extracting data from more than one page, use pagination to enable page flipping.
One common method for pagination with a “Next page” button is shown below: loop-click the “Next page” button so that each page is scraped in turn.
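To make the "loop click next page" idea concrete outside of Octoparse, here is a minimal plain-Python sketch of the same logic: start on the first page, look for a link whose text is "Next", follow it, and stop when no such link exists. This is not how Octoparse works internally. The in-memory `PAGES` site, the `NextLinkFinder`, `find_next` and `cycle_pages` names, and the `max_pages` safety limit are all hypothetical, chosen so the example runs without a network connection.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "website": each URL maps to the HTML of one results page.
# A real scraper would fetch these pages over HTTP instead.
PAGES = {
    "/list?page=1": '<ul><li>A</li><li>B</li></ul><a href="/list?page=2">Next</a>',
    "/list?page=2": '<ul><li>C</li><li>D</li></ul><a href="/list?page=3">Next</a>',
    "/list?page=3": '<ul><li>E</li></ul>',  # last page: no "Next" link
}


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> element whose text is exactly "Next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None
        self._current_href = None  # href of the <a> we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None

    def handle_data(self, data):
        if self.next_href is None and self._current_href is not None and data.strip() == "Next":
            self.next_href = self._current_href


def find_next(html):
    """Return the URL behind the "Next" button on a page, or None on the last page."""
    finder = NextLinkFinder()
    finder.feed(html)
    return finder.next_href


def cycle_pages(start_url, max_pages=100):
    """'Click' Next repeatedly, returning the URLs of the pages visited in order."""
    visited = []
    url = start_url
    # Stop when there is no Next link, when a page repeats, or at the safety limit.
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.append(url)
        url = find_next(PAGES[url])
    return visited
```

Calling `cycle_pages("/list?page=1")` visits the three pages in order and stops on the third, because it has no "Next" link. The repeat check and `max_pages` limit guard against sites whose "Next" button never disappears.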
How to use it?
Step 1. Click the "Next" button used for turning pages and select "Loop Click Next Page" to tell Octoparse to click through each page in the "Cycle Pages" loop.
Step 2. To select all the web elements with a similar layout, create a loop list that extracts these items within each cycled page.
Step 3. After the list of items is created in Step 2, the "Loop Item" action sits outside the "Cycle Pages" action. This order doesn't work, because we need to scrape all of the articles on the current page first and only then click to paginate. Therefore, adjust their nesting order manually by dragging the "Loop Item" action into the "Cycle Pages" action.
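Why the nesting order matters can be seen with a small plain-Python model of the workflow. It is only a rough analogy to Octoparse's "Cycle Pages" and "Loop Item" actions, not their actual implementation. The simplified `SITE` dictionary and both function names are hypothetical: each page lists its items and where its "Next" button leads.

```python
# Hypothetical, simplified site model: each page lists its items and the page
# its "Next" button leads to (None on the last page).
SITE = {
    "page1": {"items": ["A", "B"], "next": "page2"},
    "page2": {"items": ["C", "D"], "next": "page3"},
    "page3": {"items": ["E"], "next": None},
}


def scrape_item_loop_inside(start):
    """Correct order: the item loop runs inside the page loop."""
    results = []
    page = start
    while page is not None:               # "Cycle Pages"
        for item in SITE[page]["items"]:  # "Loop Item" on the current page
            results.append(item)
        page = SITE[page]["next"]         # click "Next" only after extracting
    return results


def scrape_item_loop_outside(start):
    """Wrong order: items are extracted once, before any page flipping happens."""
    results = list(SITE[start]["items"])  # "Loop Item" only ever sees the first page
    page = start
    while page is not None:               # "Cycle Pages" then just clicks through
        page = SITE[page]["next"]
    return results
```

With the item loop nested inside the page loop, `scrape_item_loop_inside("page1")` collects every item from all three pages. With the item loop outside, `scrape_item_loop_outside("page1")` still clicks through every page but only returns the first page's items, which is the problem Step 3 fixes.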
Now, let's look into how pagination works with this example [link to the case study example].
Or, learn more about pagination related topics:
- Pagination Loop issue: The extraction stops after 3 pages
- Create A Loop For Pagination Manually
- Pagination Scraping: Configure “Loop click next page” When It Can’t Be Detected
Author: The Octoparse Team