Web Scraping Feature Study | Scraping from multi-pages: pagination without "Next" button
Wednesday, September 28, 2016 8:32 AM
What is it?
Pagination is used when the content we want to scrape spans multiple pages of a website. Octoparse mimics human browsing behavior, so just as you would click through to the next page while browsing a website, Octoparse does the same when you use the pagination feature.
Sometimes there is no "Next" button to loop-click for pagination. In that case, we need to modify the XPath ourselves so that it locates the next page.
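To make the idea of locating the next page yourself more concrete, here is a minimal sketch in Python. It is not how Octoparse works internally. It assumes a made-up pagination bar in which the current page is marked with class "current" (your target site will likely use a different marker). On such a page, one plausible XPath would be //li[@class="current"]/following-sibling::li[1]/a, meaning "the link in the list item right after the current one". Python's built-in ElementTree does not support the following-sibling axis, so the sketch walks the siblings by hand to get the same result.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination bar: numbered links only, no "Next" button.
# The class name "current" is an assumption made for this illustration.
PAGINATION_HTML = """
<ul class="pagination">
  <li><a href="/list?page=1">1</a></li>
  <li class="current"><a href="/list?page=2">2</a></li>
  <li><a href="/list?page=3">3</a></li>
</ul>
"""


def next_page_href(markup):
    """Return the href of the link right after the current page, or None."""
    root = ET.fromstring(markup)
    items = list(root)  # the <li> children, in document order
    for index, item in enumerate(items):
        # Same intent as: //li[@class="current"]/following-sibling::li[1]/a
        if item.get("class") == "current" and index + 1 < len(items):
            link = items[index + 1].find("a")
            return link.get("href") if link is not None else None
    return None  # the current page is the last one (or no marker was found)


print(next_page_href(PAGINATION_HTML))  # prints /list?page=3
```

Anchoring on the current page rather than on a fixed page number is what lets the same expression keep working as you move from page to page.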
When do you want to use it?
If you are extracting data from more than one page, use pagination to enable page flipping.
Specifically, when there is no "Next" button for turning pages, we need to locate the next page ourselves.
Download my extraction task for this tutorial on scraping data with pagination HERE, just in case you need it.
How to use it?
Step 1. To define a loop click action for turning pages, drag a "Loop" item into the Workflow Designer and select "Single Element" as the "Loop Mode".
Step 2. Make sure the loop starts from the first page so that you get the data from all the pages, then modify the XPath so that it locates the next page.
Step 3. To click through all the items on each page of the cycle, select "Click Items in the loop" so that data is scraped within each page.
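The three steps above amount to a loop: start on the first page, scrape its items, locate the next page relative to the current one, and repeat until there is no next page. The sketch below imitates that loop in plain Python. It is only an illustration under stated assumptions, not Octoparse's implementation: the three-page site, the class names "current" and "item", and the helper names make_page, find_next and scrape_all are all hypothetical, and the pages live in memory so the example runs offline.

```python
import xml.etree.ElementTree as ET


def make_page(current, total, rows):
    """Build a fake page: some items plus a numbered pagination bar."""
    links = "".join(
        '<li class="{}"><a href="page{}">{}</a></li>'.format(
            "current" if n == current else "", n, n
        )
        for n in range(1, total + 1)
    )
    items = "".join('<p class="item">{}</p>'.format(r) for r in rows)
    return "<div>{}<ul>{}</ul></div>".format(items, links)


# Hypothetical three-page site, keyed by the link target of each page.
SITE = {
    "page1": make_page(1, 3, ["a", "b"]),
    "page2": make_page(2, 3, ["c", "d"]),
    "page3": make_page(3, 3, ["e"]),
}


def find_next(root):
    """Locate the link that follows the current page marker, if any."""
    items = root.findall("ul/li")
    for index, item in enumerate(items):
        if item.get("class") == "current" and index + 1 < len(items):
            return items[index + 1].find("a").get("href")
    return None


def scrape_all(start):
    """Scrape every page reachable from `start` by following next-page links."""
    collected = []
    visited = set()  # guards against a pagination bar that loops back
    url = start
    while url is not None and url not in visited:
        visited.add(url)
        root = ET.fromstring(SITE[url])
        # Extract the data on this page before turning to the next one.
        collected.extend(p.text for p in root.findall("p[@class='item']"))
        url = find_next(root)
    return collected


print(scrape_all("page1"))  # prints ['a', 'b', 'c', 'd', 'e']
```

Starting from "page1" matters here for the same reason as in Step 2: beginning the loop on a later page would silently skip everything before it.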
Now you've learned how to flip through pages to scrape data without a "Next" button. Let’s look into how pagination works with this example.
Or, learn more about pagination-related topics:
- Pagination Loop issue: The extraction stops after 3 pages
- Create A Loop For Pagination Manually
- Pagination Scraping: Configure “Loop click next page” When It Can’t Be Detected
Author: The Octoparse Team