Web Scraping Case Study | Crawling data from Rakuten Global Market
Wednesday, May 10, 2017 12:46 PM
In this tutorial, we will walk through the detailed steps to crawl data from the retail website rakuten.com.
List of features covered
Now, let's get started!
Step 1. Set up basic information
Step 2. Navigate to the target website
Step 3. Create a list of items
Move your cursor over products that share a similar layout; these are the items you will extract data from.
Now that the first item has been added to the list, we need to finish adding the remaining items.
Now all the sections with a similar layout have been added to the list.
Note: If the selection was not identified properly in the first place, click “Expand the selection area” until the outlined box includes all the content you want to crawl.
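For readers who want to see what "a list of items with similar layout" means outside the Octoparse interface, here is a minimal Python sketch. It is not how Octoparse works internally; the markup, class names, and product data are hypothetical stand-ins, and the standard library's ElementTree is used only because the sample is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a product listing page.
# The class names and products are assumptions, not Rakuten's real markup.
SAMPLE_PAGE = """
<div id="results">
  <div class="item"><span class="name">Green Tea</span><span class="price">1200</span></div>
  <div class="item"><span class="name">Matcha Whisk</span><span class="price">850</span></div>
  <div class="banner">Free shipping this week</div>
  <div class="item"><span class="name">Tea Bowl</span><span class="price">2300</span></div>
</div>
"""

def list_items(page_source):
    """Return every element that shares the repeated 'item' layout."""
    root = ET.fromstring(page_source.strip())
    return root.findall(".//div[@class='item']")

def item_to_record(item):
    """Pull the name and price fields out of one item block."""
    return {
        "name": item.find("span[@class='name']").text,
        "price": item.find("span[@class='price']").text,
    }

# The banner does not match the item layout, so it is skipped.
records = [item_to_record(item) for item in list_items(SAMPLE_PAGE)]
```

The same idea applies in the tool: once one block is chosen, every sibling block with the same structure joins the list, and anything with a different layout is left out.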
Step 4. Select the data to be extracted and rename data fields
Step 5. Set up Pagination
To extract data from multiple pages, we need to configure pagination; that is, we tell Octoparse to scrape from the first page through the last page.
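Conceptually, pagination is a loop that keeps following the "next page" link until there is none. The sketch below illustrates that loop in Python under stated assumptions: `FAKE_SITE`, `fetch_page`, and `crawl_all_pages` are hypothetical names, and the pages are an in-memory dictionary rather than a real website.

```python
# Hypothetical stand-in for a site: each page holds some rows and the
# key of the next page (None on the last page).
FAKE_SITE = {
    "page1": {"rows": ["item A", "item B"], "next": "page2"},
    "page2": {"rows": ["item C"], "next": "page3"},
    "page3": {"rows": ["item D", "item E"], "next": None},
}

def fetch_page(key):
    """Pretend to load a page; a real crawler would issue a request here."""
    return FAKE_SITE[key]

def crawl_all_pages(start_key, max_pages=100):
    """Collect rows from the start page through the last page.

    max_pages is a safety cap so a bad 'next' link cannot loop forever.
    """
    rows = []
    key = start_key
    visited = 0
    while key is not None and visited < max_pages:
        page = fetch_page(key)
        rows.extend(page["rows"])
        key = page["next"]  # follow the "next page" pointer
        visited += 1
    return rows

all_rows = crawl_all_pages("page1")
```

The safety cap is a design choice for the sketch: it plays the same role as limiting the number of page turns when a site's last page still shows a clickable "next" button.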
Step 6. Modify XPath to locate next page
In some cases, pagination does not work correctly because the auto-generated XPath is not accurate. When that happens, we need to work out the proper XPath manually and update the setting for "click to paginate".
Step 7. Start running your task
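One common reason an auto-generated XPath fails is that it addresses the "next" link by position, and the position shifts from page to page. The Python sketch below illustrates this with three hypothetical pager snippets; the markup and class names are assumptions, not Rakuten's actual page structure, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical pager markup for the first, a middle, and the last page.
FIRST_PAGE = """
<div class="pager">
  <span class="current">1</span>
  <a class="page" href="/search?p=2">2</a>
  <a class="page" href="/search?p=3">3</a>
  <a class="next" href="/search?p=2">Next</a>
</div>
"""

MIDDLE_PAGE = """
<div class="pager">
  <a class="prev" href="/search?p=1">Prev</a>
  <a class="page" href="/search?p=1">1</a>
  <span class="current">2</span>
  <a class="page" href="/search?p=3">3</a>
  <a class="next" href="/search?p=3">Next</a>
</div>
"""

LAST_PAGE = """
<div class="pager">
  <a class="prev" href="/search?p=2">Prev</a>
  <a class="page" href="/search?p=1">1</a>
  <a class="page" href="/search?p=2">2</a>
  <span class="current">3</span>
</div>
"""

POSITIONAL_XPATH = "./a[3]"             # "the third link", fragile
BY_CLASS_XPATH = "./a[@class='next']"   # "the link marked as next", stable

def find_link_text(pager_source, xpath):
    """Return the text of the link the XPath selects, or None if no match."""
    root = ET.fromstring(pager_source.strip())
    link = root.find(xpath)
    return None if link is None else link.text
```

With these samples, the positional XPath selects "Next" on the first page but the "3" link on the middle page, while the class-based XPath selects "Next" on both and matches nothing on the last page, which is a natural signal to stop paginating.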
Octoparse will automatically extract all the data selected. Check the "Data Extracted" pane for the extraction progress.
Step 8. Export data
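If you later want to reproduce the export step in your own script, the idea can be sketched in Python with the standard `csv` module. This is only an illustration: the field names and records are hypothetical, and the function writes to an in-memory string rather than a file.

```python
import csv
import io

def export_csv(records, fieldnames):
    """Serialize a list of dict records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Hypothetical extracted rows with renamed data fields.
sample_records = [
    {"product_name": "Green Tea", "price": "1200"},
    {"product_name": "Tea Bowl", "price": "2300"},
]

csv_text = export_csv(sample_records, ["product_name", "price"])
```

To write a real file instead, open it with `newline=""` and pass the file object to `csv.DictWriter` in place of the buffer.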
Now, you should be able to crawl Rakuten Global Market on your own. Get started with your own crawling task or download this Example to learn more.
Author: The Octoparse Team