Scrape Data from Rakuten JP | Japanese Sites Case Study
Monday, September 11, 2017 10:14 AM
Welcome to Octoparse case tutorials! Today we are going to learn how to scrape a Japanese e-commerce website: Rakuten.
Now let's get started!
Step 1. Set up basic information
Step 2. Navigate to the target website
Step 3. Create a list of items
We can see that all the product blocks share a similar layout, except the ad products, which have a gray background. Since it is complicated to add products with different XPaths to the same list, here we will add only the non-ad products.
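To see why a single XPath can pick out only the non-ad blocks, here is a minimal sketch in Python using the standard library. The markup and class names below are made-up placeholders, not Rakuten's actual HTML; the point is that an exact attribute match skips the ad variant.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page markup; real Rakuten class names will differ.
html = """
<div>
  <div class="item">Product A</div>
  <div class="item ad">Sponsored item</div>
  <div class="item">Product B</div>
</div>
"""

root = ET.fromstring(html)
# The predicate matches blocks whose class is exactly "item",
# so the ad block (class="item ad") is left out of the list.
items = root.findall(".//div[@class='item']")
print([i.text for i in items])  # ['Product A', 'Product B']
```

Octoparse builds a similar XPath for you behind the scenes when you add items to a loop list.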
If you cannot select the right part, just click anywhere in the first block and keep clicking the "Expand Area" button until the red dotted line encircles the whole block.
Now that the first item has been added to the list, we need to add the remaining items as well.
Now all the sections have been added to the list.
Step 4. Select the data to be extracted and rename the data fields
In this step, we will begin extracting data from the loop list of product information sections. After navigating to the "Extract Data" action and clicking it, you will notice that the first information section is outlined with a green dotted line. That means we only need to extract data within this section by following the steps below; the extraction action we set up for this section will apply to the rest of the list. Say we want to capture the product name and price.
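The key idea in this step is that each field is located *relative* to its own section, so the same rule works for every item in the list. A small sketch of that logic, again with made-up tags and class names rather than Rakuten's real markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical list markup; the "name"/"price" classes are assumptions.
page = """
<ul>
  <li><span class="name">Blue Mug</span><span class="price">1200</span></li>
  <li><span class="name">Red Mug</span><span class="price">1500</span></li>
</ul>
"""

root = ET.fromstring(page)
rows = []
for item in root.findall("li"):  # loop over every product section
    # Paths are relative to the current section, so the rule defined
    # on the first item applies unchanged to all the others.
    name = item.find("span[@class='name']").text
    price = item.find("span[@class='price']").text
    rows.append({"name": name, "price": price})
print(rows)
```

This mirrors what Octoparse does when the field you pick inside the first green-outlined section is applied to every other section in the loop.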
Step 5. Set up pagination
Now we need to flip through multiple web pages to extract as much data as possible, which we do by setting up a pagination action.
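Conceptually, pagination is just a loop that follows the "next page" link until there is none left. In the sketch below, a dictionary stands in for live HTTP fetches and the page names are invented, so the crawl logic can be shown without hitting the real site:

```python
# Fake "site": each page lists its items and the next page to visit.
# In a real crawler, each key would be a URL you fetch and parse.
pages = {
    "page1": {"items": ["Product A", "Product B"], "next": "page2"},
    "page2": {"items": ["Product C"], "next": None},
}

def crawl(start):
    url, collected = start, []
    while url is not None:
        page = pages[url]              # stand-in for fetch + parse
        collected.extend(page["items"])
        url = page["next"]             # stop when no next-page link exists
    return collected

print(crawl("page1"))  # ['Product A', 'Product B', 'Product C']
```

Octoparse's pagination action encapsulates exactly this loop: click the next-page element, re-run the extraction, and stop when the element no longer appears.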
Step 6. Start running your task
Now that we are done configuring the task, it's time to run it and get the data we want.
There are two run modes: Local Extraction and Cloud Extraction (premium plan). With Local Extraction, the task runs on your own machine; with Cloud Extraction, the task runs on the Octoparse Cloud Platform, which means you can set it up, turn off your desktop or laptop, and the data will be automatically extracted and saved to the cloud. Features such as scheduled extraction, IP rotation, and API access are also supported in the Cloud. Find out more about Octoparse Cloud here.
Step 7. Check and export the data
After completing the data extraction process, we can check the extracted data or click the "Export" button to export the results to an Excel file, a database, or other formats and save the file to your computer.
To learn more about how to crawl data from a website, you can refer to the related tutorials:
Author: The Octoparse Team
For more information about Octoparse, please click here.
Sign up today!