Scrape Data from Buyma.com | Japanese Sites Case Study
Thursday, September 14, 2017 4:51 AM
Welcome to the Octoparse case tutorials! Scraping shopping websites for product information has become very popular, so today we are going to learn how to scrape a Japanese shopping website, Buyma.com.
Now let's get started!
Step 1. Set up basic information
Step 2. Navigate to the target website
Step 3. Set up pagination
Because we need to flip through multiple web pages to extract as much data as possible, setting up pagination is important.
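Octoparse sets up pagination visually, but the idea behind it is a simple loop: extract the current page, follow its "next page" link, and stop when no such link exists. The Python sketch below illustrates that loop outside Octoparse. It assumes each page yields a list of product titles plus an optional next-page URL; the in-memory `PAGES` dictionary, its `example.com` URLs, and the product names are hypothetical stand-ins for real Buyma pages, and `max_pages` is an illustrative safety cap.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical listing pages: URL -> (product titles, next-page URL or None).
PAGES: Dict[str, Tuple[List[str], Optional[str]]] = {
    "https://example.com/list?page=1": (["Bag A", "Bag B"], "https://example.com/list?page=2"),
    "https://example.com/list?page=2": (["Shoe C"], "https://example.com/list?page=3"),
    "https://example.com/list?page=3": (["Watch D"], None),
}

def crawl_all_pages(start_url: str, max_pages: int = 100) -> List[str]:
    """Follow "next page" links until none is left, collecting titles."""
    titles: List[str] = []
    url: Optional[str] = start_url
    visited = set()
    while url is not None and len(visited) < max_pages:
        if url in visited:  # stop if a "next" link points back to a seen page
            break
        visited.add(url)
        page_titles, url = PAGES[url]  # url becomes the next page (or None)
        titles.extend(page_titles)
    return titles

print(crawl_all_pages("https://example.com/list?page=1"))
# ['Bag A', 'Bag B', 'Shoe C', 'Watch D']
```

The cap and the visited set keep the loop from running forever if a site's last page still shows a "next" button that points back to an earlier page.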
Step 4. Create a list of items
We can see that all the product blocks share a similar layout, so we can add these blocks to a loop list and have Octoparse click each one.
Now that the first item has been added to the list, we need to add the rest of the items.
Now all the products have been added to the list.
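A loop list works because every product block shares one layout, so a single pattern matches them all. As a rough illustration (not Octoparse's internals), this Python sketch uses the standard library's `xml.etree.ElementTree`, whose path syntax supports a limited subset of XPath, to collect every block from a hypothetical listing fragment. The `li class="product"` markup is an assumption; Buyma's real HTML will differ, and real-world HTML usually needs an HTML parser rather than an XML one.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical listing markup: every product block has the same layout.
LISTING = """
<ul>
  <li class="product"><a href="/item/1">Leather bag</a></li>
  <li class="product"><a href="/item/2">Silk scarf</a></li>
  <li class="product"><a href="/item/3">Wool coat</a></li>
</ul>
"""

def build_loop_list(markup: str) -> List[Tuple[str, str]]:
    """Return (title, link) for every block that matches the shared layout."""
    root = ET.fromstring(markup.strip())
    items = []
    # One pattern, roughly //li[@class="product"], matches all blocks at once.
    for block in root.findall(".//li[@class='product']"):
        link = block.find("a")
        if link is not None:
            items.append((link.text or "", link.get("href", "")))
    return items

for title, href in build_loop_list(LISTING):
    print(title, href)  # each entry is one item the loop will click
```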
Step 5. Modify the XPath to locate products precisely
After we click “Loop”, Octoparse will automatically click the first item in the loop list. If we go back to the “Loop Item”, however, we will find that the list also contains image links. So we need to modify the XPath so that it locates only the links of the product titles.
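The image-link problem can be reproduced with the same standard-library approach. In this hypothetical markup, each product block links to its detail page twice, once from the thumbnail and once from the title, so a broad pattern returns every link twice, while a narrower path through the title element returns one link per product. The class names and paths are assumptions for illustration; the XPath to use in Octoparse depends on Buyma's actual page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical product blocks: the image and the title both link to the detail page.
LISTING = """
<div>
  <div class="product">
    <a href="/item/1"><img src="1.jpg" /></a>
    <p class="title"><a href="/item/1">Leather bag</a></p>
  </div>
  <div class="product">
    <a href="/item/2"><img src="2.jpg" /></a>
    <p class="title"><a href="/item/2">Silk scarf</a></p>
  </div>
</div>
"""

root = ET.fromstring(LISTING.strip())

# Too broad, roughly //div[@class="product"]//a: matches image AND title links.
broad = [a.get("href") for a in root.findall(".//div[@class='product']//a")]

# Precise, roughly //div[@class="product"]/p[@class="title"]/a: title links only.
precise = [a.get("href") for a in root.findall(".//div[@class='product']/p[@class='title']/a")]

print(broad)    # ['/item/1', '/item/1', '/item/2', '/item/2']
print(precise)  # ['/item/1', '/item/2']
```

Anchoring the path on the title element, rather than deduplicating links afterwards, keeps exactly one loop item per product.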
Step 6. Select the data to be extracted and rename the data fields
We need to click “Click Item” to load the product detail page, where the information is extracted. Note that the extraction actions we set up for this product will be applied to every other item in the list.
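Because one set of extraction rules is reused for every item in the loop, it helps to think of the renamed data fields as a mapping from field name to a location on the detail page. The sketch below is an illustration under assumptions: the `DETAIL_PAGES` dictionary stands in for loading each detail page via “Click Item”, and the field names `product_name` and `price`, their paths, and the sample values are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical detail pages keyed by URL (a stand-in for "Click Item").
DETAIL_PAGES: Dict[str, str] = {
    "/item/1": "<div><h1>Leather bag</h1><span class='price'>12,000</span></div>",
    "/item/2": "<div><h1>Silk scarf</h1><span class='price'>8,500</span></div>",
}

# Extraction rules defined once and reused for every item.
# Keys are the renamed data fields; values are hypothetical ElementTree paths.
FIELDS: Dict[str, str] = {
    "product_name": ".//h1",
    "price": ".//span[@class='price']",
}

def extract(url: str) -> Dict[str, str]:
    """Apply the same field rules to one detail page."""
    root = ET.fromstring(DETAIL_PAGES[url])
    row = {"url": url}
    for name, path in FIELDS.items():
        node = root.find(path)
        # Missing elements become empty strings instead of raising errors.
        row[name] = node.text.strip() if node is not None and node.text else ""
    return row

rows = [extract(url) for url in ["/item/1", "/item/2"]]
for row in rows:
    print(row)
```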
Step 7. Start running your task
Now that we are done configuring the task, it's time to run it and get the data we want.
There are two ways to run the task: Local Extraction and Cloud Extraction (premium plan). With Local Extraction, the task runs on your own machine. With Cloud Extraction, the task runs on the Octoparse Cloud Platform, so you can start it, turn off your desktop or laptop, and the data will still be extracted and saved to the cloud automatically. The Cloud also supports features such as scheduled extraction, IP rotation, and API access.
Step 8. Check and export the data
After the extraction finishes, we can check the extracted data or click the "Export" button to export the results to an Excel file, a database, or another format and save the file to your computer.
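The "Export" button in Octoparse handles this for you. For readers who post-process the data themselves, here is a minimal Python sketch that writes extracted rows to a CSV file Excel can open. It assumes the rows are dictionaries sharing the same keys; the field names, sample values, and output file name are hypothetical. The `utf-8-sig` encoding adds a byte-order mark, which Excel commonly relies on to detect UTF-8, so Japanese product names are less likely to appear garbled.

```python
import csv
import os
import tempfile
from typing import Dict, List

def export_csv(rows: List[Dict[str, str]], path: str) -> None:
    """Write rows (dicts with identical keys) to a CSV file with a header row."""
    if not rows:
        return
    # newline="" lets the csv module control line endings itself.
    # utf-8-sig writes a BOM so Excel recognizes the file as UTF-8.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical extracted rows with Japanese product names.
sample_rows = [
    {"product_name": "レザーバッグ", "price": "12,000"},
    {"product_name": "シルクスカーフ", "price": "8,500"},
]
out_path = os.path.join(tempfile.gettempdir(), "buyma_products.csv")
export_csv(sample_rows, out_path)
```

Values such as "12,000" contain commas, and `csv.DictWriter` quotes them automatically, so the price column stays intact when the file is opened.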