Scraping Data from Amazon
Saturday, December 31, 2016 2:20 AM
Amazon is one of the most popular e-commerce websites in the world, and many users scrape it to collect product information. In this tutorial, we will show you how to scrape product details from Amazon.
You can also go to "Task Templates" on the main screen of the Octoparse scraping tool and start directly with the ready-to-use Amazon templates to save time. Octoparse provides several Amazon templates designed for different countries, such as Germany, France, the US, Spain, and India. With a template, there is no need to configure the scraping task yourself. For further details, see Task Templates.
If you would like to know how to build the task from scratch, continue reading the tutorial below.
To follow along, you may want to use this URL in the tutorial:
Here are the main steps in this tutorial: [Download task file here]
1. Go to Web Page - Open the target web page
- Enter the URL on the home page and click "Start"
2. Auto-detect the web page - create the workflow
- Click "Auto-detect web page data" and wait for the detection to complete
- Delete unwanted fields or rename fields if needed in the Data preview
- Uncheck "Add a page scroll"
- Click "Create workflow"
A Pagination loop and a Loop Item will be generated automatically in the workflow.
If all the data you need can be scraped from the listing page, you can stop here and jump to step 5, Set up AJAX timeout for "Click to Paginate". If you want to go to each product detail page to get more information, follow the steps below.
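The generated workflow behaves like two nested loops: Pagination moves from one listing page to the next, and the Loop Item visits each product on the current page. The Python sketch below is only a conceptual model under stated assumptions, not how Octoparse works internally. The function `scrape_listing`, the in-memory `pages` list (each page holding its product items and the index of the next page), and the field names `title`, `price`, and `url` are all hypothetical stand-ins for real listing pages.

```python
from typing import Any, Dict, List, Optional


def scrape_listing(pages: List[Dict[str, Any]], start: int = 0) -> List[Dict[str, str]]:
    """Model of the auto-generated workflow: Pagination (outer loop)
    wrapping a Loop Item (inner loop) that extracts fields from each product."""
    rows: List[Dict[str, str]] = []
    current: Optional[int] = start
    while current is not None:            # Pagination: repeat while a next page exists
        page = pages[current]
        for item in page["items"]:        # Loop Item: one iteration per product
            rows.append({field: item[field] for field in ("title", "price", "url")})
        current = page["next"]            # None marks the last listing page
    return rows


# Hypothetical listing pages; in practice this content comes from the browser.
pages = [
    {"items": [{"title": "Mug", "price": "$9.99", "url": "/dp/A1"}], "next": 1},
    {"items": [{"title": "Lamp", "price": "$24.00", "url": "/dp/B2"}], "next": None},
]
print(scrape_listing(pages))
```

Each product becomes one row, and the outer loop stops when a page has no next page, which mirrors Pagination ending when there is no "Next" button to click.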
3. Click into each product link to scrape more information
- Choose "Click on link(s) to scrape the linked page(s)" in the Tips panel
- Select "Click on an extracted data field" and choose the field you want to click from the drop-down menu (you can confirm that it is the correct link in the Data Preview)
- Click "Confirm"
Octoparse will automatically go to the first product page.
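Conceptually, this step follows the link stored in one extracted field and adds the fields found on the detail page to the same row. The following Python sketch models that join under stated assumptions; it is not Octoparse's API. The function `enrich_with_details`, the `open_detail_page` callback, the `detail_pages` dictionary, and the `rating` field are hypothetical and exist only to illustrate the idea.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def enrich_with_details(rows: List[Row], link_field: str,
                        open_detail_page: Callable[[str], Row]) -> List[Row]:
    """For each listing row, follow the link stored in link_field and merge the
    fields extracted on that detail page into the row. If a field name appears
    in both, the detail-page value overwrites the listing value."""
    enriched: List[Row] = []
    for row in rows:
        detail = open_detail_page(row[link_field])  # "click" the extracted link
        enriched.append({**row, **detail})
    return enriched


# Hypothetical detail pages keyed by the link extracted from the listing page.
detail_pages = {
    "/dp/A1": {"rating": "4.5 out of 5 stars"},
    "/dp/B2": {"rating": "4.1 out of 5 stars"},
}
listing = [{"title": "Mug", "url": "/dp/A1"}, {"title": "Lamp", "url": "/dp/B2"}]
print(enrich_with_details(listing, "url", detail_pages.__getitem__))
```

Choosing the link field from the drop-down menu corresponds to the `link_field` argument here, which is why it is worth confirming in the Data Preview that the field really holds the product link.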
4. Extract Data - extract data on the detail pages
- Click the information you want on the web page
- Choose "Extract text of the selected element"
- Repeat the above steps to extract all the data you need
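"Extract text of the selected element" returns the visible text inside one element with its extra whitespace trimmed. As a rough illustration only, the Python sketch below does the same for an element identified by its `id` attribute, using the standard library's `html.parser`. This is a swapped-in technique, not what Octoparse does internally, and the sample HTML and the ids `productTitle` and `price` are assumptions made for the example rather than a guaranteed description of Amazon's markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the element whose id attribute equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0          # greater than 0 while inside the target element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1     # nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1      # entering the target element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_text(html: str, element_id: str) -> str:
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())  # collapse runs of whitespace


# Hypothetical product-page snippet.
html = ('<div><span id="productTitle">\n  Stainless Steel Mug, 12 oz\n</span>'
        '<span id="price">$9.99</span></div>')
print(extract_text(html, "productTitle"))
```

Repeating the step for each field in the tutorial corresponds to calling `extract_text` once per element you need, for example once for the title and once for the price.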
5. Set up AJAX timeout for "Click to Paginate"
- Open the Action Settings of "Click to Paginate"
- Tick "Load with AJAX" and select 10s as the AJAX timeout
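An AJAX timeout means waiting a bounded amount of time for content that loads after the click, without reloading the whole page. The Python sketch below shows that general pattern, polling a condition until it holds or the timeout expires. It is an assumption-based illustration, not Octoparse's implementation. The function `wait_for_ajax` and its `is_loaded` callback are hypothetical, and the 10-second default simply mirrors the value suggested above.

```python
import time
from typing import Callable


def wait_for_ajax(is_loaded: Callable[[], bool], timeout: float = 10.0,
                  poll_interval: float = 0.1) -> bool:
    """Poll is_loaded() until it returns True or `timeout` seconds pass.
    Returns True if the content appeared in time, False otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        if is_loaded():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Hypothetical check: pretend the next page of results appears on the third poll.
polls = {"count": 0}


def next_page_loaded() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_for_ajax(next_page_loaded, timeout=10.0, poll_interval=0.01))
```

Because the loop returns as soon as the content appears, a longer timeout in this sketch only costs extra time on pages where the content never loads. The function returns False instead of raising so that the caller can decide whether to continue.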
6. Run extraction - run your task and get data
- Click on the upper left side
- Select "Run on your device" to run the task on your computer, or select "Run task in the Cloud" to run the task in the Cloud (for premium users only)
Here is the sample output.
Happy Data Hunting!
Author: The Octoparse Team