Scrape product details from Amazon

Monday, March 28, 2016 5:06 AM

Amazon is one of the most popular e-commerce websites around the world. Many users try to scrape it to collect product information. In this tutorial, we are going to show you how to scrape product details from Amazon.

 

You can also go to Task Templates on the main screen of Octoparse and start with a ready-to-use Amazon template to save time. Octoparse provides several Amazon templates designed for different countries, such as Germany, France, the US, Spain, and India. With a template, there is no need to configure the scraping task yourself. For further details, see Task Templates.

 

If you would like to build the task from scratch, continue with the tutorial below or watch this video.

 

[Video: how_to_scrape_amazon]

To follow along, you can use this URL:

https://www.amazon.com/s?rh=i%3Aelectronics%2Cn%3A172541%2Cp_n_feature_four_browse-bin%3A12097501011&ie=UTF8&lo=electronics
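If you are curious what the example URL contains, here is a minimal sketch that decodes its query string with Python's standard library. The function name `decode_search_url` is our own, and the sketch only shows the decoded values; what each parameter means to Amazon is not covered in this tutorial.

```python
from urllib.parse import urlsplit, parse_qs

# The example URL from this tutorial, split across two lines for readability.
URL = (
    "https://www.amazon.com/s?rh=i%3Aelectronics%2Cn%3A172541%2C"
    "p_n_feature_four_browse-bin%3A12097501011&ie=UTF8&lo=electronics"
)


def decode_search_url(url):
    """Return the host, path, and decoded query parameters of a URL."""
    parts = urlsplit(url)
    # parse_qs percent-decodes the values and returns a list per key;
    # each key appears once here, so we keep the first value only.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.netloc, parts.path, params


host, path, params = decode_search_url(URL)

# Once decoded, the "rh" value is a comma-separated list of key:value pairs.
filters = dict(item.split(":", 1) for item in params["rh"].split(","))

print(host, path)
print(params)
print(filters)
```

Decoding the URL this way makes it easier to see which part of the address changes when you pick a different category or filter on the site.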

 

Here are the main steps in this tutorial: [Download task file here]

1. Go to Web Page - open the targeted web page

2. Auto-detect the web page - create the workflow

3. Click into each product link to scrape more information

4. Extract Data - extract data on the detail pages

5. Set up AJAX timeout for "Click to Paginate"

6. Run extraction - run the task and get data

 

1. Go to Web Page - open the targeted web page

  • Enter the URL on the home page and click "Start"

 

2. Auto-detect the web page - create the workflow

  • Click "Auto-detect web page data" and wait for the detection to complete
  • Delete unwanted fields or rename fields if needed in the Data preview
  • Uncheck "Add a page scroll"
  • Click "Create workflow"

A Pagination and a Loop Item are generated automatically in the workflow. If all the data you need can be scraped from the listing page, you can stop here and jump to step 5, Set up AJAX timeout for "Click to Paginate". If you want to go to each product detail page to get more information, follow the steps below.
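To picture how the Pagination and the Loop Item nest, here is a rough Python sketch. It is only a conceptual model, not how Octoparse is implemented: `run_workflow`, `extract_item`, and the in-memory `pages` list are hypothetical names and placeholder values we made up for illustration.

```python
def run_workflow(pages, extract_item):
    """Visit every listing page, then every item on that page."""
    rows = []
    for page in pages:        # Pagination: one pass per listing page
        for item in page:     # Loop Item: one pass per product on the page
            rows.append(extract_item(item))
    return rows


# Hypothetical stand-in for two listing pages (placeholder values only).
pages = [
    [{"title": "Item A"}, {"title": "Item B"}],
    [{"title": "Item C"}],
]

rows = run_workflow(pages, lambda item: item["title"])
print(rows)
```

The outer loop moves from page to page and the inner loop handles each product, which is why the extraction steps you add later sit inside the Loop Item.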

 

3. Click into each product link to scrape more information

  • Choose "Click on link(s) to scrape the linked page(s)" on the Tips panel
  • Select "Click on an extracted data field" and select the field you want to click on from the drop-down menu (you can confirm whether it is the correct link in the Data Preview)
  • Click "Confirm"

 

Octoparse will automatically go to the first product page.

4. Extract Data - extract data on the detail pages

  • Select information on the web page
  • Choose "Extract text of the selected element"
  • Repeat the above steps to extract all the data you need
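For readers who want to see what "extract text of the selected element" amounts to, here is a minimal sketch using Python's built-in `html.parser`. The class `ElementText`, the helper `extract_text`, and the sample HTML with its `id` values are all hypothetical; real product pages are far more complex, and this is not Octoparse's own method.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementText(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html, target_id):
    """Return the whitespace-normalized text of the element with target_id."""
    parser = ElementText(target_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Hypothetical detail-page fragment with placeholder values.
SAMPLE_HTML = (
    '<div><span id="title"> Example   <b>Product</b><br> Name </span>'
    '<span id="price">$0.00</span></div>'
)

print(extract_text(SAMPLE_HTML, "title"))
print(extract_text(SAMPLE_HTML, "price"))
```

Each field you add in Octoparse plays the role of one `extract_text` call here: it picks one element and keeps only its text.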

 

5. Set up AJAX timeout for "Click to Paginate"

  • Open the Action Settings of "Click to Paginate"
  • Tick "Load with AJAX" and select 10s as the AJAX timeout
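The general idea behind a timeout like this is a bounded wait: keep checking whether the new content has arrived, and give up after a fixed number of seconds. The sketch below illustrates that idea in Python. It is an assumption on our part that a poll-until-ready loop is a fair mental model; this tutorial does not describe how Octoparse applies the AJAX timeout internally, and `wait_for` and its parameters are hypothetical names.

```python
import time


def wait_for(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage: pretend the next page is "loaded" on the third check.
checks = {"count": 0}


def page_loaded():
    checks["count"] += 1
    return checks["count"] >= 3


loaded = wait_for(page_loaded, timeout=10.0, interval=0.01)
print(loaded, checks["count"])
```

A longer timeout gives slow pages more time to load, at the cost of a slower run when a page never finishes loading.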

 

6. Run extraction - run the task and get data

 

Here is the sample output. 

[Image: sample data]

Happy Data Hunting!

Author: The Octoparse Team
