Scrape Amazon Product Data with ASIN/UPC

Friday, November 16, 2018

Capturing product information by ASIN or UPC is useful if you sell on Amazon. Scraping Amazon product data with ASIN/UPC can help you study similar (homogeneous) products and shape your pricing strategy.

In this tutorial, I will show you how to retrieve product data from Amazon using the web scraping tool Octoparse.

Before getting started, you’ll need to prepare a list of ASINs.
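If your ASIN list comes from a spreadsheet or export, it may help to tidy it before pasting it into Octoparse. The following is a minimal sketch, assuming ASINs are 10-character alphanumeric codes; the function name `clean_asin_list` and the sample values are hypothetical and not part of Octoparse.

```python
import re

# Assumption: an ASIN is a 10-character code of letters and digits.
# This pattern is only a sanity check, not an official validation rule.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")

def clean_asin_list(raw_lines):
    """Return unique, well-formed ASINs in their original order."""
    seen = set()
    cleaned = []
    for line in raw_lines:
        code = line.strip().upper()
        if ASIN_PATTERN.fullmatch(code) and code not in seen:
            seen.add(code)
            cleaned.append(code)
    return cleaned

# Hypothetical placeholder values, not real product codes.
raw = ["B000000001", " b000000002 ", "B000000001", "not-an-asin", ""]

# One code per line, ready to paste into a text list.
print("\n".join(clean_asin_list(raw)))
```

The result is one code per line with duplicates, blanks, and malformed entries dropped.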

To follow along, you may want to use the URL in this tutorial:

https://www.amazon.com/

 

Here are the main steps in this tutorial [Download demo task file here]

1) "Go To Web Page" - to open the target web page

2) Build a "Loop Item" - to loop search each ASIN in the list

3) Extract data - to select the data for extraction

4) Customize data field by modifying XPath – to improve the accuracy of a certain data field (Optional)

5) Run extraction - to run your task and get data

 

 

 

 

 

 

 

 

 

 

1) "Go To Web Page" - to open the target web page

· Create the task with "Advanced Mode".

· Paste the URL into the "Extraction URL" box and click "Save URL" to move on.

 

 

 

 

 

 

 

 

2) Build a "Loop Item" - to loop search each ASIN in the list

By pasting the ASIN list into “Text list”, we can create a loop search action, in which Octoparse automatically enters every ASIN in the list into the search box, one code at a time.

· Drop a "Loop Item" action into the workflow designer

· Click "Text list" under "Loop Mode"

· Click the "A" bar

· Paste the ASIN list into the textbox

· Click "OK" to save

 

 

 

Now, we can see the ASIN list is presented in the Loop Item box. Let’s start creating the loop search action.

· Click the search box on the web page

· Click "Enter text" in the "Action Tips"

· Input the first ASIN into the text box

· Click "OK" to save

We need to move the "Enter text" action within the workflow so that Octoparse executes the steps in the correct order.

· Drag the "Enter text" action into the "Loop Item"

· Check "Use the text in Loop Item to fill in the text box"

· Click "OK" to save

 

 

After setting up the "Loop Item" and "Enter text" actions, we need to add a "Click item" action to trigger the search.

· Click the search button on the web page

· Click "Click button" in the "Action Tips"
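To see why "Enter text" and the click both belong inside the "Loop Item", it can help to picture the workflow as an ordinary loop. This is only a conceptual sketch: `run_loop_search` and the two callbacks are hypothetical stand-ins, not Octoparse APIs, and the codes are placeholders.

```python
def run_loop_search(asins, enter_text, click_search):
    """Apply the same two actions to every entry of the text list."""
    for asin in asins:        # "Loop Item" in "Text list" mode
        enter_text(asin)      # "Enter text", filled from the current loop entry
        click_search()        # "Click item" on the search button

# Record the order of actions instead of driving a real browser.
log = []
run_loop_search(
    ["B000000001", "B000000002"],  # hypothetical placeholder codes
    enter_text=lambda text: log.append(("enter", text)),
    click_search=lambda: log.append(("click", None)),
)
print(log)
```

If "Enter text" sat outside the loop, it would run once with a fixed value, which is why the action is dragged into the loop and bound to the loop's text.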

Since Amazon loads the search results with AJAX, we need to set up "AJAX Load" to keep the software from getting stuck.

· Uncheck "Auto retry"

· Check "AJAX Load" and set the timeout

· Click "OK" to save
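The general idea behind an AJAX timeout is a bounded wait: keep checking for the new content, but give up after a fixed time so the task can move on. The sketch below illustrates that idea only; how Octoparse implements it internally is not covered here, and `wait_for` and `results_loaded` are hypothetical names.

```python
import time

def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Stand-in for "the search results have appeared": true after about 0.3 s.
start = time.monotonic()
results_loaded = lambda: time.monotonic() - start > 0.3

print(wait_for(results_loaded, timeout=2.0))  # content arrives within the limit
print(wait_for(lambda: False, timeout=0.5))   # content never arrives; gives up
```

A timeout that is too short risks moving on before the results load; one that is too long slows every iteration of the loop.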

 

 

Tips!

If you want to learn more about AJAX, here is a related tutorial you might need:

· Deal with AJAX 

 

 

 

 

 

 

3) Extract data - to select the data for extraction

· Click the information you need on the page

· Select "Extract data" in the "Action Tips"

· Rename the fields by selecting from the pre-defined list or entering your own names

 

 

 

 

 

 

 

 

4) Customize data field by modifying XPath – to improve the accuracy of a certain data field (Optional)

In this case, the "Reviewer" element is not always located in the same place on different detail pages. To avoid missing data caused by this inconsistent placement, we need to modify the XPath in Octoparse.

In this tutorial, we are going to revise the XPath for the "Reviewer" field.

· Select the "Reviewer" data field

· Click "Customize data field"

· Select "Customize XPath"

· Paste the revised XPath into "Matching XPath"

The XPath for the "Reviewer" field is //span[@class="sx-price sx-price-large"]

· Click "OK"
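The reason an attribute-based XPath copes with elements that move around can be shown with a small sketch. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and expects well-formed markup, so it stands in for a real HTML parser here. The two page fragments and the positional path are hypothetical; only the class-based expression comes from this tutorial.

```python
import xml.etree.ElementTree as ET

# Two simplified, hypothetical page fragments: the target <span> sits in a
# different position on each one.
PAGE_A = """<div id="detail">
  <div><span class="title">Item A</span></div>
  <div><span class="sx-price sx-price-large">target A</span></div>
</div>"""

PAGE_B = """<div id="detail">
  <div><span class="title">Item B</span></div>
  <div><span class="badge">other</span></div>
  <div><span class="sx-price sx-price-large">target B</span></div>
</div>"""

POSITIONAL = "./div[2]/span"  # depends on where the element sits
BY_CLASS = ".//span[@class='sx-price sx-price-large']"  # depends on its class

def first_text(page, path):
    """Return the text of the first node matching `path`, or None."""
    node = ET.fromstring(page).find(path)
    return node.text if node is not None else None

for name, page in (("A", PAGE_A), ("B", PAGE_B)):
    print(name, first_text(page, POSITIONAL), first_text(page, BY_CLASS))
```

On the second fragment the positional path picks up the wrong element, while the class-based path still finds the target on both.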

 

 

Tips!

A manually written XPath often gives you more flexibility and accuracy than the auto-generated one.

Here are some related tutorials you might need:

 · Data fetched to the incorrect data fields

 · Locate elements with XPath

 · How to associate data with nearby text?

 

 

 

 

 

 

5) Run extraction - to run your task and get data

· Click "Start Extraction"

· Select "Local Extraction" to run the task on your computer

 

 

Below is the output sample.  

 

 
