Scrape Amazon Product Data with ASIN/UPC
Thursday, August 10, 2017 9:13 AM
If you sell products online to make a profit, wouldn't you want to know how your products are priced on popular eCommerce sites such as Amazon or eBay? When your products are not selling, is it because of a flawed pricing strategy, a poor product description, or unappealing product pictures, or are the products simply not good enough? Questions like these are exactly why sellers are turning to Amazon for product data mining.
In this article, I will show you how you can easily retrieve the product data you need from Amazon using the web scraping tool, Octoparse. Let’s get straight to the point.
Step 1: Get prepared
Gather a list of ASINs or UPCs for the products you want to search for on Amazon.
ASIN, short for Amazon Standard Identification Number, is a unique block of 10 letters and/or numbers that identifies an item on Amazon. Each item listed on Amazon has its own ASIN. If you happen to know the ASINs of the products you want to search for, great! If not, try UPCs.
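If you keep your ASINs in a spreadsheet or text file, a quick format check before pasting them into Octoparse can catch typos and duplicates. Below is a minimal Python sketch that relies only on what is stated above, namely that an ASIN is 10 letters and/or numbers. It does not verify any checksum, and the helper names and sample codes are made up for illustration.

```python
import re

# Assumption: treat an ASIN simply as exactly 10 uppercase letters/digits.
# This is a format check only; it cannot tell whether the ASIN exists.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def looks_like_asin(code: str) -> bool:
    """Return True if `code` (trimmed, upper-cased) is 10 letters/digits."""
    return ASIN_PATTERN.fullmatch(code.strip().upper()) is not None


def clean_asin_list(raw_codes):
    """Keep well-formed ASINs, normalized and de-duplicated, in original order."""
    seen = set()
    cleaned = []
    for code in raw_codes:
        normalized = code.strip().upper()
        if looks_like_asin(normalized) and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


if __name__ == "__main__":
    # Hypothetical input: a duplicate in lower case, a too-short code, a code with a hyphen.
    sample = ["B00EXAMPLE", " b00example ", "12345", "B0-EXAMPLE1"]
    print(clean_asin_list(sample))  # ['B00EXAMPLE']
```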
UPC, short for Universal Product Code, is a 12-digit barcode used extensively on retail packaging in the United States. Find out the UPCs of your products and make a list of them.
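The last of a UPC-A code's 12 digits is a check digit, so a mistyped code can be caught before it is ever searched. The sketch below applies the standard UPC-A rule: three times the sum of the digits in odd positions, plus the sum of the digits in even positions, and the check digit is whatever brings that total up to the next multiple of 10. The function names are hypothetical, and 036000291452 is just a sample code.

```python
import re


def upc_check_digit(first_11: str) -> int:
    """Compute the UPC-A check digit from the first 11 digits."""
    digits = [int(ch) for ch in first_11]
    odd_sum = sum(digits[0::2])   # 1st, 3rd, ..., 11th digits
    even_sum = sum(digits[1::2])  # 2nd, 4th, ..., 10th digits
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def is_valid_upc(code: str) -> bool:
    """True if `code` is exactly 12 ASCII digits with a correct check digit."""
    code = code.strip()
    if not re.fullmatch(r"[0-9]{12}", code):
        return False
    return upc_check_digit(code[:11]) == int(code[11])


if __name__ == "__main__":
    for candidate in ["036000291452", "036000291453", "36000291452"]:
        print(candidate, is_valid_upc(candidate))
    # 036000291452 True
    # 036000291453 False
    # 36000291452 False
```

The third sample fails the length check. This is also what happens to a UPC that has lost its leading zero, for example after a spreadsheet stored it as a number.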
Step 2: Capture Data
Launch Octoparse and start a new task in Advanced Mode. Octoparse also offers a simpler option for less technical users, called Template Mode, which you can try first if you like! I prefer Advanced Mode because it offers a lot more flexibility, and I have never found it too "advanced" to learn.
Follow the next few steps:
- Create a Go to webpage action to enter the target URL
- Enter https://www.amazon.com, click "Save" and wait for Amazon's webpage to load in the built-in browser.
- Click the search box on the webpage ➜ when Octoparse prompts you for the next action, pick Enter text.
To search repeatedly with a list of UPC codes, we need to use a loop action.
- Create a loop action
- Click on the loop action and set its loop mode to Text List
- Click the icon and paste your list of UPC codes into the text box
- Click Confirm to save
- Drag the Enter text action into the loop
- Go to the General tab of the Enter text action and tick Use text in the loop, so that each value from the list is entered into the search box
- Click Apply to save the settings
Octoparse is now configured to use the text values added to the loop to search on Amazon's website. We can click through the workflow to make sure the defined actions are working as desired.
If everything is working properly, the first UPC code from the list will be entered into the search box automatically.
- Click on the search button
- When prompted, select Click button to add a click item to the loop
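Conceptually, the Text List loop combined with the Enter text and Click button actions runs one Amazon search per code. Outside Octoparse, the same idea can be sketched in Python as a list of search URLs, one per loop iteration. This is only an illustration and not how Octoparse works internally. It assumes Amazon's search page takes the query in a "k" parameter (https://www.amazon.com/s?k=...), which this tutorial does not state and Amazon may change, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Amazon's search results page reads the query from a "k" parameter.
SEARCH_BASE = "https://www.amazon.com/s"


def search_url(code: str) -> str:
    """Build the search URL that a single loop iteration would end up on."""
    return f"{SEARCH_BASE}?{urlencode({'k': code.strip()})}"


def build_search_urls(codes):
    """Mimic the Text List loop: one search per non-empty code, in list order."""
    return [search_url(code) for code in codes if code.strip()]


if __name__ == "__main__":
    # Same kind of text you would paste into the loop's text box, one code per line.
    pasted = "036000291452\n\n B00EXAMPLE \n"
    for url in build_search_urls(pasted.splitlines()):
        print(url)
    # https://www.amazon.com/s?k=036000291452
    # https://www.amazon.com/s?k=B00EXAMPLE
```

Blank lines are skipped and surrounding spaces are trimmed, so a sloppily pasted list still produces one clean search per code.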
As soon as we are on the product detail page, capture whatever product information you need, just as in any other extraction task. Here, I need the product title, rating, number of reviews, product category, and price.
- Click on the title of the product
- Select Extract Text when prompted
- Notice how the product title gets added to the data panel next to the workflow designer
- Capture all other data fields similarly
- Rename each field so that its name matches the data it holds
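Even after the fields are renamed, the captured values are typically raw text, such as a rating sentence or a price with a currency sign. If you post-process the exported data in Python, a minimal sketch like the one below turns the five fields from this tutorial into typed values and writes them to CSV. The sample strings ("4.5 out of 5 stars", "1,234 ratings", "$19.99") and the field names are assumptions for illustration, so check them against what your own task actually extracts. Abbreviated counts such as "1.2K" are not handled and would be misread.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields
from typing import Optional

# Assumed text formats: "4.5 out of 5 stars", "1,234 ratings", "$19.99".
DECIMAL = r"\d[\d,]*(?:\.\d+)?"  # first number, may contain commas and a decimal part
INTEGER = r"\d[\d,]*"            # first whole number, may contain thousands commas


@dataclass
class ProductRow:
    # One renamed field per piece of data captured in this tutorial.
    title: str
    rating: Optional[float]
    review_count: Optional[int]
    category: str
    price: Optional[float]


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first match of `pattern` in `text` with commas removed, or None."""
    found = re.search(pattern, text or "")
    return found.group(0).replace(",", "") if found else None


def to_row(raw: dict) -> ProductRow:
    """Convert one record of raw extracted text into typed values."""
    rating = first_match(DECIMAL, raw.get("rating", ""))
    reviews = first_match(INTEGER, raw.get("reviews", ""))
    price = first_match(DECIMAL, raw.get("price", ""))
    return ProductRow(
        title=raw.get("title", "").strip(),
        rating=float(rating) if rating else None,
        review_count=int(reviews) if reviews else None,
        category=raw.get("category", "").strip(),
        price=float(price) if price else None,
    )


def rows_to_csv(rows) -> str:
    """Serialize rows to CSV text, using the dataclass field names as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ProductRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


if __name__ == "__main__":
    raw = {  # hypothetical captured values
        "title": " Sample Widget ",
        "rating": "4.5 out of 5 stars",
        "reviews": "1,234 ratings",
        "category": "Home & Kitchen",
        "price": "$19.99",
    }
    print(rows_to_csv([to_row(raw)]))
```

A missing or unparseable value becomes None, which the CSV writer leaves as an empty cell instead of stopping the whole export.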
Note that the auto-generated XPaths for the search button and for the data fields in the Extract data step may not be accurate, in which case we need to rewrite them manually. Check out this article on how to write an XPath yourself.
For example, the correct XPath for the price is:
//span[@cel_widget_id="MAIN-SEARCH_RESULTS-0"]//span[@class="a-price"]/span[@aria-hidden="true"]
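To see what each step of that XPath selects, here is a small Python sketch using the standard library's ElementTree, which supports a subset of XPath. Paths there must start with ".//" instead of "//", and single quotes are used for attribute values here. The markup is a hypothetical, simplified stand-in shaped to match the expression. Real Amazon pages are not well-formed XML, so this is meant for understanding the XPath, not for scraping. The index parameter is an illustrative addition that shows what the "-0" suffix picks out in this sample.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical markup with two results, shaped like the structure the XPath expects.
SAMPLE = """
<div>
  <span cel_widget_id="MAIN-SEARCH_RESULTS-0">
    <span class="a-price"><span aria-hidden="true">$19.99</span></span>
  </span>
  <span cel_widget_id="MAIN-SEARCH_RESULTS-1">
    <span class="a-price"><span aria-hidden="true">$24.50</span></span>
  </span>
</div>
"""


def result_price(markup: str, index: int = 0) -> Optional[str]:
    """Apply the tutorial's price XPath (ElementTree dialect) to one result block."""
    path = (
        f".//span[@cel_widget_id='MAIN-SEARCH_RESULTS-{index}']"  # the result container
        "//span[@class='a-price']"                                # its price wrapper, any depth
        "/span[@aria-hidden='true']"                              # the visible price text
    )
    root = ET.fromstring(markup.strip())
    node = root.find(path)
    return node.text.strip() if node is not None and node.text else None


if __name__ == "__main__":
    print(result_price(SAMPLE))     # $19.99
    print(result_price(SAMPLE, 1))  # $24.50
```

If the site's markup changes (a renamed class or a different attribute), find() simply returns None. That is the same symptom as an XPath in Octoparse that needs rewriting.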
Now we can run the task either on the local machine, using its own bandwidth, IP address, memory, etc., or in the cloud on Octoparse's cloud servers (available to premium users only). If you are a heavy user, I strongly recommend cloud extraction: it's a relief to leave everything running in the cloud and come back for the complete data set without having to worry about network interruptions or computer glitches.
Happy Data Hunting!
Author: The Octoparse Team