
Scrape Amazon Product Reviews and Ratings for Sentiment Analysis

Friday, September 22, 2017

 

Amazon is one of the leading e-commerce companies and holds a huge amount of customer data. If we could analyze that data, we could make better decisions to improve our service and revenue. In this post, I will show you how to scrape the reviews and related information of Amazon products and perform a basic sentiment analysis on the reviews.

 

How to scrape Amazon product reviews and ratings

Nowadays, almost every kind of data on the web can be scraped. By selecting certain elements on a web page and then parsing the information, you can get the data. Amazon is no exception. In the past, most people obtained this kind of data by hiring a web scraping specialist or by writing the code themselves. Today, however, anyone can scrape it using web scraping tools.

 

This post walks through a simple example of extracting reviews and ratings with the web scraping tool Octoparse. Here I will extract the reviews of the movie Me Before You.

 

Let’s first build a simple task in Octoparse to scrape the reviews on Amazon.

 

Step 1. Create the task

Click “New task”, then fill in the task information.

 

 

Step 2. Open the web page

Enter the target URL into the search box. Octoparse then opens the web page in its built-in browser, just as any other browser would.

(https://www.amazon.com/Me-Before-You-Emilia-Clarke/product-reviews/B01GIIVF6K/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent )

 

 

Step 3. Loop click to navigate the next page

Scroll to the “Next page” button, click it, and then choose “Loop click the element” in the pop-up window.
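For readers who like to see the idea in code, here is a minimal sketch of what a "loop click" amounts to: keep following the next-page link until there is none. It is not what Octoparse runs internally. The page names, the PAGES dictionary and the fetch_page helper are hypothetical stand-ins for real HTTP requests.

```python
# Hypothetical in-memory "site": each page holds some reviews and a link to the next page.
PAGES = {
    "page-1": {"reviews": ["Great movie", "Cried a lot"], "next": "page-2"},
    "page-2": {"reviews": ["Not for me"], "next": "page-3"},
    "page-3": {"reviews": ["Beautiful story"], "next": None},  # last page: no "Next page" button
}


def fetch_page(url):
    """Stand-in for loading a page; a real scraper would send an HTTP request here."""
    return PAGES[url]


def scrape_all(start_url, max_pages=100):
    """Follow the "next" link until it is missing, collecting reviews along the way."""
    collected = []
    url = start_url
    visited = 0
    while url is not None and visited < max_pages:
        page = fetch_page(url)
        collected.extend(page["reviews"])
        url = page["next"]  # the equivalent of clicking "Next page"
        visited += 1
    return collected


all_reviews = scrape_all("page-1")
print(len(all_reviews), "reviews collected")
```

The max_pages cap is a safety net so that a page that always links to another page cannot keep the loop running forever.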

 

 

Step 4. Create a loop list for multiple sections

To extract the elements from each review, you need to create a loop list of the review sections.

Move your cursor over a section with a similar layout, where you want to extract data.

Click the first section ➜ Click “Create a list of items” (sections with similar layout) ➜ Click “Add current item to the list”.

The first section has now been added to the list ➜ Click “Continue to edit the list”.

Click the second section ➜ Click “Add current item to the list” again. Now we have all the sections with a similar layout ➜ Click “Finish Creating List” ➜ Click “loop”.

 

 

Step 5. Select the data to be extracted and rename data fields

Now we can start extracting the reviews and ratings of the movie.

Click the reviewer ➜ Select “Extract text”.

 

 

Follow the same steps to extract the other data fields (rating, review, time).

 

 

Rename the fields if necessary ➜ Click “Save”.
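If you are curious what Steps 4 and 5 look like when written by hand, the sketch below loops over review sections and pulls out the same four fields (reviewer, rating, review, time) using only the Python standard library. The HTML snippet and its class names are made up for illustration; they are not Amazon's real markup, and this is not how Octoparse works internally.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names are illustrative, not Amazon's real ones.
SAMPLE_HTML = """
<div class="review-section">
  <span class="reviewer">Alice</span>
  <span class="rating">5.0 out of 5 stars</span>
  <span class="time">September 1, 2017</span>
  <span class="review">Loved it.</span>
</div>
<div class="review-section">
  <span class="reviewer">Bob</span>
  <span class="rating">2.0 out of 5 stars</span>
  <span class="time">August 30, 2017</span>
  <span class="review">Too sentimental for me.</span>
</div>
"""

FIELDS = {"reviewer", "rating", "time", "review"}


class ReviewParser(HTMLParser):
    """Collects one dict per <div class="review-section"> block."""

    def __init__(self):
        super().__init__()
        self.reviews = []   # one dict per section, in page order
        self.field = None   # name of the field whose text we are inside, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "review-section":
            self.reviews.append({})      # a new item in the loop list
        elif tag == "span" and css_class in FIELDS and self.reviews:
            self.field = css_class       # the following text belongs to this field

    def handle_data(self, data):
        if self.field:
            current = self.reviews[-1]
            current[self.field] = current.get(self.field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None            # stop capturing text


parser = ReviewParser()
parser.feed(SAMPLE_HTML)
reviews = parser.reviews
for review in reviews:
    print(review)
```

Each dictionary in reviews corresponds to one row of the data that the task exports.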

 

 

You have now finished creating a task in Octoparse. Just run the task on your local machine to retrieve the data.

 

If you are interested, you could check out these posts/videos about scraping Amazon product reviews for more details.

 

 

Sentiment Analysis in Semantria

Now that we have the data, what can we do with it? We could read through all the reviews to see how others feel about the movie, but that would take quite a long time. That’s why we need sentiment analysis.

 

Sentiment analysis gives us the general feeling expressed in a piece of text. We could just look at the star ratings, but they are not always consistent with the sentiment of the review text. Sentiment is measured with three kinds of values: a negative value for a negative sentiment, a neutral value for a neutral one, and a positive value for a positive one.
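To make the three kinds of values concrete, here is a toy sketch that scores a text with a tiny word list and maps the score to negative, neutral or positive. The words, weights and the 0.1 threshold are all made up for illustration; this is not how Semantria computes sentiment.

```python
# Toy lexicon: the words and weights are invented for this example.
LEXICON = {
    "love": 0.6, "great": 0.5, "beautiful": 0.5,
    "boring": -0.5, "hate": -0.6, "bad": -0.4,
}


def score(text):
    """Sum the weights of the known words; unknown words count as 0."""
    cleaned = text.lower().replace(".", " ").replace(",", " ").replace("!", " ")
    return sum(LEXICON.get(word, 0.0) for word in cleaned.split())


def polarity(value, threshold=0.1):
    """Map a numeric score to one of the three sentiment labels."""
    if value > threshold:
        return "positive"
    if value < -threshold:
        return "negative"
    return "neutral"


for text in ["I love this great movie!", "Boring and bad.", "I watched it yesterday."]:
    print(text, "->", polarity(score(text)))
```

A word list this small misses negation, sarcasm and context, which is part of why a dedicated tool is used in the rest of this post.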

 

Here I use the sentiment tool Semantria, a plugin for Excel 2013. Semantria simplifies sentiment analysis and makes it accessible to non-programmers. I exported the extracted data to Excel (see the results below).

 

I will analyse only the first 100 reviews to show you how to do a simple sentiment analysis. Here are the results:

 

The column “Document Sentiment +/-” gives the overall sentiment of each review, telling me whether it’s positive, negative, neutral or mixed. The column “Document Sentiment” gives a numerical value that tells me how positive or negative each review is.

 

The information could be displayed in a more user-friendly way by creating a column chart.

 

 

Summing the Document Sentiment values for each category shows that the positive total is 26.89, far higher than the others: neutral 0.54, mixed 0.70 and negative -1.79. Given the movie's overall rating of 4.4 stars for Me Before You, the sentiment values and the star rating are highly consistent, with only small differences.
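The per-category totals can also be computed outside Excel. The sketch below sums a “Document Sentiment” value per “Document Sentiment +/-” label for the first 100 rows. The rows are invented sample data, not the values from this analysis, and sum_by_polarity is a hypothetical helper name.

```python
from collections import defaultdict

# Made-up rows standing in for the spreadsheet:
# (Document Sentiment +/-, Document Sentiment)
rows = [
    ("positive", 0.62),
    ("positive", 0.35),
    ("neutral", 0.02),
    ("mixed", 0.10),
    ("negative", -0.48),
]


def sum_by_polarity(rows, limit=100):
    """Add up the sentiment values per label, using at most `limit` rows."""
    totals = defaultdict(float)
    for label, value in rows[:limit]:
        totals[label] += value
    return dict(totals)


totals = sum_by_polarity(rows)
for label, total in sorted(totals.items()):
    print(f"{label:>8}: {total:+.2f}")
```

These totals are the kind of numbers a column chart of the results would plot, one bar per label.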

 

 

To confirm that, I then looked at the phrase sentiment values.

  

Let’s take a closer look.

Phrase Sentiment, by rating and Phrase Mentions Sentiment +/-

Rating    negative        neutral        positive       Sum
2.0       -0.563729823    0.392652005    0.600000024    0.428922
4.0       -14.94552305    6.095596494    15.26827288    6.418346
5.0       -31.15602022    38.07776087    131.7180169    138.6398
Sum       -46.6652731     44.56600937    147.5862898    145.487

You can see that stars and sentiment are largely consistent, even though the 5.0-star rating has the largest negative value. That is probably because the values are sums, and far more reviews carry the 5.0 rating than the 2.0 rating.
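As a quick sanity check, the sums in the Phrase Sentiment table can be recomputed from its cells. The sketch below copies the negative, neutral and positive values for each rating from the table; the variable names are my own.

```python
# Values copied from the Phrase Sentiment table: (negative, neutral, positive).
table = {
    "2.0": (-0.563729823, 0.392652005, 0.600000024),
    "4.0": (-14.94552305, 6.095596494, 15.26827288),
    "5.0": (-31.15602022, 38.07776087, 131.7180169),
}

# Row sums: total phrase sentiment for each star rating.
row_sums = {rating: sum(values) for rating, values in table.items()}

# Column sums: total negative, neutral and positive sentiment over all ratings.
column_sums = [sum(column) for column in zip(*table.values())]
grand_total = sum(column_sums)

# Share of the overall negative sentiment contributed by each rating.
negative_share = {rating: values[0] / column_sums[0] for rating, values in table.items()}

for rating, total in row_sums.items():
    print(f"{rating}: sum = {total:.4f}, share of negative = {negative_share[rating]:.1%}")
print(f"grand total = {grand_total:.3f}")
```

The 5.0-star rows account for roughly two thirds of the total negative phrase sentiment, but their positive sum is several times larger still, so the net value for that rating remains strongly positive.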

 

Looking at the distribution of the ratings, you can see that the star ratings cluster around 5.0 (positive sentiment), which further confirms the high consistency between stars and sentiment.

 

 

The method above is obviously a simple approach, and there are many other widely known approaches to sentiment analysis, such as machine learning. This method also isn’t limited to movie reviews: it can be applied to a range of other scenarios, and you can build much more in-depth analyses on top of it.

 

More related resources:

Web Scraping Tool & Twitter Data Set Processing

Web Data Crawling & "Bag-Of-Words" for Data Mining

 Website Crawler & Sentiment Analysis

Download Octoparse to start web scraping or contact us for any
question about web scraping!
