Sentiment Analysis For Hotel Reviews
Wednesday, September 11, 2019
Whether you like it or not, guest reviews are becoming a prominent factor affecting people's bookings/purchases.
What is sentiment analysis
Sentiment analysis, also called opinion mining, is a text mining technique that extracts the emotion expressed in a given text (positive, negative or neutral) and returns a sentiment score. It is usually applied to reviews or social media posts.
In this article, I'll show you how to effectively collect hotel reviews using a web scraping tool and conduct sentiment analysis using Python.
Scrape reviews using Octoparse
The web scraping tool I used is called Octoparse. It is a do-it-yourself web scraper built for people without coding backgrounds, like myself. I'll show you how to use Octoparse to scrape the reviews of the #1 ranked hotel in New York City – Hotel Giraffe by Library Hotel Collection on TripAdvisor.
Here is the link to the web page:
First, we will import our targeted web URL in Octoparse.
Notice there are only 5 reviews on each page, so if we need to go through all reviews, we will need to have Octoparse paginate through all the review pages.
As we look carefully at the reviews, we can see there is a "Read more" button on some of the reviews. In this case, our crawler will need to click the button to load the whole review before extracting it.
Next, we loop through all the review items and extract each review.
Last but not least, drag the newly created "loop item" out and position it under the first "loop item". This is because we want to click every "Read more" button first, before extracting the actual reviews.
Once we have successfully extracted all reviews for this hotel, we are ready to get the sentiment score for each review using Python.
For a more detailed step-by-step tutorial on scraping guest reviews, check this post.
You could also try gathering hospitality information from booking.com.
Sentiment analysis with Python
First, we import the libraries. We will use two libraries for this analysis.
The first one is called pandas, which is an open-source library providing easy-to-use data structures and analysis functions for Python.
The second one is a powerful Python library called NLTK. NLTK stands for Natural Language Toolkit, a commonly used NLP library that ships with many corpora, models, and algorithms.
Let's go ahead and import the scraped reviews.
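As a rough sketch of this step, the reviews can be loaded into a pandas data frame with read_csv. The column name "review" and the sample rows below are assumptions made for illustration, not the actual export; in practice you would pass the path of your own exported file instead of the in-memory text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the exported file. With a real export you would
# call something like pd.read_csv("hotel_reviews.csv") (file name assumed).
sample_csv = io.StringIO(
    "review\n"
    '"The staff were wonderful and the room was spotless."\n'
    '"Average stay, nothing special."\n'
    '"Terrible service and a noisy room."\n'
)

# Load the reviews into a data frame with one row per review.
reviews = pd.read_csv(sample_csv)

print(reviews.shape)
print(reviews.head())
```

Checking the shape and the first few rows right after loading is a quick way to confirm that the scraper exported one review per row.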
Here we use a class called SentimentIntensityAnalyzer from nltk.sentiment.vader. SentimentIntensityAnalyzer handles sentiment analysis with NLTK's built-in algorithms and features, so the sentiment scores can be generated without complex coding. Before we can use it, we need to create an instance of it.
Now that we have created the analyzer, we apply it to generate the polarity scores. There are four types of scores: negative, neutral, positive and compound. Using apply() and a lambda, we can transform the results and put them into the "reviews" data frame.
Then we have the sentiment score for each review.
Each review has a negative score, a neutral score, a positive score, and a compound score. The compound score is a comprehensive assessment of the first three scores and ranges from -1 to 1. Normally we set a threshold on the compound score to identify the sentiment. Here we set the threshold at ±0.2. If the compound score of a review is greater than 0.2, the review is positive. If it is less than -0.2, the review is negative. If it is between -0.2 and 0.2, the review is neutral.
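The thresholding rule can be sketched as a small helper. The function name label_sentiment is hypothetical, and the sketch assumes that scores below -0.2 count as negative and that scores exactly on a boundary count as neutral.

```python
def label_sentiment(compound: float, threshold: float = 0.2) -> str:
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    # Everything from -threshold to +threshold (inclusive) is neutral.
    return "neutral"


# Try the rule on a few made-up compound scores.
for value in (0.85, -0.6, 0.05):
    print(value, label_sentiment(value))
```

With a data frame that already holds a compound column, the same helper could be passed to apply() to produce a label column.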
As we can see, 97.2% of the reviews are positive and only 1.22% of the reviews are negative. Based on the result, it is safe to say that Hotel Giraffe by Library Hotel Collection is a well-liked hotel.
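Percentages like these can be computed from the sentiment labels with value_counts. The labels below are made up purely to show the calculation and are not the hotel's actual results.

```python
import pandas as pd

# Hypothetical sentiment labels, one per review.
labels = pd.Series(["positive", "positive", "positive", "negative", "neutral"])

# normalize=True returns fractions; multiply by 100 to get percentages.
shares = labels.value_counts(normalize=True) * 100

print(shares.round(1))
```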
Of course, there's so much more we could do to further analyze the reviews:
- build a word cloud or a topic model to identify the key reasons people love this hotel.
- compare the sentiment scores with those of other hotels by extracting their reviews and analyzing them with the steps above.
- extract more information, such as review date, reviewer contributions, reviewer helpful votes, review helpful votes and the number of shares, then visualize it and apply business analysis approaches.
You now know how important reviews are to the success of your business. Why not head over to Octoparse and try it out yourself? Octoparse is an easy-to-use web scraper that can help you turn websites into structured data within a few clicks. Better yet, there are ready-to-use templates and lifetime free versions. Feel free to contact us if you need any help with your web scraping project!
Author: Jiahao Wu