
Sentiment Analysis For Hotel Reviews

Wednesday, September 11, 2019

Whether you like it or not, guest reviews are becoming a prominent factor affecting people's bookings/purchases.


Think about your own experience. When you are seeking out a place to stay for a vacation on Expedia/Booking/TripAdvisor, what do you do? I am willing to bet that you'd be scrolling down the screen to check on the reviews before you know it. 


In other words, guest reviews clearly influence people's booking decisions, which means you'd better pay attention to what people are saying about your hotel!


Not only do you want to read the reviews, you also want to analyze them in a way that helps you learn the most about your customers. The reviews can tell you whether you are keeping up with your customers' expectations, which is crucial for developing marketing strategies based on customer personas.
Reviews are important and you, as a hotel owner, need to start leveraging them.


But how?


What is sentiment analysis?

Sentiment analysis, also called opinion mining, is a text mining technique that detects the emotional tone of a given text (positive, negative, or neutral) and returns a sentiment score. It is commonly applied to reviews and social media posts.


In this article, I'll show you how to effectively collect hotel reviews using a web scraping tool and conduct sentiment analysis using Python.


Scrape reviews using Octoparse

The web scraping tool I used is called Octoparse. It is a do-it-yourself web scraper built for people without coding backgrounds, like myself. I'll show you how to use Octoparse to scrape the reviews of the #1 ranked hotel in New York City – Hotel Giraffe by Library Hotel Collection on TripAdvisor.  


Here is the link to the web page:



First, we will import our targeted web URL in Octoparse.


Notice there are only 5 reviews on each page, so if we need to go through all reviews, we will need to have Octoparse paginate through all the review pages. 


As we look carefully at the reviews, we can see there is a "Read more" button on some of the reviews. In this case, our crawler will need to click the button to load the whole review before extracting it.



Next, we loop through all the review items and extract each review.


Last but not least, drag the newly created "loop item" out and position it under the first "loop item". This is because we want to click all the "Read more" buttons first before proceeding to extract the actual reviews.



Once we have successfully extracted all reviews for this hotel, we are ready to get the sentiment score for each review using Python. 



For a more detailed step-by-step tutorial on scraping guest reviews, check this post.

You could also try gathering hospitality data from Booking.com.


Sentiment analysis with Python

First, we import the libraries. We will use two for this analysis.


The first one is called pandas, which is an open-source library providing easy-to-use data structures and analysis functions for Python.


The second one is a powerful Python library called NLTK. NLTK stands for Natural Language Toolkit, a commonly used NLP library that comes with many corpora, models, and algorithms.


import library 



Let's go ahead and load the scraped reviews.


import review
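The loading step can be sketched roughly as follows. This is a minimal sketch, assuming the scraped reviews were exported to a CSV file with a single text column; the column name `review` and the file name in the comment are assumptions, not details from the original run. An in-memory stand-in replaces the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the exported file so the sketch runs on its own.
# With a real export you would pass its path instead, for example
# pd.read_csv("hotel_reviews.csv")  (hypothetical file name).
sample_export = io.StringIO(
    "review\n"
    '"Wonderful stay, the staff were friendly and helpful."\n'
    '"The room was noisy and the bathroom was dirty."\n'
)

# "review" is an assumed column name; use whatever your export calls it.
reviews = pd.read_csv(sample_export)

# Drop rows without review text so later steps never see missing values.
reviews = reviews.dropna(subset=["review"]).reset_index(drop=True)

print(reviews.shape)
```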


Here we use a class called SentimentIntensityAnalyzer from nltk.sentiment.vader. It implements VADER, a rule-based sentiment model that ships with NLTK, so sentiment scores can be generated without complex coding. Before we can use it, we need to create an instance of it.


call function
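Creating the analyzer might look like the sketch below. The helper name `build_analyzer` is hypothetical and only there for illustration. It assumes NLTK is installed and that the machine can download the VADER lexicon the first time it runs.

```python
def build_analyzer():
    """Return a ready-to-use VADER analyzer from NLTK."""
    # Imported inside the function so that merely defining this helper
    # does not require NLTK to be installed.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # VADER needs its lexicon; this fetches it once (requires network access).
    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()
```

With the analyzer in hand, `build_analyzer().polarity_scores("some review text")` returns a dictionary with the keys `neg`, `neu`, `pos`, and `compound`.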


Now that the analyzer is ready, we apply it to each review to generate the polarity scores. There are four types of scores: negative, neutral, positive, and compound. Using apply() and a lambda, we can transform the results and put them into the "reviews" data frame.


apply and lambda
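One way to write the apply() and lambda step is sketched below. The helper `add_polarity_columns`, the column name `review`, and the stub scorer `fake_scores` are all assumptions made for illustration. The stub only keeps the snippet runnable without NLTK; in the real pipeline you would pass `SentimentIntensityAnalyzer().polarity_scores` instead.

```python
import pandas as pd


def add_polarity_columns(reviews, score_fn, text_column="review"):
    """Return a copy of `reviews` with neg/neu/pos/compound columns added."""
    result = reviews.copy()
    # apply() runs the lambda once per review and yields one dict per row.
    scores = result[text_column].apply(lambda text: score_fn(text))
    # Pull each score out of the dicts into its own column.
    for key in ("neg", "neu", "pos", "compound"):
        result[key] = scores.apply(lambda d, k=key: d[k])
    return result


# Stub scorer with made-up values, used only so this sketch runs without NLTK.
def fake_scores(text):
    positive = "great" in text.lower()
    return {
        "neg": 0.0 if positive else 0.5,
        "neu": 0.5,
        "pos": 0.5 if positive else 0.0,
        "compound": 0.6 if positive else -0.6,
    }


demo = pd.DataFrame({"review": ["Great location", "Terrible service"]})
scored = add_polarity_columns(demo, fake_scores)
print(scored)
```

Working on a copy leaves the original data frame untouched, which makes it easier to rerun the step while experimenting.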


Then we have the sentiment score for each review.


sample result 


Each review has a negative score, a neutral score, a positive score, and a compound score. The compound score summarizes the overall sentiment of the review in a single normalized value that ranges from -1 (most negative) to 1 (most positive). Normally we set a threshold on the compound score to identify the sentiment. Here we could set the threshold at ±0.2. If the compound score of a review is greater than 0.2, the review is positive. If the compound score is less than -0.2, the review is negative. If the compound score is between -0.2 and 0.2, the review is neutral.
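The threshold rule can be expressed as a small function. This is a sketch with a hypothetical name, `label_sentiment`; it treats compound scores above 0.2 as positive, below -0.2 as negative, and everything in between (including the boundaries) as neutral.

```python
def label_sentiment(compound, threshold=0.2):
    """Map a compound score in [-1, 1] to 'positive', 'negative' or 'neutral'."""
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    # Scores from -threshold to +threshold, boundaries included, are neutral.
    return "neutral"


for score in (0.85, -0.45, 0.1):
    print(score, label_sentiment(score))
```

If the scores live in a data frame column named `compound`, the labels could be added with `scored["compound"].apply(label_sentiment)`.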


percentage of reviews
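The share of each sentiment label can be computed with pandas, for instance as in the sketch below. The labels here are made up purely for illustration and are not the data behind the figures reported in this post.

```python
import pandas as pd

# Made-up labels purely for illustration; they are not the article's data.
labels = pd.Series(["positive", "positive", "positive", "negative", "neutral"])

# normalize=True returns fractions instead of raw counts;
# multiplying by 100 turns them into percentages.
shares = labels.value_counts(normalize=True).mul(100).round(1)
print(shares)
```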


As we can see, 97.2% of the reviews are positive and only 1.22% of the reviews are negative. Based on the result, it is safe to say that Hotel Giraffe by Library Hotel Collection is a well-liked hotel.

Of course, there's so much more we could do to further analyze the reviews:

  • build a word cloud or a topic model to identify the key reasons people love this hotel.
  • compare the sentiment scores with those of other hotels by extracting their reviews and analyzing them with the steps above.
  • extract more information, such as review date, reviewer contributions, reviewer helpful votes, review helpful votes, and the number of shares, then visualize it and apply business analysis approaches.


You now know how important reviews are to the success of your business. Why not head over to Octoparse and try it out yourself? Octoparse is an easy-to-use web scraper that can help you turn websites into structured data within clicks. Better yet, there are ready-to-use templates and a lifetime free version. Feel free to contact us if you need any help with your web scraping project!




Author: Jiahao Wu


