Whether you like it or not, guest reviews are becoming a prominent factor affecting people’s bookings and purchases. Think about your own experience. When you are seeking out a place to stay for a vacation on Expedia, Booking, or TripAdvisor, what do you do? I am willing to bet that you’d be scrolling down the screen to check on the reviews before you know it.
In other words, guest reviews clearly influence people’s booking decisions, which means you’d better pay attention to what people are saying about your hotel. You want not only to read the reviews, but also to analyze them in a way that helps you learn the most about your customers. The reviews can tell you whether you are keeping up with your customers’ expectations, which is crucial for developing marketing strategies based on your customer personas.
Reviews are important, and as a hotel owner you need to start leveraging them. In the following parts, we will talk about how to scrape hotel review data into Excel and how to run sentiment analysis on it.
What Is Sentiment Analysis
Sentiment analysis, also called opinion mining, is a text mining technique that identifies the emotional tone of a given text (positive, negative, or neutral) and returns a sentiment score. It is most often applied to reviews and social media text.
Scrape Hotel Reviews Using Octoparse
Octoparse is a powerful web scraping tool built for people without a coding background. It has two modes for extracting data: “Custom Task” and “Task Template”. With a Custom Task, users build their own crawler to get the data they want, and an auto-detect mode helps newbies get started quickly. Task Templates are built-in crawlers that are ready to use without any task configuration: all you have to do is choose a template for your target data, fill in the required parameters, and let the scraper collect the data for you. This is a good fit for beginners who have no idea how to create a crawler. You can also find more advanced functions such as cloud service, scheduled tasks, an API, and IP proxies.
Scrape Hotel Reviews with the Octoparse TripAdvisor Template
Step 1: Collect all the hotel URLs that we want to scrape the data from and put them all in a document or anywhere that is easy to copy and paste later.
Step 2: Open Octoparse and enter the keyword TripAdvisor to see all the TripAdvisor scrapers we have, then choose the “TripAdvisor Review” scraper. (You will see a short guideline explaining what this specific template does, how to use it, which parameters you need to enter, and what data you can get.)
Step 3: Click on the “Try it” option and paste all the hotel URLs that we prepared previously. Once we are done entering the parameters, click the “Save & Run” button to launch the scraper.
Step 4: Export the extracted data in any of the formats Octoparse provides, such as Excel, CSV, JSON, and HTML. Alternatively, we can send the data to our database or data visualization tools via the Octoparse APIs.
Once we have successfully extracted all reviews for the hotel, we are ready to get the sentiment score for each review using Python. If you want to build your own crawler to customize the hotel review data fields, see the tutorial “Scrape customer reviews from Tripadvisor” for more details.
Hotel Reviews Sentiment Analysis with Python
First, we import the libraries. We will use two of them for this analysis.
The first one is pandas, an open-source library that provides easy-to-use data structures and analysis functions for Python.
The second one is NLTK, the Natural Language Toolkit, a widely used Python NLP library that ships with many corpora, models, and algorithms.
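The imports might look like the sketch below. It assumes both packages are already installed (for example with pip) and that the VADER lexicon used later in this article still has to be downloaded once.

```python
import pandas as pd
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# One-time download of the lexicon that the VADER analyzer relies on.
# Assumption: the machine has network access the first time this runs.
nltk.download("vader_lexicon", quiet=True)
```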
Let’s go ahead and load the scraped reviews into a pandas data frame.
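As a minimal sketch of this step, the helper below reads an exported file into a data frame. The function name `load_reviews`, the file name `hotel_reviews.csv`, and the column name `review` are all assumptions for illustration; use the names that appear in your own export. A tiny temporary CSV stands in for the real export so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_reviews(path):
    """Load an exported review file (CSV or Excel) into a DataFrame."""
    path = Path(path)
    if path.suffix.lower() in {".xlsx", ".xls"}:
        # Reading .xlsx files requires the optional openpyxl package.
        return pd.read_excel(path)
    return pd.read_csv(path)


# Stand-in for a real export: two made-up reviews in a temporary CSV file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "hotel_reviews.csv"
    sample.write_text(
        "review\n"
        "Great location and friendly staff.\n"
        "The room was dirty and noisy.\n",
        encoding="utf-8",
    )
    reviews = load_reviews(sample)

print(reviews.head())
```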
Here we use a class called SentimentIntensityAnalyzer from nltk.sentiment.vader. It handles the sentiment analysis with NLTK’s built-in VADER lexicon and rules, so the sentiment scores can be generated without complex coding. Before we use it, we need to create an instance of it.
With the analyzer instantiated, we apply it to each review to generate the polarity scores. There are four types of scores: negative, neutral, positive, and compound. By using apply() and a lambda, we can compute the scores for every review and store them in the “reviews” data frame.
Then we have the sentiment score for each review.
Each review has a negative score, a neutral score, a positive score, and a compound score. The compound score is a single overall measure of the review’s sentiment and ranges from -1 (most negative) to 1 (most positive). Normally we set a threshold on the compound score to identify the sentiment. Here we set the threshold at ±0.2. If the compound score of a review is greater than 0.2, the review is positive. If the compound score is less than -0.2, the review is negative. If the compound score is between -0.2 and 0.2, the review is neutral.
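The threshold rule can be written as a small function. The name `label_sentiment` is hypothetical, and the 0.2 default simply mirrors the threshold chosen in this article.

```python
def label_sentiment(compound, threshold=0.2):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound > threshold:
        return "positive"
    if compound < -threshold:
        return "negative"
    # Scores from -threshold to +threshold (inclusive) count as neutral.
    return "neutral"


for score in (0.85, -0.6, 0.05):
    print(score, label_sentiment(score))
```

Assuming the data frame has a `compound` column, the labels could then be added with `reviews["sentiment"] = reviews["compound"].apply(label_sentiment)`.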
As we can see, 97.2% of the reviews are positive and only 1.22% of the reviews are negative. Based on the result, it is safe to say that Hotel Giraffe by Library Hotel Collection is a well-liked hotel.
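Percentages like these can be computed from the sentiment labels with pandas. The sketch below uses four made-up labels rather than the hotel’s actual data, so its output will not match the figures above.

```python
import pandas as pd

# Made-up labels standing in for the per-review sentiment column.
labels = pd.Series(["positive", "positive", "positive", "negative"], name="sentiment")

# normalize=True returns fractions; multiply by 100 to get percentages.
share = labels.value_counts(normalize=True).mul(100).round(2)
print(share)
```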
Of course, there’s so much more we could do to further analyze the reviews:
- Build a word cloud or a topic model to identify the key reasons people love this hotel.
- Compare the sentiment scores with other hotels by extracting their reviews and analyzing them with the steps above.
- Extract more information, such as review date, reviewer contributions, reviewer helpful votes, review helpful votes, and the number of shares, then visualize it and apply business analysis approaches.
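As a starting point for the first idea, a plain word-frequency count already hints at what guests mention most. This is a simplified stand-in for a word cloud or topic model, not a replacement for either; the stop-word list, the function name `top_words`, and the sample reviews are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "was", "were", "with", "for", "this", "that"}


def top_words(texts, n=5):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        # Skip stop-words and very short tokens such as "a" or "is".
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


sample = [
    "Great location and friendly staff",
    "Friendly staff and a clean room",
    "The location is great",
]
print(top_words(sample, 4))
```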
You now know how important reviews are to the success of your business. Why not head over to Octoparse and try it out yourself? It is an easy-to-use web scraper that can help you turn websites into structured data in a few clicks. Better yet, there are ready-to-use templates and a lifetime free version. Feel free to contact us if you need any help with your web scraping project.