Scrape Reviews on Booking.com
Thursday, January 21, 2021
In this post we look at how to scrape review data from booking.com. The same method can be applied to other travel sites, and to gathering data from many other websites as well.
As an example, we cover the key points of scraping customer reviews of hotels in Tbilisi from booking.com with Octoparse (the guide below uses Octoparse 7.X). This is a basic guide. If you want to learn how to build such a scraping task, or need other types of customer reviews from booking.com, we also offer data extraction services tailored to your needs. Please contact us via firstname.lastname@example.org.
We’ve already built the scraping task, so you can download the .otd file and start collecting hotel reviews from booking.com right away. (Download the extraction task for this article HERE if you need it.)
The .otd file can only be used in Octoparse, so download Octoparse before downloading the scraping task.
Please click HERE to open the website URL we used.
The data fields include hotel name, hotel address, star rating, customer name and comments posted by the customer.
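To make the output shape concrete, here is a minimal Python sketch of one review record holding the five fields listed above, plus a helper that writes records to CSV. The class name, field names, and sample values are hypothetical illustrations, not the column headers Octoparse actually exports.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class HotelReview:
    """One scraped review (hypothetical field names, not Octoparse's export headers)."""
    hotel_name: str
    hotel_address: str
    star_rating: str  # kept as text here; the page may not show a plain number
    customer_name: str
    comment: str


def reviews_to_csv(reviews):
    """Serialize HotelReview records to CSV text, one column per field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(HotelReview)])
    writer.writeheader()
    for review in reviews:
        writer.writerow(asdict(review))
    return buf.getvalue()


# Placeholder values for illustration only, not real scraped data.
sample = HotelReview(
    hotel_name="Example Hotel",
    hotel_address="1 Example Street, Tbilisi",
    star_rating="4",
    customer_name="Anonymous",
    comment="Great location.",
)
print(reviews_to_csv([sample]))
```

Because the address contains a comma, the `csv` module quotes that field automatically, which keeps the columns aligned when the file is opened in a spreadsheet.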
We will go to the detail page of each hotel and get the reviews under the “Read all trusted reviews” tab.
Because the detail page sometimes shows fewer reviews than the hotel actually has, we need to collect the reviews for every country listed. To do so, we clicked the plus button to display all the countries where the reviewers were located.
In Octoparse, we will create a list of items to loop through all the countries. The XPath generated for this loop also matches extra elements on the web page, so we need to modify the XPath expression so that it selects only the country elements.
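The idea of narrowing an XPath can be illustrated with Python's built-in `xml.etree.ElementTree`, which supports only a limited subset of XPath. The markup and the `country-filter` class name below are hypothetical stand-ins, not booking.com's real HTML, which differs and changes over time. A path that matches every `<li>` also picks up unrelated filter items, while anchoring it to the country list returns only countries. In full XPath 1.0 (the kind you edit in Octoparse) the narrowed expression would be written `//ul[@class='country-filter']/li`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for the country filter pop-up.
PAGE = """
<div>
  <ul class="review-filters">
    <li>All reviewers</li>
    <li>Families</li>
  </ul>
  <ul class="country-filter">
    <li>Georgia</li>
    <li>Germany</li>
    <li>Israel</li>
  </ul>
</div>
"""

root = ET.fromstring(PAGE.strip())

# Too broad: matches every <li> on the page, including non-country filters.
broad = [li.text for li in root.findall(".//li")]

# Narrowed: only <li> children of the list whose class is "country-filter".
narrow = [li.text for li in root.findall(".//ul[@class='country-filter']/li")]

print(broad)   # ['All reviewers', 'Families', 'Georgia', 'Germany', 'Israel']
print(narrow)  # ['Georgia', 'Germany', 'Israel']
```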
When you create a list of items in Octoparse, the loop clicks each element in the list. Because booking.com selects the first country in the pop-up window by default, clicking it in the loop toggles it off, so the first country ends up unselected.
To fix this, click the checkbox of the first country to select it again; Octoparse will generate a “Click Item” action in the rule.
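Why the extra click is needed can be shown with a tiny simulation, assuming that the loop clicks every country checkbox exactly once and that each click toggles the checkbox. This is a simplified, hypothetical model of the workflow, and the function and country names are made up for illustration. Because toggles commute, it does not matter whether the extra “Click Item” runs before or after the loop.

```python
def countries_selected_after_loop(countries, preselected, extra_first_click):
    """Return which country checkboxes end up selected.

    Hypothetical model: each click toggles a checkbox, and the loop
    clicks every country exactly once.
    """
    selected = {c: (c in preselected) for c in countries}
    for country in countries:  # the loop clicks each country in turn
        selected[country] = not selected[country]
    if extra_first_click:  # the added "Click Item" on the first checkbox
        first = countries[0]
        selected[first] = not selected[first]
    return [c for c in countries if selected[c]]


countries = ["Georgia", "Germany", "Israel"]
print(countries_selected_after_loop(countries, {"Georgia"}, extra_first_click=False))
# ['Germany', 'Israel']  (the pre-selected first country was toggled off)
print(countries_selected_after_loop(countries, {"Georgia"}, extra_first_click=True))
# ['Georgia', 'Germany', 'Israel']
```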
All the customer reviews for the hotel will then be extracted country by country.
Because some reviews come from anonymous customer accounts, the extracted output will contain duplicate records. When exporting, you can choose to export only the valid data.
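If you prefer to export everything and remove duplicates yourself, a minimal Python sketch like the following drops exact duplicate rows from an exported CSV while preserving their order. The column names and sample rows are hypothetical. It keys on the whole row rather than on the customer name alone, so two different reviews that happen to share a customer name are both kept.

```python
import csv
import io


def dedupe_rows(rows):
    """Drop exact duplicate rows, keeping the first occurrence and original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # all fields together identify a record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical exported data; in practice, open the exported CSV file instead.
SAMPLE = """hotel_name,customer_name,comment
Hotel A,Anonymous,Clean rooms
Hotel A,Anonymous,Clean rooms
Hotel A,Anonymous,Noisy street
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
unique = dedupe_rows(rows)
print(len(rows), len(unique))  # 3 2
```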