This tutorial shows how to scrape review data from booking.com. The same method can be used to scrape other travel sites and, more generally, to gather data from other websites.
As an example, we cover the key points of scraping customer reviews of hotels in Tbilisi from booking.com with Octoparse (the guide for Octoparse 7.X is below). This is a basic guide. If you would like help building such a scraping task, or need other types of customer reviews from booking.com, we offer extraction services tailored to your needs. Please contact us via email@example.com.
This is the Octoparse support team.
This tutorial is now out of date. A new step-by-step guide, Scrape Hotel Data from Booking.com, is available, along with the Video Guide to Scrape Booking.com. These updated guides make it easier to get data from booking.com, so do not hesitate to try them.
We have already built the scraping task, so you can download the .otd file directly and begin collecting hotel reviews from booking.com. (Download the extraction task from this article HERE if you need it.)
The .otd file can only be opened in Octoparse. You can download Octoparse before downloading the scraping task.
Please click HERE to open the website URL we used.
The data fields include hotel name, hotel address, star rating, customer name and comments posted by the customer.
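For readers who post-process the exported data, the fields listed above can be pictured as one record per review. The sketch below is only an illustration in Python: the class name, the field names and the sample values are assumptions made for this example, not names produced by Octoparse or booking.com.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class HotelReview:
    """One extracted review; field names are hypothetical labels for the five data fields."""
    hotel_name: str
    hotel_address: str
    star_rating: str
    customer_name: str
    comment: str


# Column order for a CSV or spreadsheet export, derived from the dataclass itself.
COLUMNS = [f.name for f in fields(HotelReview)]

# A made-up record, used only to show the shape of one row.
example = HotelReview(
    hotel_name="Example Hotel",
    hotel_address="1 Example Street, Tbilisi",
    star_rating="4",
    customer_name="Anonymous",
    comment="Clean rooms and friendly staff.",
)

row = asdict(example)  # plain dict, ready for csv.DictWriter or similar
```

Keeping the star rating as text avoids guessing at how the site formats it; convert it to a number only once the exported values have been checked.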
We will go to the detail page of each hotel and get the reviews under the “Read all trusted reviews” tab.
Because the actual number of reviews is sometimes greater than what the detail page shows, we need to get the reviews from all the countries displayed. To do that, we click the plus button to display all the countries the reviewers come from.
In Octoparse, we create a list of items (a loop) covering all the countries. The XPath that Octoparse generates for the loop also matches extra elements on the page, so we need to modify the XPath expression so that it selects only the country elements.
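To illustrate why the loop XPath has to be narrowed, here is a minimal Python sketch. The markup is invented for the example and does not reproduce the real booking.com pop-up, and the class names (countries, languages) are assumptions. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, but the idea carries over to the expression you edit in Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical pop-up markup: a country list plus an unrelated list on the same page.
POPUP = """
<div>
  <ul class="countries">
    <li><input type="checkbox" checked="checked"/><span>Georgia</span></li>
    <li><input type="checkbox"/><span>Russia</span></li>
    <li><input type="checkbox"/><span>Turkey</span></li>
  </ul>
  <ul class="languages">
    <li><span>English</span></li>
  </ul>
</div>
"""

root = ET.fromstring(POPUP)

# A loop XPath that is too general also picks up the item from the other list.
too_broad = root.findall(".//li")

# Anchoring the expression on the parent list keeps only the country items.
narrowed = root.findall(".//ul[@class='countries']/li")

country_names = [li.findtext("span") for li in narrowed]
print(len(too_broad), len(narrowed), country_names)
```

The same approach applies in Octoparse: find an attribute or parent element that is unique to the country list and include it in the loop's XPath.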
Octoparse builds a list of items from the elements you click. Because booking.com selects the first country in the pop-up window by default, clicking it while creating the loop unselects it.
In this case, select the first country again by clicking its checkbox; Octoparse will generate a “Click Item” action in the rule. The customer reviews for the hotel will then be extracted country by country.
Because some reviews come from anonymous customer accounts, the extraction output will contain duplicate records. When exporting, you can choose to export only valid data.
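If the data has already been exported with duplicates, they can also be removed afterwards. The following Python sketch is one possible way to do that, not an Octoparse feature: the function name, the dictionary keys and the sample rows are assumptions made for illustration. It treats two rows as duplicates when hotel name, customer name and comment match after trimming whitespace and ignoring case.

```python
def drop_duplicate_reviews(records):
    """Return the records with repeated reviews removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for rec in records:
        # Normalise the fields that identify a review before comparing.
        key = tuple(
            rec.get(name, "").strip().casefold()
            for name in ("hotel_name", "customer_name", "comment")
        )
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up rows: the second one repeats the first with different spacing and case.
rows = [
    {"hotel_name": "Example Hotel", "customer_name": "Anonymous", "comment": "Great stay."},
    {"hotel_name": "Example Hotel ", "customer_name": "anonymous", "comment": "Great stay."},
    {"hotel_name": "Example Hotel", "customer_name": "Anonymous", "comment": "Too noisy."},
]

cleaned = drop_duplicate_reviews(rows)
print(len(rows), "->", len(cleaned))
```

Including the comment in the key matters here: several different anonymous reviewers can share the same display name, so matching on the customer name alone would discard genuine reviews.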