Tripadvisor Scraper: How to use web scraping to get travel data
Monday, September 13, 2021
Travel rules keep changing with the Covid case curve. Cases are rising with the Delta variant, and as I write this article, the EU is considering reimposing travel restrictions on U.S. visitors.
Either way, it pays to be prepared for a refreshing trip. I built a Tripadvisor scraper with Octoparse and crawled information about destinations that are open to U.S. citizens. Web scraping is a convenient way to pull web data down so that we can sift through it and get the most value out of it. Below I show how it helped me get the travel data.
Note: If you are setting out for these countries, check whether vaccination or quarantine is required.
Web Scraping Travel Data
Are you familiar with big data in tourism?
Businesses in the travel industry track all kinds of data, for example, travel agents' business data and visitors' behavioral data across travel-related platforms. They may know your traveling habits better than you do. The whole industry is leveraging big data to launch the right products and find the right people to pay for their services.
Web scraping is one of the technologies that make this possible.
As a traveler, though, I want scraped travel data to serve my own needs: find the most attractive destinations and get guides from Tripadvisor for reference.
Where Can an American Go
So, where can an American travel now?
Geo Map generated by mapchart.net
This article by CNN lists the destinations that are open to U.S. travelers (the list may be updated now and then).
What I wanted to do was pull all the country names on this web page into a spreadsheet so that I could paste them into Octoparse and get more specific data from Tripadvisor.
Octoparse: How to get list information on a web page into Excel
Octoparse can easily export list information from a web page into Excel or CSV.
This is extremely helpful when you want a list of URLs or other data to paste and search on another platform, or to import into data analytics software for analysis.
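For readers who prefer code, the same idea (list items on a page turned into rows of a CSV) can be sketched in plain Python. This is only an illustration, not how Octoparse works internally: the `ListItemParser` class, the `list_items_to_csv` helper, and the sample HTML are all made up for this example, and a real page would need its own selectors.

```python
import csv
import io
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collect the text of every <li> element (nested lists are not handled)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_item:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            text = "".join(self._buffer).strip()
            if text:
                self.items.append(text)
            self._in_item = False


def list_items_to_csv(html, header="country"):
    """Return CSV text with one row per list item found in the HTML."""
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header])
    writer.writerows([item] for item in parser.items)
    return out.getvalue()


# Made-up HTML standing in for a page that lists destinations.
SAMPLE_HTML = """
<ul>
  <li>Nepal</li>
  <li><a href="#spain">Spain</a></li>
  <li>  Example Country  </li>
</ul>
"""

csv_text = list_items_to_csv(SAMPLE_HTML)
print(csv_text)
```

The resulting text can be saved to a `.csv` file and opened in Excel, or pasted into another tool as a plain list.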
Now that I have the text list of destinations, I am going to build a Tripadvisor scraper to get specific data about these places.
Build a Tripadvisor Scraper
The data I am going to crawl from Tripadvisor:
- I want to check the travel popularity of these countries, using the number of Tripadvisor reviews for each country as a proxy. (My hypothesis: more visits, more reviews.)
- I have a travel theme. I am a nature lover interested in outdoor events and nature sightseeing. I will get the tag information for these destinations so that I can filter them and narrow down to the perfect place where I can chase the wind, play on the beach, or appreciate the grandeur of a peak.
- I will save the URLs of the travel guides on Tripadvisor for further travel planning. (Thanks, contributors!)
Batch Generate URLs with Country Names
Where can I get this data? Here is a sample page: Tripadvisor Nepal.
With the list of country names scraped in the previous step, I can batch generate the URLs of all the Tripadvisor country pages with Octoparse.
Octoparse: Batch generating URLs with a parameter
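Batch generation boils down to substituting each country name into one URL pattern. A minimal Python sketch of that idea follows. The template below is a placeholder on example.com and is not Tripadvisor's real URL scheme, and `generate_urls` is a hypothetical helper written for this illustration.

```python
from urllib.parse import quote

# Placeholder pattern, NOT Tripadvisor's real URL scheme.
URL_TEMPLATE = "https://example.com/search?q={country}"


def generate_urls(countries, template=URL_TEMPLATE):
    """Fill the {country} parameter of the template once per country name."""
    urls = []
    for name in countries:
        name = name.strip()
        if name:  # skip blank rows pasted from the spreadsheet
            urls.append(template.format(country=quote(name)))
    return urls


urls = generate_urls(["Nepal", "Spain", "Example Country", "  "])
for url in urls:
    print(url)
```

`quote` percent-encodes characters such as spaces, so multi-word country names still produce valid URLs.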
Examples of pages generated:
Now that I have a list of target web pages to scrape data from, I am going to build a scraper that understands what data I am asking for and will grab it for me.
Create a Scraper: Tell Me What You Want
Building a scraper is like writing a letter to the computer: you tell it where and how to get the data you want. Only you write it in a programming language rather than a human language.
A web scraping tool is like a translator. It lets you write the letter in human language, thanks to its comprehensible workflow and intuitive UI.
If this is still abstract, don't worry. Let’s dive right in with a few questions.
What can a scraper do?
- Visit - Open a web page.
- Click - Click a link on the web page.
- Extract - Pull data such as text, URLs, and numbers.
What data do I need?
- The country name and the number of reviews.
- The travel guide link, the title of the guide, and its tags.
How should a scraper act to get the data I need?
- Visit the web page
- Extract the country name and number of reviews on the page
- Find the travel guide link and click it
- Extract the page URL, the title of the guide, and the tags of the guide
- Go back and visit the next web page
- Repeat the above steps (In Octoparse, this can be done with a loop)
Bingo. That’s the workflow I built here.
Octoparse: How a web scraper's workflow works
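To make the visit, extract, click, and loop steps concrete, here is a rough Python sketch of the same workflow. It is not Octoparse's implementation: the `Page` class, the `run_workflow` function, and the in-memory `FAKE_SITE` (with placeholder countries and numbers) are assumptions made so the example runs without touching any real website.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Page:
    """Hypothetical, already-parsed view of one web page."""
    title: str
    review_count: Optional[int] = None  # only set on country pages
    guide_url: Optional[str] = None     # the link to "click" on a country page
    tags: Tuple[str, ...] = ()          # only set on guide pages


def run_workflow(urls: List[str], visit: Callable[[str], Page]) -> List[Dict[str, object]]:
    """Visit each URL, extract fields, follow the guide link, and repeat."""
    rows = []
    for url in urls:                              # loop over the URL list
        country = visit(url)                      # visit the web page
        row = {"country": country.title,          # extract name and review count
               "reviews": country.review_count}
        if country.guide_url:                     # "click" the travel guide link
            guide = visit(country.guide_url)
            row["guide_url"] = country.guide_url  # extract the guide details
            row["guide_title"] = guide.title
            row["tags"] = ", ".join(guide.tags)
        rows.append(row)                          # then move on to the next URL
    return rows


# Made-up in-memory "website"; all names and numbers are placeholders.
FAKE_SITE = {
    "https://example.com/a": Page("Country A", review_count=120,
                                  guide_url="https://example.com/a/guide"),
    "https://example.com/a/guide": Page("A sample guide", tags=("Nature", "Hiking")),
    "https://example.com/b": Page("Country B", review_count=0),
}

rows = run_workflow(["https://example.com/a", "https://example.com/b"],
                    FAKE_SITE.__getitem__)
for row in rows:
    print(row)
```

Passing `visit` in as a function keeps the loop independent of how pages are fetched, so the same workflow could be pointed at a real fetch-and-parse function later.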
How do I build the workflow?
- Enter the URLs into the search bar and start building a task. (Tell the scraper which web pages to visit)
- Click the data you want in the built-in browser. (Help the scraper locate the data)
- Select the actions you want the scraper to take on the Tips Panel. (Tell the scraper to visit, to click, or to extract data)
What does the data look like?
It is a long table, as there are over 100 rows of data on my list. The screenshot below shows as much of it as it can.
Sample Tripadvisor data scraped by Octoparse
I know, raw data is not pretty before any visualization, but it is helpful. With this data, I found the best choice for a foodie and beach lover: Spain!
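One way to sift a table like this is to keep only the rows whose tags match a travel theme and then rank them by review count. The Python sketch below shows the idea. The rows are made-up placeholders rather than the scraped results, and the formats assumed here (review text like "1,200 reviews", comma-separated tags) are guesses about how an export might look.

```python
import re


def parse_review_count(text):
    """Turn text such as '1,200 reviews' into an integer (0 if no number is found)."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else 0


def shortlist(rows, wanted_tags):
    """Keep rows that share at least one tag with wanted_tags, most reviewed first."""
    wanted = {tag.lower() for tag in wanted_tags}
    matches = [
        row for row in rows
        if wanted & {tag.strip().lower() for tag in row["tags"].split(",")}
    ]
    return sorted(matches, key=lambda row: parse_review_count(row["reviews"]), reverse=True)


# Placeholder rows, NOT real Tripadvisor data.
SAMPLE_ROWS = [
    {"country": "Country A", "reviews": "1,200 reviews", "tags": "Beaches, Food"},
    {"country": "Country B", "reviews": "15,300 reviews", "tags": "Museums"},
    {"country": "Country C", "reviews": "9,800 reviews", "tags": "Food, Hiking"},
]

picks = [row["country"] for row in shortlist(SAMPLE_ROWS, ["beaches", "food"])]
print(picks)
```

Ranking by review count follows the hypothesis stated earlier (more visits, more reviews), so it is a rough popularity proxy rather than a quality score.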
I am going to study the Spain travel guides now. Have fun with Octoparse. If you have any problems using it, feel free to contact us at firstname.lastname@example.org.