
Visualizing Reviews of the 2018 World Cup from YouTube

Tuesday, July 10, 2018

                

The 2018 World Cup began on June 14th. We were curious about what people are really interested in and what they are saying about the matches. So last Friday, we scraped and analyzed information on the 612 videos in the YouTube search results for "World Cup 2018", and collected over six thousand comments.

 

 

Here we would like to share with you the data extraction process and what we’ve found based on the data extracted.

 

What Information Did We Capture?

With the assistance of Octoparse 7, we configured two crawlers for this project. The difficulty in scraping YouTube is that both the results page and the video pages use infinite scrolling, so we needed to set up automatic scrolling in Octoparse to capture as much video information as we could.

 

 

For each video, we wanted to capture the Title, Duration, Publisher, Time, Views, Likes and Dislikes.

 

 

With the Point & Click function of Octoparse 7, it’s quite easy to extract the video information we want.

 

 

 

To retrieve the comments on each video, we built a second crawler. The trick here is to add an extra loop item inside the crawler workflow so that the comments are extracted one by one.

Within each comment, we scraped only the content and the number of "Likes".

 

As a result, we could transform the video information and comments from YouTube into a structured data sheet.
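To make "structured" concrete, here is a minimal Python sketch of how the two crawlers' output might be laid out as rows. It is not Octoparse's export format: the class names, field types, the `video_title` link between a comment and its video, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class VideoRecord:
    # Field names follow the attributes listed earlier; the types are assumptions.
    title: str
    duration: str
    publisher: str
    time: str
    views: int
    likes: int
    dislikes: int


@dataclass
class CommentRecord:
    video_title: str  # assumed key linking a comment back to its video
    content: str
    likes: int


def to_csv(records):
    """Serialize a non-empty list of same-type dataclass records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(records[0])])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Made-up sample rows, for illustration only.
videos = [VideoRecord("Sample match highlights", "2:05", "Sample Channel", "1 week ago", 1000, 50, 5)]
comments = [CommentRecord("Sample match highlights", "What a goal!", 12)]

print(to_csv(videos))
print(to_csv(comments))
```

Keeping videos and comments in separate tables mirrors the two-crawler setup and avoids repeating the video fields on every comment row.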

 

What Did We Find From the Information Extracted?

I ran the crawlers last Friday, when only 56 games had been played. Here are the results I would like to share with you.

 

Ten Most Popular Videos about the 2018 World Cup

It’s quite surprising that the music videos of the official songs are so popular. Among the 10 most popular videos, 4 of them are music videos of the official songs of the 2018 World Cup.

 

 

Ten Most Popular 2018 World Cup Matches on YouTube

Popularity is related to the teams playing rather than the stage of the tournament.

We found that 7 of the popular matches are group stage matches and 3 are Last 16 matches.

The most popular team is Argentina (3 matches listed), followed by Portugal, Germany and Brazil (2 matches listed each).

 

People's Preferences for the 10 Most Popular Matches

Next, we explored how people feel about the 10 most popular matches, based simply on the "Like" and "Dislike" numbers.

 

Ten Most Disliked 2018 World Cup Matches

This ranking is calculated as the ratio of the number of "Dislikes" to the total number of views. Interestingly, 6 of these matches also appear among the ten most popular matches.
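The ratio described here is simple to reproduce. The following Python sketch assumes each video is a dict with `views` and `dislikes` keys; the function names and the sample numbers are made up for illustration and are not the scraped data.

```python
def dislike_ratio(dislikes, views):
    """Return dislikes divided by total views, or 0.0 when there are no views."""
    return dislikes / views if views else 0.0


def most_disliked(videos, n=10):
    """Rank video dicts by dislike ratio, highest first, and keep the top n."""
    ranked = sorted(
        videos,
        key=lambda v: dislike_ratio(v["dislikes"], v["views"]),
        reverse=True,
    )
    return ranked[:n]


# Made-up numbers, for illustration only.
sample = [
    {"title": "Match A", "views": 1000, "dislikes": 10},
    {"title": "Match B", "views": 500, "dislikes": 20},
    {"title": "Match C", "views": 2000, "dislikes": 4},
]

for video in most_disliked(sample, n=2):
    print(video["title"], dislike_ratio(video["dislikes"], video["views"]))
```

Dividing by views rather than comparing raw dislike counts keeps heavily watched videos from dominating the list purely because of their audience size.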

 

 

Sentiment Analysis on 2018 World Cup Matches

After further processing of the scraped comments, we ran a sentiment analysis on the comments for 8 of the 2018 World Cup matches. The results are below:
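This post does not say which method or tool produced the sentiment scores. As a rough illustration only, the Python sketch below uses a simple word-list approach: it counts positive and negative words in each comment and reports the share of each label. The word lists, function names and sample comments are all hypothetical, and a real analysis would likely use a much larger lexicon or a trained model.

```python
import re

# Tiny, hypothetical word lists for illustration; a real lexicon would be far larger.
POSITIVE = {"great", "amazing", "love", "best", "beautiful"}
NEGATIVE = {"bad", "worst", "hate", "boring", "terrible"}


def classify(comment):
    """Label a comment by comparing its counts of positive and negative words."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_share(comments):
    """Return the fraction of comments carrying each label."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for comment in comments:
        counts[classify(comment)] += 1
    total = len(comments) or 1  # avoid dividing by zero on an empty list
    return {label: count / total for label, count in counts.items()}


# Made-up comments, for illustration only.
sample = [
    "What an amazing goal, love it",
    "Worst referee ever",
    "Who scored?",
    "Boring game but great save",
]
print(sentiment_share(sample))
```

A comment with equal numbers of positive and negative words is treated as neutral here, which is one of several reasonable choices.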

 

 

Most Common Words in Comments on 2018 World Cup Matches

We imported the comments on several matches to create word clouds showing the most commonly occurring words.
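A word cloud is driven by word frequencies, so the counting step can be sketched in a few lines of Python. The stop-word list, the function name and the sample comments below are assumptions for illustration; the tool used for the actual word clouds is not named in this post.

```python
import re
from collections import Counter

# A small, assumed stop-word list; real word-cloud tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "it", "was", "for", "this"}


def top_words(comments, n=10):
    """Count words across all comments, skipping stop words, and return the n most common."""
    words = []
    for comment in comments:
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in STOP_WORDS and len(word) > 1:
                words.append(word)
    return Counter(words).most_common(n)


# Made-up comments, for illustration only.
sample = [
    "Germany is out",
    "Korea was great",
    "Poor Germany",
    "Great game for Korea and Germany",
]
print(top_words(sample, n=3))
```

The resulting (word, count) pairs are what a word-cloud renderer would scale font sizes by.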

 

Sweden vs Switzerland

 

South Korea vs Germany

                  

 

Spain vs Russia

                 

England vs Belgium

            

 

How People Are Commenting on the 2018 World Cup Final: France vs Croatia

 

Word clouds of the comments on the 2018 World Cup Final: France vs Croatia

Words used most often in the comments

 

 

Conclusion

Data is beautiful. Thanks to the rapid development and spread of data extraction and data analytics tools, we can now collect the information we want and analyze it much faster and more easily.

 

Octoparse, Data at Your Fingertips!

 

Further Reading:

Top 10 Data Scraping Tools for 2018

Top 30 Big Data Tools for Data Analysis

Video: Build Up Your First Scraper Without Coding
