
Scraping & Visualizing YouTube Comments on the World Cup

Tuesday, July 10, 2018

The 2018 World Cup began on June 14th. We were curious about what people were really interested in and what they were saying about the matches. So we scraped and analyzed information on the 612 videos in the search results for "World Cup 2018", and collected over six thousand comments.

 

Here we would like to share with you the data extraction process and what we’ve found based on the data extracted.

 

What Information Did We Capture?

With the assistance of Octoparse 7, we configured two crawlers for this project. The difficulty in scraping YouTube is the infinite scroll on the results page and the video pages, so we needed to set up automatic scrolling in Octoparse to obtain as much video information as we could.
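Octoparse handles scrolling through its own settings, so no code was needed for this project. For readers who want to see the idea behind it, here is a minimal sketch of a "scroll until nothing new loads" loop. The function name, its parameters and the `FakePage` stand-in are all hypothetical and are not part of Octoparse or YouTube; in a real crawler the two callbacks would be wired to a browser driver.

```python
from typing import Callable


def scroll_until_stable(
    get_item_count: Callable[[], int],
    scroll_down: Callable[[], None],
    max_rounds: int = 50,
    patience: int = 2,
) -> int:
    """Scroll until the item count stops growing for `patience` rounds in a row."""
    last_count = get_item_count()
    stalled = 0
    for _ in range(max_rounds):
        scroll_down()
        count = get_item_count()
        if count > last_count:
            # New items appeared, so keep going and reset the stall counter.
            last_count = count
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                break
    return last_count


class FakePage:
    """Hypothetical stand-in for a results page: each scroll loads one more batch."""

    def __init__(self, total: int, batch: int) -> None:
        self.total = total
        self.batch = batch
        self.loaded = min(batch, total)

    def count(self) -> int:
        return self.loaded

    def scroll(self) -> None:
        self.loaded = min(self.loaded + self.batch, self.total)
```

The `max_rounds` cap matters in practice: a page that keeps loading content would otherwise never let the loop finish.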

 

For each video, we wanted to capture the Title, Duration, Publisher, Time, Views, Likes and Dislikes.

 

With the Point & Click function of Octoparse 7, it’s quite easy to extract the video information we want.

 

As for retrieving the comments of each video, we built another crawler for it. The trick here is adding an extra loop item inside the crawler workflow to extract each comment one by one.

Within each comment, we just scraped the content and the "Like" number.

worldcup1

As a result, we could transform the video information and comments of the YouTube videos into a structured datasheet.
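To make "structured datasheet" concrete, here is a rough sketch of how the captured fields could be modeled and written out as CSV. The class names, field types and the `to_csv` helper are our own assumptions for illustration; Octoparse exports its data directly, and its real column names may differ.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class VideoRecord:
    # One row per video, using the fields listed in this post.
    title: str
    duration: str
    publisher: str
    time: str
    views: int
    likes: int
    dislikes: int


@dataclass
class CommentRecord:
    # One row per comment: the content and its "Like" number.
    video_title: str
    content: str
    likes: int


def to_csv(records: list) -> str:
    """Serialize a list of same-typed records to CSV text with a header row."""
    if not records:
        return ""
    names = [f.name for f in fields(records[0])]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Keeping videos and comments in two separate tables mirrors the two crawlers: the comment rows point back to their video by title.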

worldcup2

What Did We Find From the Information Extracted?

I ran the crawlers to scrape the data last Friday, when only 56 games had been played. Here are the results I would like to share with you.

 

Ten Most Popular Videos about the 2018 World Cup

It’s quite surprising that the music videos of the official songs are so popular. Among the 10 most popular videos, 4 of them are music videos of the official songs of the 2018 World Cup.

worldcup3

 

 

Ten Most Popular 2018 World Cup Matches on YouTube

Popularity is related to the teams playing rather than the stage of the tournament.

Of the 10 most popular matches, 7 are group stage matches and 3 are Last 16 matches.

The most popular team is Argentina (3 matches listed), followed by Portugal, Germany and Brazil (2 matches listed each).
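A tally like this can be reproduced from the list of match titles. The sketch below assumes each title follows a "Team vs Team" pattern, which real video titles often will not without some cleaning first; the function name and the placeholder teams in the usage note are hypothetical.

```python
from collections import Counter


def count_team_appearances(matches: list[str]) -> Counter:
    """Count how many listed matches each team appears in."""
    counts: Counter = Counter()
    for match in matches:
        first, separator, second = match.partition(" vs ")
        if not separator:
            # Skip titles that do not follow the assumed "A vs B" pattern.
            continue
        counts[first.strip()] += 1
        counts[second.strip()] += 1
    return counts
```

Calling `count_team_appearances(["Team A vs Team B", "Team A vs Team C"]).most_common(1)` would put "Team A" on top with 2 appearances.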

worldcup5

 

People's Preference for the 10 Most Popular Matches

Next, we look more closely at how people responded to the 10 most popular matches, based simply on the "Like" and "Dislike" numbers.

worldcup6

 

Ten Most Disliked 2018 World Cup Matches

This ranking is calculated as the ratio of the number of "Dislikes" to the total number of views. It’s quite interesting that 6 of these matches are also listed in the "10 Most Popular World Cup Matches".
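The ratio is simple to compute once the data is in a table. Here is a small sketch, assuming each video is a dictionary with `title`, `views` and `dislikes` keys; those key names and both function names are our own choices for illustration.

```python
def dislike_ratio(dislikes: int, views: int) -> float:
    """Return dislikes divided by views, or 0.0 when there are no views."""
    return dislikes / views if views > 0 else 0.0


def most_disliked(videos: list[dict], n: int = 10) -> list[dict]:
    """Return the n videos with the highest dislike-to-view ratio."""
    return sorted(
        videos,
        key=lambda video: dislike_ratio(video["dislikes"], video["views"]),
        reverse=True,
    )[:n]
```

Dividing by views rather than ranking on the raw dislike count keeps heavily watched videos from dominating the list just because more people saw them.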

worldcup7

 

Sentiment Analysis on 2018 World Cup Matches

After further processing of the scraped comments, we obtained sentiment analysis results for 8 of the 2018 World Cup matches, as shown below:

 worldcup8
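This post does not go into how the sentiment scores were produced, so the following is not the method behind the chart above. It is only a minimal word-list sketch of what "sentiment analysis" on comments can look like. The two word sets are tiny made-up examples and both function names are hypothetical; a real analysis would rely on a proper sentiment lexicon or a trained model.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumed for this sketch, not a real lexicon).
POSITIVE = {"great", "good", "amazing", "love", "best", "brilliant"}
NEGATIVE = {"bad", "boring", "worst", "terrible", "hate", "awful"}


def classify(comment: str) -> str:
    """Label a comment by comparing its counts of positive and negative words."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_shares(comments: list[str]) -> dict[str, float]:
    """Return the fraction of comments falling under each label."""
    counts = Counter(classify(comment) for comment in comments)
    total = len(comments)
    return {
        label: (counts[label] / total if total else 0.0)
        for label in ("positive", "neutral", "negative")
    }
```

A word-list approach like this misses sarcasm and negation ("not good"), which is one reason to treat its output as a rough signal only.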

 

Words Surrounding 2018 World Cup Matches Comments

We imported the comments under several matches to create word clouds, which show the most commonly occurring words.
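Underneath a word cloud is a plain word-frequency count. The word clouds here came from a ready-made tool, but the counting step can be sketched roughly as follows. The stop-word list is a short assumed sample and `top_words` is a hypothetical helper name.

```python
import re
from collections import Counter

# A short assumed stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "in", "it", "was", "for", "this", "that"}


def top_words(comments: list[str], n: int = 20) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-words across all comments."""
    counts: Counter = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in STOPWORDS and len(word) > 1:
                counts[word] += 1
    return counts.most_common(n)
```

In a word cloud, each word returned here would be drawn at a size proportional to its count.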

 

South Korea vs Germany

                  worldcup9

 

 

How People Commented on the 2018 World Cup Final: France vs Croatia

worldcup10

Word clouds of the comments on the 2018 World Cup Final: France vs Croatia

worldcup11

Words used most often in the comments

 

 

Conclusion

The data is beautiful. Thanks to the rapid development and popularization of data extraction and data analytics tools, we can now collect the information we want and analyze it much faster and more easily.

 

 

Author: Surie M. (The Octoparse Team)

 


 

 

 

 
