
Scraping & Visualizing YouTube Comments on the World Cup

3 min read

The 2018 World Cup began on June 14th. We were curious about what people were really interested in and what they were talking about during the matches, so we scraped and analyzed information on 612 videos from the search results for “World Cup 2018” and collected over six thousand comments.

Here we share the data extraction process and what we found in the extracted data.

What Information We Captured

With the help of Octoparse 7, we configured two crawlers for this project. The main difficulty in scraping YouTube is that both the results page and the video pages use infinite scrolling, so we set up automatic scroll-down in Octoparse to load as much video information as possible.

For each video, we captured the Title, Duration, Publisher, Time, Views, Likes, and Dislikes.
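Octoparse handles the scrolling through its own settings, but the underlying idea can be sketched in a few lines of Python. This is only an illustration, not what Octoparse does internally: `scroll_until_stable`, `FakePage`, and the `max_scrolls` and `patience` parameters are hypothetical names, and the fake page stands in for a real browser session.

```python
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_down: Callable[[], None],
    max_scrolls: int = 50,
    patience: int = 2,
) -> int:
    """Scroll until no new items appear for `patience` scrolls in a row."""
    seen = count_items()
    idle = 0
    for _ in range(max_scrolls):
        scroll_down()
        current = count_items()
        if current > seen:
            # New items were loaded, so reset the idle counter.
            seen, idle = current, 0
        else:
            idle += 1
            if idle >= patience:
                break
    return seen


class FakePage:
    """Hypothetical stand-in for a page that loads `batch` items per scroll."""

    def __init__(self, total: int = 65, batch: int = 20) -> None:
        self.total = total
        self.batch = batch
        self.loaded = min(batch, total)

    def scroll(self) -> None:
        self.loaded = min(self.loaded + self.batch, self.total)

    def count(self) -> int:
        return self.loaded


page = FakePage()
loaded = scroll_until_stable(page.count, page.scroll)
print(loaded)
```

With a real browser automation tool, `count_items` and `scroll_down` would be replaced by calls that count the loaded result elements and scroll the window. The `max_scrolls` cap keeps the loop from running forever on a page that never stops loading.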

With the Point & Click function of Octoparse 7, it’s quite easy to extract the video information we want.

As for retrieving the comments of each video, we built another crawler for it. The trick here is adding an extra loop item inside the crawler workflow to extract each comment one by one.

For each comment, we scraped only the content and the number of Likes.

comment example

As a result, we were able to turn the video information and comments from YouTube into a structured datasheet.

scraped comment data example
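To make the shape of that datasheet concrete, here is a minimal Python sketch of the two record types and a CSV export. The field names follow the lists above, but the `video_title` link between a comment and its video, the types, and the `to_csv` helper are assumptions made for illustration. The sample row is a placeholder, not scraped data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Video:
    title: str
    duration: str
    publisher: str
    time: str
    views: int
    likes: int
    dislikes: int


@dataclass
class Comment:
    video_title: str  # assumed key linking the comment back to its video
    content: str
    likes: int


def to_csv(rows: list) -> str:
    """Serialize a non-empty list of same-type dataclass records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(rows[0])])
    writer.writeheader()
    writer.writerows(asdict(row) for row in rows)
    return buffer.getvalue()


# Placeholder values only, not taken from the scraped dataset.
sample = [Comment(video_title="Example video", content="Great match!", likes=3)]
csv_text = to_csv(sample)
print(csv_text)
```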

What We Found in the Extracted Information

I ran the crawlers last Friday, when only 56 matches had been played. Here are the results I would like to share with you.

Ten Most Popular Videos about the 2018 World Cup

It’s quite surprising that the music videos of the official songs are so popular. Among the 10 most popular videos, 4 of them are music videos of the official songs of the 2018 World Cup.

most popular videos about world cup 2018

Ten Most Popular 2018 World Cup Matches on YouTube

Popularity depends on which teams are playing rather than on the stage of the tournament.

Of the 10 most popular matches, 7 are group stage matches and 3 are Last 16 matches.

The most popular team is Argentina (3 matches listed), followed by Portugal, Germany, and Brazil (2 matches listed each).

most popular matches
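Counting how often each team appears among the top matches is simple to reproduce. The sketch below assumes each match is labeled in the form "Team A vs Team B"; the `team_appearances` helper is hypothetical and the team names are placeholders, not the actual top 10.

```python
from collections import Counter


def team_appearances(matches: list[str]) -> Counter:
    """Count how many listed matches each team appears in."""
    counts: Counter = Counter()
    for match in matches:
        home, separator, away = match.partition(" vs ")
        if not separator:
            continue  # skip labels that are not in "A vs B" form
        counts.update([home.strip(), away.strip()])
    return counts


# Placeholder labels only, not the real ranking.
top_matches = ["Team A vs Team B", "Team C vs Team A", "Team B vs Team D"]
appearances = team_appearances(top_matches)
ranking = appearances.most_common()
print(ranking)
```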

People’s Preference for the 10 Most Popular Matches

Next, we look at how people reacted to the 10 most popular matches, based simply on the numbers of Likes and Dislikes.

likes and dislikes top 10 matches

Ten Most Disliked 2018 World Cup Matches

This ranking is based on the ratio of the number of Dislikes to the total number of views. Interestingly, 6 of these matches are also among the 10 most popular World Cup matches.

top dislike world cup matches 2018
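The dislike ratio is straightforward to compute once the views and dislikes are in a table. Here is a minimal sketch, assuming each video is a dictionary with `views` and `dislikes` keys; the helper names and the sample numbers are hypothetical, not values from our dataset.

```python
def dislike_ratio(dislikes: int, views: int) -> float:
    """Return dislikes divided by total views, or 0.0 when there are no views."""
    return dislikes / views if views else 0.0


def most_disliked(videos: list[dict], top_n: int = 10) -> list[dict]:
    """Sort videos by dislike ratio, highest first, and keep the top `top_n`."""
    return sorted(
        videos,
        key=lambda video: dislike_ratio(video["dislikes"], video["views"]),
        reverse=True,
    )[:top_n]


# Placeholder numbers only.
videos = [
    {"title": "Match 1", "views": 1000, "dislikes": 5},
    {"title": "Match 2", "views": 200, "dislikes": 4},
    {"title": "Match 3", "views": 0, "dislikes": 0},
]
ranked = most_disliked(videos, top_n=2)
print([video["title"] for video in ranked])
```

Dividing by views rather than comparing raw Dislike counts keeps heavily watched videos from dominating the list just because of their audience size.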

Sentiment Analysis on 2018 World Cup Matches

After further processing the scraped comments, we ran a sentiment analysis on 8 matches of the 2018 World Cup. The results are shown below:

sentiment analysis on world cup matches
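For readers who want to try something similar, the following is a toy lexicon-based classifier in Python. It is not the method used for the chart above, which this post does not describe. The word lists, the `sentiment` and `sentiment_shares` helpers, and the sample comments are all illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative word lists; a real analysis would use a proper lexicon or model.
POSITIVE = {"great", "good", "love", "amazing", "best"}
NEGATIVE = {"bad", "worst", "hate", "boring", "terrible"}


def sentiment(comment: str) -> str:
    """Label a comment by counting positive and negative words."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_shares(comments: list[str]) -> dict[str, float]:
    """Return the share of comments in each sentiment class."""
    if not comments:
        return {}
    counts = Counter(sentiment(c) for c in comments)
    labels = ("positive", "neutral", "negative")
    return {label: counts[label] / len(comments) for label in labels}


# Made-up comments for demonstration.
examples = ["What a great match, love it!", "Worst referee ever", "Kickoff at nine"]
shares = sentiment_shares(examples)
print(shares)
```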

Words Surrounding 2018 World Cup Matches Comments

We used the comments on several matches to create word clouds, which show the most frequently occurring words.

South Korea vs Germany

word analysis 2018 World Cup

How People Commented on the 2018 World Cup Final: France vs Croatia

2018 World Cup final

Word clouds of the comments on the 2018 World Cup Final: France vs Croatia

2018 World Cup word clouds analysis

Words used most often in the comments
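The word counts behind a word cloud can be approximated with a short Python script. This is a rough sketch under stated assumptions: the stopword list is a small hand-picked sample, the `top_words` helper is hypothetical, and the comments are made up rather than taken from the scraped data.

```python
import re
from collections import Counter

# Small hand-picked stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "of", "to", "in", "was", "this", "that"}


def top_words(comments: list[str], n: int = 20) -> list[tuple[str, int]]:
    """Return the `n` most frequent non-stopword words across all comments."""
    counts: Counter = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z]+", comment.lower()):
            if word not in STOPWORDS and len(word) > 1:
                counts[word] += 1
    return counts.most_common(n)


# Made-up comments for demonstration.
comments = ["The goal was amazing", "What a goal!", "Amazing save, amazing goal"]
frequent = top_words(comments, n=2)
print(frequent)
```

The resulting word and count pairs are the kind of input a word cloud generator typically expects, with more frequent words drawn larger.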

Conclusion

The data is beautiful. Thanks to the rapid development and popularization of data extraction and data analytics tools, we can now collect the information we want and analyze it much faster and more easily.
