
Visualizing the Progression of the Coronavirus Outbreak

Friday, February 14, 2020

A few days ago I published an article analyzing the social impact of the coronavirus (COVID-19) outbreak in China. Even so, many people still lack a full picture of the outbreak, so I thought it’d be interesting to visualize the situation from a more objective, data-driven perspective.

 

How to start

First, I use web scraping to extract the data from China’s National Health Commission, then use Tableau to visualize the outbreak’s progression spatially. I also built a dashboard where you can easily toggle between dates and provinces for a closer look.

 

Coronavirus Progression

By Ashley from Tableau Public

Disclaimer:

Please note that the data I collected runs only up to February 11. By the time you read this article, the figures may be out of date and may not reflect the current state of the outbreak. Later in the article I explain an easy way to keep up with live data. I used a web scraping tool to extract the data instead of writing code, since it exports the data in a usable format with no data cleaning required.

Choose a data source:

If you google coronavirus data, I’m sure you will find many resources. Sources like Kaggle and the WHO provide secondary data collected by others, which lags behind the latest figures from a primary source such as the official Chinese health website. If you are a data analyst with strict standards for accuracy and timeliness, you should avoid drawing conclusions from secondary data and use primary data instead. I chose Coronavirus Update Source because it is served as JSON, which lets us stream the data for individual cities into our system through an API pipeline. (Read this guideline of a JSON file.)
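
To make the JSON step concrete, here is a minimal sketch of flattening a nested JSON payload into one row per city, which is the shape Tableau works with most easily. The article does not show the feed’s actual schema, so every field name below (`provinceName`, `cities`, `confirmed`, and so on) is an assumption, and the numbers are made up for illustration only.

```python
import csv
import io
import json

# Hypothetical payload: the real feed's field names are not shown in the
# article, so these keys are assumptions. The counts are made up.
SAMPLE = """
[
  {"provinceName": "Hubei", "updateDate": "2020-02-11",
   "cities": [
     {"cityName": "Wuhan", "confirmed": 100, "recovered": 10, "deaths": 5},
     {"cityName": "Xiaogan", "confirmed": 40, "recovered": 4, "deaths": 1}
   ]},
  {"provinceName": "Guangdong", "updateDate": "2020-02-11",
   "cities": [
     {"cityName": "Shenzhen", "confirmed": 30, "recovered": 6, "deaths": 0}
   ]}
]
"""

FIELDS = ["date", "province", "city", "confirmed", "recovered", "deaths"]


def flatten(payload):
    """Turn the nested province -> cities structure into one row per city."""
    rows = []
    for province in payload:
        for city in province.get("cities", []):
            rows.append({
                "date": province["updateDate"],
                "province": province["provinceName"],
                "city": city["cityName"],
                "confirmed": city["confirmed"],
                "recovered": city["recovered"],
                "deaths": city["deaths"],
            })
    return rows


def to_csv(rows):
    """Serialize the flat rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten(json.loads(SAMPLE))
print(to_csv(rows))
```

A flat table like this can be saved as a CSV file and loaded into Tableau as a text data source.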

 
Octoparse web scraping JSON

 

Scraping Template

Another way to extract live data is to use a scraping template, as I did in the last article. It’s a cut-and-dried solution for people who can’t code (watch this video for details). You can set up a task scheduler to get up-to-the-minute data. Here is the data I’ve collected; feel free to play with it.

 

Data Visualization with Tableau

Once the data is collected, we can upload it to Tableau. I first create a map layer by simply dragging Province/State onto the drop fields. After that, I add the time series and accumulate the values to give a full view of the trend in each province. I single out Hubei province so that I can look closely at its trend. The map shows the historical spread of the coronavirus over the 20 days since January 22. As of February 11, the number of confirmed infections in Hubei alone had reached 33,366.
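
For readers who want to see what "accumulate the values" means outside Tableau, here is a rough sketch of a running total per province. It assumes the input is a list of daily new-case counts; the records below are made-up numbers, not real data, and `running_totals` is a name I chose for illustration.

```python
from collections import defaultdict
from itertools import accumulate

# Illustrative daily *new* confirmed cases: made-up numbers, not real counts.
daily_new = [
    ("2020-01-22", "Hubei", 5),
    ("2020-01-22", "Guangdong", 2),
    ("2020-01-23", "Hubei", 7),
    ("2020-01-23", "Guangdong", 1),
    ("2020-01-24", "Hubei", 9),
    ("2020-01-24", "Guangdong", 4),
]


def running_totals(records):
    """Return {province: [(date, cumulative_count), ...]} in date order."""
    by_province = defaultdict(list)
    # ISO-formatted dates sort correctly as plain strings.
    for date, province, count in sorted(records):
        by_province[province].append((date, count))

    totals = {}
    for province, series in by_province.items():
        dates = [date for date, _ in series]
        cumulative = accumulate(count for _, count in series)
        totals[province] = list(zip(dates, cumulative))
    return totals


totals = running_totals(daily_new)
for province, series in totals.items():
    print(province, series)
```

In Tableau the same effect comes from applying a running-total table calculation to the measure, so no code is needed there.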

 


Outbreak Progression Hubei VS. Others

 

We can tell that, besides Hubei, the outbreak has also had a large impact on Guangdong, Zhejiang, Hunan and Henan.

 


Cases Reported in Each Province

 

Notice that the reported cases from Hubei are significantly greater than those of all the other provinces combined. I create a group that divides the provinces into two categories: Hubei and Others. To get a better idea of where the outbreak is heading, I also add trend lines. Both Hubei and Others have begun to slide underneath the trend line, which indicates that confirmed cases are tending to decline. However, the death toll doesn’t show a positive change, as the numbers are still above the trend line.
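
The grouping and the "below the trend line" reading can be sketched in a few lines of code. This is only an illustration under two assumptions: that the trend line is a straight-line least-squares fit (the article does not say which trend model was used in Tableau), and that the input is daily new cases per province. The numbers are made up, and `group_of`, `fit_line`, and `residuals` are names chosen for this example.

```python
from collections import defaultdict


def group_of(province):
    """Two-way grouping used in the charts: Hubei versus everything else."""
    return "Hubei" if province == "Hubei" else "Others"


def fit_line(ys):
    """Ordinary least-squares fit of y = slope * x + intercept, x = 0..n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(ys):
    """Actual value minus fitted value for each day; negative = below the line."""
    slope, intercept = fit_line(ys)
    return [y - (slope * x + intercept) for x, y in enumerate(ys)]


# Made-up daily new cases per province (day index is the list position).
daily = {
    "Hubei": [10, 20, 30, 40, 50, 52, 50],
    "Guangdong": [2, 4, 6, 8, 9, 9, 8],
    "Zhejiang": [1, 3, 5, 6, 7, 7, 6],
}

# Sum the provinces into the two groups, day by day.
grouped = defaultdict(lambda: [0] * 7)
for province, series in daily.items():
    for day, count in enumerate(series):
        grouped[group_of(province)][day] += count

for name, series in grouped.items():
    latest = residuals(series)[-1]
    position = "below" if latest < 0 else "at or above"
    print(f"{name}: latest point is {position} the trend line")
```

A latest point that sits below the fitted line only says that recent values are lower than the overall linear trend would predict. It is a hint of slowing, not proof of it.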


Confirmed Cases of Hubei VS. Others

 


Death Toll of Hubei VS. Others 

  

The recovery rate in the provinces other than Hubei is some cheerful news: the trend line gets steeper over time, and more provinces are moving upward, which indicates that recoveries are increasing. I expect the recovery rate to keep growing as people take prompt action against the virus.
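
The article does not spell out how the recovery rate is calculated. One common definition, assumed here, is cumulative recovered cases divided by cumulative confirmed cases. A small sketch with made-up snapshot numbers:

```python
def recovery_rate(recovered, confirmed):
    """Share of cumulative confirmed cases that have recovered.

    Returns None when there are no confirmed cases, to avoid dividing by zero.
    """
    if confirmed == 0:
        return None
    return recovered / confirmed


# Made-up cumulative snapshots: (date, confirmed, recovered).
snapshots = [
    ("2020-02-09", 1000, 100),
    ("2020-02-10", 1100, 143),
    ("2020-02-11", 1200, 192),
]

rates = [(date, recovery_rate(rec, conf)) for date, conf, rec in snapshots]

# The rate is "rising" if every day's value is higher than the previous day's.
values = [rate for _, rate in rates]
is_rising = all(earlier < later for earlier, later in zip(values, values[1:]))

for date, rate in rates:
    print(f"{date}: {rate:.1%}")
print("rising:", is_rising)
```

Because both counts are cumulative, this ratio can rise even while new infections are still being reported, so it is best read alongside the confirmed-case trend.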

 


Recovery Rate Growth

 

Final thoughts:

I made an animation because it is a great way to see the big picture of how the outbreak has progressed. Once we visualize the data, it becomes much easier to analyze. The biggest challenge in data analysis is data collection: I usually spend most of my time on mindless manual work, and I often need to repair the data format by hand as well. I found that a web scraping tool greatly improves my productivity. However, I wouldn’t recommend abusing a website by scraping it excessively, as this could lead to serious legal consequences. Check out this article for more information: Is web crawling legal?

 

I will keep working to improve the visualization. Feel free to share your thoughts or email me.

 

Author: Ashley

Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing it in a way that gives companies and businesses actionable insights. Read her blog here to discover practical tips and applications of web data extraction.

 


 
