
How to Become a Data Journalist

Thursday, April 23, 2020

In the past, when information was scarce, a journalist’s role was simply to collect and disseminate the news. Today, the flood of online data raises the bar for journalists who have to sort through it. More often than not, they need technical skills to see through the noise and deliver newsworthy information to the public.


The question is how to find the right data and, as a competent data journalist, make sense of scattered information for the general public.




How to become a data journalist

What does a data journalist do?

A data journalist uses statistics to support the writing and reporting of news stories, providing insights grounded in relevant data. A search on LinkedIn shows commonly highlighted skills such as web crawling, data visualization, analytical skills, data analytics, and data mining.


How to break into data journalism

Whatever your goal, success in this field, as in any other, depends above all on hard work, perseverance, and daily practice.


For many, the first roadblock is “I don’t have a journalism degree,” “I’ve never written much,” or “I can write, but I suck at numbers.” If any of these sounds like you, don’t get frustrated and give up yet: you don’t necessarily need a journalism or technology background to start.


If you already have the technical skills to obtain and process online data, that’s awesome. If you are still a student considering a bachelor’s or master’s degree, many universities offer programs for people who want to pursue a career in data journalism.


What matters more is being patient enough to work with large amounts of information. You don’t need a strong technical background or experience as a data analyst or data scientist. Many companies rely on easy-to-use tools to support data analysis, such as data extraction tools like Octoparse, data mining tools like Oracle Data Mining, and data visualization tools like Tableau and Power BI, to name just a few. None of them requires a high level of expertise, which saves you the time and energy of learning data analysis from scratch.


Excel is great, and learning a programming language or two will certainly help

In 90% of jobs, knowing how to use Excel is a fundamental data analysis skill. But skills such as JavaScript, CSS, and HTML, as well as Python and Ruby web frameworks, are also in high demand. R, SAS, and SPSS also belong on a data journalist’s to-learn list. Used well, these tools let you extract and analyze data flexibly.
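To give a flavor of that flexibility, here is a minimal sketch that does in Python what a simple Excel pivot table does: total a numeric column for each value of a category column. The column names and figures are made up for illustration; in practice the CSV would come from a file or a scraper export.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data (invented numbers, for illustration only).
SAMPLE_CSV = """region,year,articles
North,2019,12
South,2019,7
North,2020,15
South,2020,9
"""


def total_by(csv_text: str, key: str, value: str) -> dict[str, int]:
    """Sum the `value` column for each distinct entry in the `key` column,
    much like a one-field Excel pivot table."""
    totals: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[key]] += int(row[value])
    return dict(totals)


if __name__ == "__main__":
    print(total_by(SAMPLE_CSV, "region", "articles"))  # {'North': 27, 'South': 16}
    print(total_by(SAMPLE_CSV, "year", "articles"))    # {'2019': 19, '2020': 24}
```

Changing the `key` argument regroups the same data without rebuilding anything, which is the kind of flexibility a script offers over manual spreadsheet work.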


But here’s the catch: if you only need to extract data from websites, you don’t need to code

Data journalists often need large amounts of clean, structured data to analyze and visualize. The data could be text, image or web page URLs, numbers, tables, and so on. To pull data from different sources, journalists can take advantage of ready-to-use web scraping software. Because web pages are built from HTML, these tools can fetch data from most websites fairly easily and automatically convert it into structured formats such as Excel, JSON, and CSV.
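Point-and-click tools handle this for you, but the underlying idea is simple: parse the page’s HTML, pull out the pieces you want, and write them in a structured format. The sketch below illustrates only that idea, using Python’s standard library on an invented HTML table; real pages are messier, and a dedicated parser such as Beautiful Soup is usually more robust.

```python
from __future__ import annotations

import csv
import io
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_rows(html: str) -> list[list[str]]:
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows: list[list[str]]) -> str:
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()


def rows_to_json(rows: list[list[str]]) -> str:
    """Use the first row as field names and emit one JSON object per remaining row."""
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body])


# Made-up sample page for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>A</td><td>100</td></tr>
  <tr><td>B</td><td>250</td></tr>
</table>
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE_HTML)
    print(rows_to_csv(rows))
    print(rows_to_json(rows))
```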


Example: Extract the latest Instagram posts with a web scraping tool

Take the web scraping tool Octoparse as an example. Let’s say we want to extract all Instagram posts from the past 5 hours related to the keyword “data”.

Post data extracted from Instagram


We can use an Instagram web scraping template called “Tag post data” to scrape the post URLs, image URLs, post content, hashtags, user IDs, and more.



Instagram web scraping templates
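Once exported, the scraped posts are ordinary structured records. As a hedged sketch, assuming each record has hypothetical `url`, `content`, and ISO 8601 `posted_at` fields (the real template’s column names may differ), the code below keeps only posts from the last 5 hours and pulls the hashtags out of each post’s text.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta, timezone

# A '#' followed by word characters (letters, digits, underscore).
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text: str) -> list[str]:
    """Return hashtags in order of appearance, lower-cased, without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def recent_posts(posts: list[dict], now: datetime, hours: float = 5) -> list[dict]:
    """Keep posts whose hypothetical `posted_at` field falls within the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    return [p for p in posts if datetime.fromisoformat(p["posted_at"]) >= cutoff]


# Hypothetical records standing in for a scraper export (not real posts).
POSTS = [
    {"url": "https://example.com/p/1", "content": "Cleaning #Data today #python",
     "posted_at": "2020-04-23T10:00:00+00:00"},
    {"url": "https://example.com/p/2", "content": "Charts! #dataviz",
     "posted_at": "2020-04-23T02:00:00+00:00"},
]

if __name__ == "__main__":
    now = datetime(2020, 4, 23, 12, 0, tzinfo=timezone.utc)
    for post in recent_posts(POSTS, now):
        print(post["url"], hashtags(post["content"]))
    # https://example.com/p/1 ['data', 'python']
```

Timestamps here carry an explicit UTC offset so that comparisons against `now` are unambiguous; mixing offset-aware and naive datetimes would raise an error.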


Besides Instagram, Octoparse can scrape Amazon, Google Search, Reuters, Indeed, Booking, and many other websites. This video shows how to use it to extract data from any website efficiently. Once the data is collected, you can process and analyze it further to generate valuable insights. In this example, you could mine the post data to see which words appear most frequently, gauge whether the posts carry positive, neutral, or negative sentiment, or count how many posts were published during a specific period.
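These analyses can start very simply. The sketch below, run on made-up post texts and timestamps, counts the most frequent words, assigns a crude positive/neutral/negative label by counting hits against a tiny hand-written word list, and counts posts per hour. The word lists are illustrative assumptions; real sentiment analysis normally relies on a validated lexicon or a trained model.

```python
from __future__ import annotations

import re
from collections import Counter
from datetime import datetime

# Tiny illustrative word lists: assumptions, not a validated sentiment lexicon.
POSITIVE = {"love", "great", "useful", "happy"}
NEGATIVE = {"hate", "bad", "broken", "sad"}
STOPWORDS = {"the", "a", "an", "is", "this", "i", "and", "of", "to"}


def tokens(text: str) -> list[str]:
    """Split text into lower-case runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_words(texts: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Most frequent non-stopword words across all texts."""
    counts = Counter(w for t in texts for w in tokens(t) if w not in STOPWORDS)
    return counts.most_common(n)


def sentiment(text: str) -> str:
    """Label a text by (positive hits - negative hits) from the word lists above."""
    words = tokens(text)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def posts_per_hour(timestamps: list[str]) -> Counter[str]:
    """Count posts per clock hour, keyed like '2020-04-23 10:00'."""
    return Counter(datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:00") for ts in timestamps)


# Made-up sample data for illustration.
TEXTS = [
    "I love this data tool, great charts",
    "The data export is broken again",
    "Data data everywhere",
]
TIMESTAMPS = ["2020-04-23T10:05:00", "2020-04-23T10:40:00", "2020-04-23T11:15:00"]

if __name__ == "__main__":
    print(top_words(TEXTS, 1))               # [('data', 4)]
    print([sentiment(t) for t in TEXTS])     # ['positive', 'negative', 'neutral']
    print(dict(posts_per_hour(TIMESTAMPS)))  # {'2020-04-23 10:00': 2, '2020-04-23 11:00': 1}
```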


You can schedule your time to build your knowledge base

It is all about self-teaching. If you are not working right now and have plenty of time to switch careers, you can try studying 8 hours a day. If you get off work around 6 pm, consider studying from 8 pm to 11 pm on weekdays and more on weekends. There are numerous free and paid online resources to make full use of:


      • It may sound basic, but for a quick start, simply watching relevant videos on YouTube can give you a lot of value.
      • Visit the official site of the Journalism and Media Studies Centre (JMSC) at the University of Hong Kong to follow the latest trends in data journalism.
      • Take online courses on Coursera, which may account for up to 80% of your learning. You’ll find data journalism-related courses from top universities such as MIT, UMich, and UIUC, covering statistics, data science, data visualization, programming, and more.
      • Check out edX as another source of online courses in programming, data analysis, and statistics. Courses are offered in a variety of languages, including English, Spanish, Chinese, Russian, French, and German.
      • If you are a fan of Google like me, you will find the Google News Initiative pretty useful. The site offers free online training materials for journalists, focusing on using Google tools in data journalism.




These are just a few useful online resources for exploring this new world. As data journalism evolves, you can always google and try out the most up-to-date and reliable sources. Remember to check reviews before starting a course to make sure it is worth your time.


Another thing to keep in mind is that practice makes perfect. Intensive reading and writing should be part of your daily routine. You will never become a good data journalist if you can’t become a good journalist first. Just thinking “I want to report this story this way” is far from enough. You need to roll up your sleeves, open a blank Word document or grab a pen and paper, and let your words flow. My personal favorite writing handbooks (and many other people’s) are On Writing Well by William Zinsser and The Elements of Style by William Strunk Jr. and E. B. White. If you are a new writer, these are a good place to start.


Why should journalists learn to work with data?

The words of Rebecca Borisona, a reporter at The Street, best capture what drives data journalists:


“I firmly believe that data makes a story better and stronger in every case. It is one thing to collect a bunch of anecdotes about a trend you think is happening, but data is a gift to any contentious story. While a source may lie to you, or spin the facts, a careful look at the data will never lead you astray. There’s power in that.”


We live in a time when both rumors and facts spread within minutes. Used and analyzed properly, data can help dispel rumors and reveal facts.


That is exactly what data journalist Ashley did during the 2019 coronavirus outbreak. She scraped and visualized live COVID-19 statistics, including death counts, from the Chinese government’s database and then calculated the mortality rate of the coronavirus. Her results turned out to be consistent with the official statistics of WHO, CDC, ECDC, NHC, and DXY.

COVID-19 progression

Published by Ashley on dataextraction.io
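The post doesn’t say exactly how the mortality rate was computed. One common, crude measure is the case fatality rate (CFR): reported deaths divided by confirmed cases. The sketch below assumes that definition and uses made-up cumulative totals, not real COVID-19 data.

```python
from __future__ import annotations


def case_fatality_rate(deaths: int, confirmed: int) -> float:
    """Crude case fatality rate: deaths / confirmed cases, as a percentage."""
    if confirmed <= 0:
        raise ValueError("confirmed cases must be positive")
    return 100 * deaths / confirmed


# Hypothetical cumulative totals (label, confirmed, deaths); illustrative numbers only.
DAILY_TOTALS = [
    ("day 1", 1000, 20),
    ("day 2", 1500, 33),
    ("day 3", 2000, 48),
]

if __name__ == "__main__":
    for label, confirmed, deaths in DAILY_TOTALS:
        print(f"{label}: {case_fatality_rate(deaths, confirmed):.1f}%")
    # day 1: 2.0%
    # day 2: 2.2%
    # day 3: 2.4%
```

During a live outbreak this crude ratio can be biased in either direction, because deaths lag behind diagnoses and many cases are never confirmed, so it is worth comparing against official figures, as Ashley did.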

Some final advice

As you build your knowledge base and your skills, reach out to online data journalism communities to find people who share your interests. Besides LinkedIn, other great platforms for job hunting and networking with professionals include Indeed, Mediabistro, and Glassdoor. Becoming a successful data journalist is not a piece of cake, but it isn’t rocket science either. Where there’s a will, there’s a way. As long as you stay devoted and are passionate about solving the problems you encounter, you’ll reach your goal in the end.



 Author: Milly 
