How to Turn Raw Data into Information
Tuesday, October 01, 2019
No firm can survive without data. The trouble is that we are inundated with it. Turning data into valuable information is at the core of a business's continued growth. This is where Business Intelligence (BI) comes in: it converts raw data into actionable insights. How does BI work? I will illustrate with a concrete example that runs from data extraction to data interpretation.
But before we dive in, we need to clarify the difference between raw data and information, so we have a clear picture of why we are doing this.
Raw data is also called primary data. It is collected from a source and needs to be processed before it yields any insight. In business, data can be anywhere. It could be external: data that flows around the web, like images, Instagram posts, Facebook followers, comments, and competitors’ followers. Or it could be internal: data from a business operation system such as a Content Management System (CMS).
Information is data that has been processed and organized, then presented as patterns and trends to support better decision making. To quote Martin Doyle, “Computers need data. Humans need information. Data is a building block. Information gives meaning and context.”
Now that you have a general idea of the start and the end, let’s break down the middle part in detail.
Step One: Data Extraction
This is the primary step that makes everything after it possible. It is a zero-to-one process in which we start laying the building blocks. Data extraction means retrieving data from multiple sources on the internet. At this stage, the process takes place in the following steps:
1. Identify the source: before we start extraction, it is necessary to check whether the source is legal to extract from and to assess the quality of its data. You can check the Terms of Service (ToS) for detailed information. For example, we all know that LinkedIn holds tons of value for sales prospecting. LinkedIn thinks so too, so it disallows any form of scraping of its website.
2. Start extraction: once you have the source, you can start extracting. There are many ways to extract data. The primary method nowadays is web scraping. An automated web scraping tool comes in handy because it eliminates the need to write scripts on your own or hire developers. It is the most sustainable and approachable solution for businesses with a limited budget yet a great demand for data.
If I had to name a few of the best web scraping tools, Octoparse stands out as the most intuitive and user-friendly among them. It offers over 30 web scraping templates. These are built-in crawlers, ready to use without any task configuration. If you don’t know how to code, I recommend trying it out first before jumping to the next option too soon.
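If you do write your own scripts, part of the source check from step 1 can be automated. Below is a minimal Python sketch that reads robots.txt rules with the standard library's urllib.robotparser. The robots.txt content, the crawler name "my-crawler", and the example.com URLs are all made up for illustration. Keep in mind that robots.txt is not the Terms of Service, so passing this check does not replace reading the ToS.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# Against a real site you would use parser.set_url(...) and parser.read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/posts/1",
        "https://example.com/private/data",
    ):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler", url) else "disallowed"
        print(url, "is", verdict)
```

With these sample rules, the first URL is allowed and the second one, under /private/, is not.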
Let’s use a quick example. Let’s say I launch a video campaign with a Twitter influencer and need to monitor its marketing effect. I checked Twitter, and it is legal to scrape the public information.
1. Open Octoparse and navigate to “Template Mode”.
2. Find the “Social Media” tab and click the “Twitter” icon.
3. Once you decide to use the template, click “Use Template” to proceed and type the keyword “Cheetos” into the parameter box.
4. Next, click the “Save and Run” button to execute your crawler, then choose “Local Extraction”. “Cloud Extraction” runs the crawler in the cloud without taxing your local resources. It is normally used for large-scale data requirements, since it does not put a strain on your computer.
Here is a sample output. With the Octoparse Twitter scraping template, I scraped 5k lines of data, including user IDs, handles, contents, video URLs, and image URLs. Once you get the data, you can export it into the desired format or connect it to an analysis platform like Power BI or Tableau via API. I am not an expert in Power BI, so I borrowed a report from Microsoft to make my point. The idea is that you can monitor social media by scheduling a Twitter extraction (or Instagram and Facebook) every day and connecting it to your preferred analysis platform via API.
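To make the export step more concrete, here is a small Python sketch that loads an exported CSV into plain dictionaries, ready to hand over to an analysis tool. I am assuming a CSV export, and the column names and the two sample rows are invented for illustration. They are not the actual headers or data produced by the template.

```python
import csv
import io

# Hypothetical export. Column names and rows are assumptions for illustration,
# not the real output of any scraping template.
SAMPLE_EXPORT = """\
user_id,handle,content,video_url,image_url
101,@snackfan,Loving the new Cheetos video!,https://example.com/v/1,
102,@crunchtime,Cheetos for lunch again,,https://example.com/i/2
"""


def load_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


if __name__ == "__main__":
    rows = load_rows(SAMPLE_EXPORT)
    print(len(rows), "rows loaded")
    for row in rows:
        print(row["handle"], "->", row["content"])
```

For a real export you would pass an open file to csv.DictReader instead of the in-memory string used here.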
Step Two: Analyze
In the analysis stage, it is necessary to check accuracy, as the quality of the data directly affects the analysis result. During this stage, data is delivered to users in various report formats, such as visualizations and dashboards. Let’s use Power BI to monitor and analyze social media platforms so as to test my marketing strategy, gauge product quality, and manage crises.
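What does an accuracy check look like in practice? One simple version, sketched below in Python, is to drop rows with empty content and remove exact duplicates before the data reaches a dashboard. The field names "handle" and "content" and the sample rows are my own assumptions, and a real check would likely cover more cases than these two.

```python
# Hypothetical scraped rows; the field names are assumptions for illustration.
SAMPLE_ROWS = [
    {"handle": "@snackfan", "content": "Loving the new Cheetos video!"},
    {"handle": "@snackfan", "content": "Loving the new Cheetos video!"},  # exact duplicate
    {"handle": "@crunchtime", "content": "   "},  # blank content
    {"handle": "@crunchtime", "content": "Cheetos for lunch again"},
]


def clean_rows(rows):
    """Drop rows with blank content and repeated (handle, content) pairs."""
    seen = set()
    cleaned = []
    for row in rows:
        content = (row.get("content") or "").strip()
        if not content:
            continue  # nothing to analyze in an empty post
        key = (row.get("handle"), content)
        if key in seen:
            continue  # already kept an identical post from this handle
        seen.add(key)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    cleaned = clean_rows(SAMPLE_ROWS)
    print(f"kept {len(cleaned)} of {len(SAMPLE_ROWS)} rows")
```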
This is a sample of how to use Power BI to turn the data into information.
Presented by @myersmiguel
Or you can analyze which post gains the most likes.
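If you prefer code to a dashboard, finding the most-liked post takes only a few lines of Python. This is just a sketch: the "likes" field and the sample posts are hypothetical, so your scraped data would need to include a like count for it to apply.

```python
# Hypothetical posts; the "likes" field is an assumption for illustration.
SAMPLE_POSTS = [
    {"handle": "@snackfan", "content": "Loving the new Cheetos video!", "likes": 12},
    {"handle": "@crunchtime", "content": "Cheetos for lunch again", "likes": 87},
    {"handle": "@orangefingers", "content": "Who else saw the campaign?", "likes": 45},
]


def most_liked(posts):
    """Return the post with the highest like count (the first one on a tie)."""
    return max(posts, key=lambda post: post["likes"])


if __name__ == "__main__":
    top = most_liked(SAMPLE_POSTS)
    print(top["handle"], "has the most likes:", top["likes"])
```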
Okay, now that you have the visual dashboard, how do you analyze it?
In business, information needs to be interpreted in the context of the organization. Context is the major component in producing actionable insights. Let’s take Cheetos as an example. The trend line shows that Cheetos gained 50k mentions on Twitter in April. It is a huge number, but it doesn’t tell us anything valuable beyond the volume of mentions at that moment.
What if I pull the number of mentions from the previous months and compare them with this month's?
April: 50k mentions
March: 40k mentions
February: 1673 mentions
Now that we have the context, this is how I interpret the information:
Cheetos gained 10k mentions from March to April.
Cheetos gained 38k mentions from February to March.
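The comparison above is simple arithmetic, and it is easy to script once you track more than three months. Here is a minimal Python sketch using the figures from this example. I am treating "50k" and "40k" as exactly 50,000 and 40,000, which is an assumption since the originals are rounded.

```python
# Monthly mention counts from the example, in chronological order.
# 40000 and 50000 are assumed exact values for the rounded "40k" and "50k".
MENTIONS = {"February": 1673, "March": 40000, "April": 50000}


def month_over_month(series):
    """Return (month, absolute change, percent change) for each consecutive pair."""
    months = list(series)
    changes = []
    for prev, curr in zip(months, months[1:]):
        delta = series[curr] - series[prev]
        percent = delta / series[prev] * 100
        changes.append((curr, delta, round(percent, 1)))
    return changes


if __name__ == "__main__":
    for month, delta, percent in month_over_month(MENTIONS):
        print(f"{month}: {delta:+,} mentions ({percent:+,.1f}%)")
```

The absolute gains come out to 38,327 from February to March and 10,000 from March to April, matching the rounded 38k and 10k above. The percentage column adds one more piece of context: the February to March jump is far larger in relative terms than the 25% growth from March to April.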
We can tell Cheetos became popular almost overnight, and that the word-of-mouth marketing is successful. To conclude, we should continue this strategy and let everybody get immersed in Cheetos.
The idea is that data by itself is meaningless. Most businesses share similar types of data but vary in scale. However, the information they acquire is different. The process of transformation incorporates the context of that organization.
Last but not least: Storage.
Data storage is a key component, as businesses rely on it to preserve their data. Storage options vary in capacity and speed. With the profusion of big data, storage vendors have sprung up and penetrated the market. This is a list of reliable vendors for your preference. A web scraping tool like Octoparse provides a suite of functions: data extraction from dynamic websites, data cleaning with built-in regex, data export into structured formats, and cloud storage. Best of all, it can connect to your local database with little effort. Don’t drown in the data; take a ride with an intelligent tool.
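For a sense of what storing scraped data in a local database can look like, here is a small Python sketch using the built-in sqlite3 module. It is my own illustration, not how Octoparse connects to a database. The table name, the columns, and the sample posts are all hypothetical.

```python
import sqlite3

# Hypothetical scraped posts; field names are assumptions for illustration.
SAMPLE_POSTS = [
    {"user_id": "101", "handle": "@snackfan", "content": "Loving the new Cheetos video!"},
    {"user_id": "102", "handle": "@crunchtime", "content": "Cheetos for lunch again"},
]


def save_posts(conn, posts):
    """Create the posts table if needed and insert posts, skipping duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "user_id TEXT, handle TEXT, content TEXT, "
        "PRIMARY KEY (user_id, content))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO posts (user_id, handle, content) "
        "VALUES (:user_id, :handle, :content)",
        posts,
    )
    conn.commit()


def count_posts(conn):
    """Return the number of stored posts."""
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path to persist the data
    save_posts(conn, SAMPLE_POSTS)
    save_posts(conn, SAMPLE_POSTS)  # a repeated daily run does not duplicate rows
    print(count_posts(conn), "posts stored")
    conn.close()
```

The primary key on (user_id, content) is a design choice for this sketch: it lets a scheduled extraction run every day without piling up copies of the same post.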
Ashley is a data enthusiast and passionate blogger with hands-on experience in web scraping. She focuses on capturing web data and analyzing it in a way that empowers companies and businesses with actionable insights. Read her blog here to discover practical tips and applications on web data extraction.