Understanding Big Data, Data Mining, and Machine Learning in 5 Minutes

5 min read

What can data mining and big data do?

In short, they give us the ability to forecast. Machine learning makes that forecasting easier and more accurate.

1. Our lives have been digitized

Today, almost everything we do every day can be recorded. Every credit card transaction is digitized and traceable; our presence in public is constantly monitored by the CCTV cameras on nearly every corner of the city; for businesses, most financial and operating data is stored in some kind of ERP system; and with the rise of wearable devices, every heartbeat and breath is turned into usable data. With so much of our lives digitized, a computer can now “understand” our world better than ever before.

2. If the pattern remains unchanged, then the past = the future

Many things in our lives follow patterns. For example, a person is likely to travel between home and work on working days and to go on vacation or watch a movie on days off, and this pattern is unlikely to change. Stores have peak hours and slow periods every day, and this pattern is unlikely to change. Businesses need more labor in certain months of the year, and this pattern is unlikely to change.

Putting points 1 and 2 together, we can conclude that a computer can plausibly predict the future when it is given the patterns of the past, because these patterns tend to stay consistent over long periods of time.
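To make the “past = future” idea concrete, here is a minimal sketch of a naive forecast. It assumes the hourly sales pattern of a store really does repeat from day to day. The sales figures and the function name `forecast_by_hour` are made up for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history: (hour of day, units sold), observed on three past days.
history = [
    (9, 12), (12, 40), (18, 55),
    (9, 10), (12, 44), (18, 61),
    (9, 14), (12, 42), (18, 58),
]

def forecast_by_hour(observations):
    """Predict each hour's sales as the average of that hour in the past."""
    by_hour = defaultdict(list)
    for hour, units in observations:
        by_hour[hour].append(units)
    return {hour: mean(units) for hour, units in by_hour.items()}

forecast = forecast_by_hour(history)
peak_hour = max(forecast, key=forecast.get)  # hour with the highest expected sales
print(forecast)
print("Expected peak hour:", peak_hour)
```

If the pattern changes (a holiday, a new competitor), this kind of forecast breaks down, which is exactly the condition stated in point 2.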

If a computer can predict a person’s habits, it will know the best time to offer a promotion: a car wash deal if this person tends to get a car wash every Friday, or a hotel coupon if this person tends to go on vacation every March. For a store, a computer can forecast sales throughout the day and then shape the business strategy to maximize total revenue. For enterprises, a computer can also help design an operational plan with the most reasonable workforce arrangement.
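The car wash example can be sketched in a few lines. This is only an illustration: the visit dates are invented, and `usual_weekday` is a hypothetical helper that simply finds the weekday a customer visits most often.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical car wash visits for one customer.
visits = [
    date(2024, 3, 1),   # Friday
    date(2024, 3, 8),   # Friday
    date(2024, 3, 13),  # Wednesday
    date(2024, 3, 15),  # Friday
    date(2024, 3, 22),  # Friday
]

def usual_weekday(dates):
    """Return the most common weekday and the share of visits on that day."""
    counts = Counter(WEEKDAYS[d.weekday()] for d in dates)
    day, n = counts.most_common(1)[0]
    return day, n / len(dates)

day, share = usual_weekday(visits)
print(f"Send the coupon before {day} ({share:.0%} of past visits)")
```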

Once the future becomes predictable, we can plan ahead and prepare the best possible move. It is like Neo in “The Matrix”, who can dodge all the bullets because he can clearly see where they are coming from. According to Sherlock Holmes, “an advanced grasp of the mathematics of probability, mapped onto a thorough apprehension of human psychology, and the known dispositions of any given individual can reduce the number of variables considerably”; in other words, “big data gives us the power to predict the future”. This is the power of data mining. Data mining is consistently tied to Big Data simply because Big Data makes it possible to work with massive datasets, which provide the basis for all predictions.

So, what exactly are Big Data, Machine Learning, and Data Mining?

Big Data

When the amount of data is tremendous, it usually cannot be handled on a single machine. Take an extremely large file, say 10GB: chances are you won’t be able to open it in an ordinary program on a Windows system before it freezes or crashes. Big data technology was developed for exactly this purpose. You can think of it as special software that splits a big file into much smaller pieces, which can then be processed on many machines. The programming model for dividing the work and combining the results is known as MapReduce. The software framework most commonly used for this process is Hadoop. Hadoop solves the basic problem, and a number of tools such as Pig, ZooKeeper and Hive can be used along with it to make the process even easier. Hadoop, together with its many associated tools, is generally referred to as “Big Data technology”.
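The split, process, and combine idea behind MapReduce can be imitated on one machine. The sketch below is a simplified illustration in plain Python, not Hadoop’s actual API: the sample lines and the function names `split_into_chunks`, `map_chunk` and `reduce_counts` are assumptions made for this example, and a real cluster would also shuffle intermediate results between machines.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(lines, chunk_size):
    """Split the 'big file' into smaller pieces."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

def map_chunk(chunk):
    """Map step: count words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right

# Hypothetical input standing in for a very large file.
lines = [
    "big data needs many machines",
    "data mining finds patterns",
    "machine learning learns from data",
    "big patterns need big data",
]

chunks = split_into_chunks(lines, chunk_size=2)
# On a real cluster, each chunk would be processed on a different machine.
partials = [map_chunk(chunk) for chunk in chunks]
total = reduce(reduce_counts, partials, Counter())
print(total.most_common(3))
```

Because each chunk is counted without looking at the others, the map step can run in parallel; only the small partial results need to be combined at the end.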

Machine Learning

We just touched on how a piece of data can be processed. Suppose this data contains a group of shoppers’ purchasing behavior, and we compute the total number of items purchased and the number of items purchased by each shopper. So far, this is simple statistical analysis. However, if our goal is to analyze the correlation between different types of shoppers, to infer the preferences of a specific type of shopper, or even to predict a shopper’s gender or age, we’ll need a much more sophisticated model, which we call an algorithm. Machine learning can be most easily understood as the collection of algorithms developed for data mining purposes, such as logistic regression, decision trees, collaborative filtering, and many more.
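As a rough illustration of what such an algorithm does, here is a minimal logistic regression trained with gradient descent, written in plain Python. Everything about the data is an assumption for the sake of the example: the single feature (items bought per month) and the two made-up age groups are not from any real dataset, and in practice you would use an established library rather than hand-written code.

```python
import math

# Hypothetical training data: (items bought per month, label).
# label 1 = age group "30 and over", label 0 = "under 30" (invented labels).
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]

# Center the feature so gradient descent behaves well.
mean_x = sum(x for x, _ in data) / len(data)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, lr=0.1, epochs=1000):
    """Fit weight w and bias b by minimizing the average log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = sigmoid(w * (x - mean_x) + b) - y
            grad_w += error * (x - mean_x)
            grad_b += error
        w -= lr * grad_w / len(samples)
        b -= lr * grad_b / len(samples)
    return w, b

w, b = train(data)

def predict(x):
    """Estimated probability that a shopper belongs to group 1."""
    return sigmoid(w * (x - mean_x) + b)

print(f"P(group 1 | 2 items) = {predict(2):.2f}")
print(f"P(group 1 | 8 items) = {predict(8):.2f}")
```

The counting of items per shopper is the simple statistics mentioned above; the learned `w` and `b` are what turn those counts into a prediction about something that was never recorded directly.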

Data Mining

Through the application of machine learning algorithms, existing data can be used to predict the unknown, and this is exactly why the wonders of data mining are closely connected to machine learning. Nevertheless, the strength of any machine learning algorithm depends heavily on the supply of massive datasets. Keep in mind that no matter how sophisticated an algorithm is, no insightful prediction can be made from a few lines of data. Big data technology is the foundation for machine learning, and with machine learning we are able to gain valuable insights from existing datasets. This is data mining.
