10 Machine Learning Algorithms You Should Know

The term “big data” was everywhere in 2019, and it will keep its place in the years to come. In a previous post, I introduced some concepts about big data, machine learning, and data mining (see post: Understanding Big Data, Data Mining, and Machine Learning in 5 Minutes). Now let’s dig deeper into machine learning with a brief walk-through of some of the most commonly used ML algorithms, skipping the abstract theory in favor of pictures and examples of how they are used.

The list of algorithms covered in this article includes:

  • Decision tree
  • Random forest
  • Logistic regression
  • Support vector machine
  • Naive Bayes
  • k-nearest neighbors
  • k-means
  • AdaBoost
  • Neural network
  • Markov chain

1. Decision Tree

A decision tree classifies a set of data into groups using certain attributes (features). Each node runs a test on one attribute, and the outcome of that test decides which branch the data follows, splitting it into two distinct groups. The process repeats at the next node, and so on. The tests are learned from existing data, so when new data comes in, the tree can route it to the right leaf, that is, the corresponding group.

decision tree
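The idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes numeric features and uses Gini impurity to choose each test, which is one common choice the article does not specify. The data and the names `gini`, `best_split`, `build_tree`, and `classify_row` are made up for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Find the test 'feature f <= threshold t' with the lowest weighted impurity."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a test that sends everything one way is useless
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels):
    """Split recursively until a node is pure or no useful test remains."""
    split = None if gini(labels) == 0.0 else best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))

def classify_row(tree, row):
    """Walk from the root to a leaf, taking one branch per test."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical training data: two numeric features per row.
rows = [(1, 1), (2, 5), (6, 1), (7, 6)]
labels = ["small", "small", "large", "large"]
tree = build_tree(rows, labels)
new_label = classify_row(tree, (3, 3))  # new data is routed to a leaf
```

On this toy data a single test on the first feature is enough to separate the two groups, so the tree has one node and two leaves.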

2. Random Forest

Randomly select rows from the original data to form different subsets.

random forest 1

Matrix S is the original data. It contains rows 1 to N; columns A, B, and C hold the features, and the final column, C, holds the category of each row.

random forest 2

Create random subsets from S. Let’s say we end up with M subsets.

random forest 3

Train one decision tree on each subset, which gives us M decision trees.

Feed new data into each of these trees to get M results, then count which result appears most often across all M trees. That majority result is the final prediction.

class prediction
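The steps above (random subsets, one tree per subset, majority vote) can be sketched as follows. Two assumptions are worth flagging: the subsets are drawn with replacement (bootstrap sampling), which the article does not state, and each "tree" is a hypothetical one-test stump on a single feature, used only to keep the example short. The data is invented.

```python
import random
from collections import Counter

def bootstrap_subsets(rows, m, rng):
    """Draw m random subsets of the rows, sampling with replacement (an assumption)."""
    return [[rng.choice(rows) for _ in rows] for _ in range(m)]

def train_stump(subset):
    """Hypothetical stand-in for a decision tree: a single threshold on feature 0."""
    best = None
    for t in sorted({x[0] for x, _ in subset}):
        left = [y for x, y in subset if x[0] <= t]
        right = [y for x, y in subset if x[0] > t]
        l_cls = Counter(left).most_common(1)[0][0]
        r_cls = Counter(right).most_common(1)[0][0] if right else l_cls
        errors = sum(y != l_cls for y in left) + sum(y != r_cls for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_cls, r_cls)
    _, t, l_cls, r_cls = best
    return lambda x: l_cls if x[0] <= t else r_cls

def majority_vote(votes):
    """Return the result that appears most often among the M results."""
    return Counter(votes).most_common(1)[0][0]

def forest_predict(trees, x):
    """Ask every tree, then take the majority result."""
    return majority_vote([tree(x) for tree in trees])

# Made-up data S: one feature per row plus a category.
S = [((1.0,), "cat"), ((2.0,), "cat"), ((3.0,), "cat"),
     ((7.0,), "dog"), ((8.0,), "dog"), ((9.0,), "dog")]
rng = random.Random(0)                          # fixed seed for repeatability
subsets = bootstrap_subsets(S, m=5, rng=rng)    # M = 5 subsets
trees = [train_stump(s) for s in subsets]       # M trees
prediction = forest_predict(trees, (2.5,))      # majority of the M results
```

Using an odd M avoids tied votes in a two-class problem.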

3. Logistic Regression

When the prediction target is a probability, its value must be greater than or equal to 0 and less than or equal to 1. A simple linear model cannot guarantee this: unless the inputs are restricted to a certain domain, its output will fall outside that interval.

linear model plot

A model with the following shape is a better fit.

logistic regression

So how can we get this model?

This model needs to fulfill two conditions: “greater than or equal to 0” and “less than or equal to 1”.

logistic regression 2

Transforming the formula gives the logistic regression model:

logistic regression 3

Fitting the model to the original data gives the corresponding coefficients.
With those coefficients we get the logistic model plot.

logistic regression 4
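As a rough sketch of how the coefficients might be obtained, the snippet below fits a one-feature logistic model p = 1 / (1 + e^-(w0 + w1*x)) by plain gradient descent on the log loss. The fitting method, the learning rate, the number of epochs, and the data are all assumptions made for illustration; the article does not say how its coefficients were calculated.

```python
import math

def sigmoid(z):
    """Map any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Estimate w0 and w1 by gradient descent on the average log loss."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w0 + w1 * x) - y   # predicted probability minus label
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return w0, w1

# Made-up data: one feature and a 0/1 outcome, with some overlap between classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
w0, w1 = fit_logistic(xs, ys)
p_low = sigmoid(w0 + w1 * 1)    # probability of class 1 at x = 1
p_high = sigmoid(w0 + w1 * 6)   # probability of class 1 at x = 6
```

Whatever the coefficients turn out to be, the sigmoid keeps every prediction between 0 and 1, which is exactly the property the linear model lacks.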

4. Support Vector Machine

To separate two classes with a hyperplane, the best choice is the hyperplane that leaves the maximum margin to both classes. Because Z2 > Z1, the green one is better.

support vector machine 1

Express the hyperplane with a linear equation: for the class above the line the expression is greater than or equal to 1, and for the other class it is less than or equal to -1.

support vector machine 2

Calculate the distance from a point to the hyperplane using the equation in the graph:

support vector machine 3

This gives the expression for the total margin below. The aim is to maximize the margin, which means minimizing the denominator.

support vector machine 4

For example, we use 3 points to find the optimal hyperplane. Define the weight vector along the direction (2, 3) - (1, 1) = (1, 2).

support vector machine 5

Scaling that direction by an unknown factor a gives the weight vector (a, 2a). Substitute the two points into the equation:

support vector machine 6

Once a is solved, (a, 2a) gives the weight vector, and the equation with a and w0 substituted in is the support vector machine.
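The worked example can be checked numerically. The sketch below assumes that (1, 1) lies on the margin of the negative class (where the expression equals -1) and (2, 3) on the margin of the positive class (where it equals +1). The text does not say which point belongs to which class, and it does not give the third point's coordinates, so only these two points are used.

```python
import math

def dot(u, v):
    """Dot product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]

neg, pos = (1.0, 1.0), (2.0, 3.0)        # assumed support vectors of the two classes
d = (pos[0] - neg[0], pos[1] - neg[1])   # direction (1, 2), so w = (a, 2a)

# w.neg + w0 = -1 and w.pos + w0 = +1.
# Subtracting the first equation from the second: a * (d.pos - d.neg) = 2.
a = 2.0 / (dot(d, pos) - dot(d, neg))    # 2 / (8 - 3)
w = (a * d[0], a * d[1])                 # weight vector (a, 2a)
w0 = -1.0 - dot(w, neg)                  # bias term from the first equation

def decision(x):
    """Value of w.x + w0; the sign tells which side of the hyperplane x is on."""
    return dot(w, x) + w0

margin = 2.0 / math.hypot(w[0], w[1])    # total margin = 2 / ||w||
```

Under these assumptions a works out to 0.4 and w0 to -2.2, so the separating line is 0.4*x1 + 0.8*x2 - 2.2 = 0, and the total margin equals the distance between the two points.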

5. Naive Bayes

Here’s an example from NLP:

Given a piece of text, determine whether its attitude is positive or negative.

naive bayes 1

To solve the problem, we can look at only some of the words:

naive bayes 2

The text is then represented only by these words and their counts.

naive bayes 3

The original question is: given a sentence, which category does it belong to? With Bayes’ rule, this becomes an easy question.

naive bayes 4

The question becomes: within a given class, what is the probability of this sentence occurring? Don’t forget the other two probabilities in the equation (the prior probability of the class and the overall probability of the sentence).

Example: the probability of occurrence of the word “love” is 0.1 in the positive class, and 0.001 in the negative class.

naive bayes 5
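A minimal sketch of this calculation is shown below. Only the two probabilities for “love” come from the example above; every other word probability, the equal class priors, and the function names are invented for illustration. The probability of the sentence itself is the same for both classes, so it can be left out when the two classes are only being compared.

```python
import math

# P(word | class). Only the "love" values come from the article; the rest are made up.
word_probs = {
    "positive": {"love": 0.1, "great": 0.05, "boring": 0.002},
    "negative": {"love": 0.001, "great": 0.004, "boring": 0.06},
}
priors = {"positive": 0.5, "negative": 0.5}   # assumed class priors

def log_score(words, cls):
    """log P(class) + sum of log P(word | class); words not in the table are skipped."""
    score = math.log(priors[cls])
    for w in words:
        if w in word_probs[cls]:
            score += math.log(word_probs[cls][w])
    return score

def classify_sentiment(sentence):
    """Pick the class with the higher score (the 'naive' independence assumption)."""
    words = sentence.lower().split()
    return max(priors, key=lambda cls: log_score(words, cls))

label = classify_sentiment("I love this great movie")
```

Logarithms are used so that multiplying many small probabilities does not underflow; adding logs gives the same ranking as multiplying the probabilities.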

6. k-Nearest Neighbors

When a new data point arrives, it is assigned to the category that has the most points among its nearest neighbors.

For example, to distinguish “dog” from “cat”, we judge by two features, “claws” and “sound”. Circles and triangles are the known categories. Which one does the “star” belong to?

k-nearestneighbor 1

When K=3, the three lines connect the star to its 3 nearest points. Circles are in the majority, so the “star” belongs to “cat”.

k-nearestneighbor 2
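The vote among the K nearest points can be sketched as follows. The coordinates are invented stand-ins for the “claws” and “sound” features, and Euclidean distance is an assumption, since the article does not name a distance measure.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """train is a list of ((features...), label); vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Made-up (claws, sound) measurements for the known categories.
train = [((8, 7), "cat"), ((7, 8), "cat"), ((9, 6), "cat"),
         ((3, 2), "dog"), ((2, 3), "dog"), ((4, 4), "dog")]
star = (6, 6)
category = knn_predict(train, star, k=3)
```

Here the three nearest points to the star are two cats and one dog, so the vote goes to “cat”. Note that `math.dist` requires Python 3.8 or later.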

7. k-means

Separate the data into 3 classes. The pink class is the biggest, while the yellow one is the smallest.

Pick 3, 2, and 1 as the initial centers, calculate the distance between each remaining data point and these centers, and assign each point to the class whose center is closest.

k-means 1

After classification, calculate the mean of each class and set it as the new center.

k-means 2

After some rounds, we can stop when the classes no longer change.

k-means 3
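The loop described above (assign to the nearest center, recompute the means, stop when nothing changes) can be sketched for one-dimensional data. The data values are made up, and treating 3, 2, and 1 as the starting center values is one reading of the example.

```python
def kmeans_1d(data, centers, max_rounds=100):
    """Cluster 1-D values around the given starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_rounds):
        # Assignment step: each value joins the class of its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Update step: the mean of each class becomes its new center
        # (an empty class keeps its old center).
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:   # stop when the centers no longer change
            break
        centers = new_centers
    return centers, clusters

# Made-up data with three visible groups of sizes 2, 3, and 5.
data = [1.0, 1.2, 2.9, 3.1, 3.3, 7.8, 8.0, 8.2, 8.4, 8.6]
centers, clusters = kmeans_1d(data, centers=[3, 2, 1])
```

With this data the centers settle near 8.2, 3.1, and 1.1 after a few rounds. The result of k-means depends on the starting centers, so a different starting choice can give a different grouping.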

8. AdaBoost

AdaBoost is one boosting method.

Boosting gathers classifiers that do not give satisfactory results on their own and combines them into a classifier that may perform better.

As shown below, tree 1 and tree 2 do not perform well individually, but if we feed them the same data and sum up their results, the final result is more convincing.

adaboost 1

As an example of AdaBoost in handwriting recognition, the writing panel can extract many features, such as the starting direction of a stroke and the distance between the starting point and the ending point.

Training assigns a weight to each feature. The digits 2 and 3, for instance, start in very similar ways, so the starting direction contributes little to classification and gets a small weight.

adaboost 2

This alpha angle, by contrast, is highly distinctive, so the weight of this feature is large. The final outcome takes all of these features into account.

adaboost 3
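The weighting idea can be sketched with the standard AdaBoost formulas: a weak classifier with weighted error e gets the vote weight alpha = 0.5 * ln((1 - e) / e), and the final answer is the sign of the alpha-weighted sum of votes. The three threshold classifiers and their error rates below are hypothetical, chosen only to show a more accurate classifier outweighing two weaker ones.

```python
import math

def classifier_weight(error):
    """Vote weight (alpha) of a weak classifier with the given weighted error rate."""
    return 0.5 * math.log((1.0 - error) / error)

def reweight(sample_weights, correct, alpha):
    """Raise the weight of misclassified samples, lower the rest, then normalize."""
    raw = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(sample_weights, correct)]
    total = sum(raw)
    return [w / total for w in raw]

def boosted_predict(weak_classifiers, alphas, x):
    """Sign of the alpha-weighted sum of the weak classifiers' +1/-1 votes."""
    total = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return 1 if total >= 0 else -1

# Three hypothetical weak classifiers on a single feature, with assumed error rates.
weak = [lambda x: 1 if x > 2 else -1,
        lambda x: 1 if x > 6 else -1,
        lambda x: 1 if x > 3 else -1]
alphas = [classifier_weight(e) for e in (0.3, 0.2, 0.45)]
vote = boosted_predict(weak, alphas, 4)

# One round of sample reweighting: 4 samples, the last one misclassified.
new_weights = reweight([0.25] * 4, [True, True, True, False], classifier_weight(0.25))
```

At x = 4 two classifiers vote +1 and one votes -1, but the dissenting classifier has the lowest error and therefore the largest alpha, so the combined vote is -1. The reweighting step is what makes the next weak classifier concentrate on the samples the previous one got wrong.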

9. Neural Network

In a neural network (NN), an input may end up in one of at least two classes. A neural network is made of neurons and the connections between them. The first layer is the input layer, and the last layer is the output layer. The hidden layers and the output layer each have their own classifiers.

neural network 1

When an input enters the network and is activated, the calculated score is passed on to the next layer. The scores shown in the output layer are the scores for each class. The example below gets the result of class 1:

neural network 2

The same input passed to different nodes generates different scores because each node has its own weights and biases. This process is forward propagation.
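A forward pass through a tiny network can be sketched as below. Everything numeric here is invented: the layer sizes, the weights and biases, the ReLU activation in the hidden layer, and the softmax that turns the output scores into per-class scores summing to 1. The article does not specify any of these.

```python
import math

def dense(inputs, weights, biases):
    """Each node computes a weighted sum of the same inputs plus its own bias."""
    return [sum(w * x for w, x in zip(node_w, inputs)) + b
            for node_w, b in zip(weights, biases)]

def relu(values):
    """Activation: negative scores are set to zero."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw output scores into per-class scores that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Pass the input through the hidden layer, then the output layer."""
    hidden = relu(dense(x, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))

# Made-up network: 2 inputs, 3 hidden nodes, 2 output classes (indexed 0 and 1).
hidden_w = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
hidden_b = [0.1, -0.3, 0.2]
out_w = [[1.0, -0.5, 0.3], [-0.4, 0.9, 0.2]]
out_b = [0.0, 0.1]

scores = forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b)
predicted_class = scores.index(max(scores))
```

The three hidden nodes all receive the same input but produce different scores because their weights and biases differ, which is the point made above.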

10. Markov Chain

A Markov chain consists of states and transitions. For example, build a Markov chain from “the quick brown fox jumps over the lazy dog”. First, we treat every word as a state, and then we calculate the probabilities of the state transitions.

markov 1

These probabilities are calculated from a single sentence. When you train the computer on a large amount of text, you get a bigger state transition matrix, with, for example, the words that can follow “the” and their corresponding probabilities.

markov 2
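The transition probabilities for the example sentence can be computed by counting adjacent word pairs. This is a small sketch; the function name `transition_probabilities` is made up for the example.

```python
from collections import Counter, defaultdict

def transition_probabilities(text):
    """Estimate P(next word | current word) from adjacent word pairs."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return {state: {nxt: n / sum(followers.values())
                    for nxt, n in followers.items()}
            for state, followers in counts.items()}

chain = transition_probabilities("the quick brown fox jumps over the lazy dog")
after_the = chain["the"]   # {"quick": 0.5, "lazy": 0.5}
```

In this sentence “the” is followed once by “quick” and once by “lazy”, so each transition gets probability 0.5, while every other word has a single follower with probability 1.0. The last word, “dog”, has no outgoing transition. Feeding in more text simply adds more counts to the same table.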
