Written By: Deshani Geethika – Sysco LABS EAG R&D Team
Where does Regression come in Machine Learning?
The following chart shows the main categories of ML.
Types of Machine Learning 
If you are completely new to ML, supervised learning is where a labeled dataset is provided to the model in the training phase. A labeled dataset basically means that every training example comes with the known answer: the class(es) to be predicted (in a classification problem) or the numerical value to be predicted (in a regression problem). So, according to this chart, regression comes under supervised learning, and it is the technique used for predicting real numerical values.
In contrast to this, unsupervised learning is where a labeled dataset is not provided. In the unsupervised approach, the model decides by itself which clusters the data points belong to.
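To make the idea of a labeled dataset concrete, here is a small sketch in Python. All values, labels and variable names are made up for illustration and are not taken from any real dataset.

```python
# Hypothetical toy data: each row is (height_cm, weight_kg).
features = [(150, 50), (160, 58), (170, 69), (180, 80)]

# Supervised learning, classification: every row has a class label.
class_labels = ["small", "small", "large", "large"]
labeled_for_classification = list(zip(features, class_labels))

# Supervised learning, regression: every row has a real-valued target
# (imagine a shoe size; the numbers are invented).
numeric_targets = [36.5, 38.0, 41.5, 44.0]
labeled_for_regression = list(zip(features, numeric_targets))

# Unsupervised learning: only the features are available, so the
# algorithm has to find groups (clusters) by itself.
unlabeled = list(features)

print(labeled_for_classification[0])  # ((150, 50), 'small')
print(labeled_for_regression[0])      # ((150, 50), 36.5)
print(unlabeled[0])                   # (150, 50)
```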
Reinforcement learning is another main type of machine learning. Unlike the other approaches, the model keeps learning continuously, guided by a reward system.
In this article, we look at different kinds of regression techniques, each in its own section.
Linear regression is the simplest form of regression. There are one or more independent variables (x) and a dependent variable (y). The dependent variable is the one that is predicted. Here the following assumptions are made.
- The independent variables are not strongly correlated with each other.
- The independent variables are linearly correlated with the dependent variable.
If there is only one independent variable, it is known as simple linear regression. In that case, the equation y = m*x + c is formed, where m is the slope and c is the intercept, resulting in a line.
Simple Linear Regression 
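As a minimal sketch of simple linear regression, the following Python snippet fits a line y = m*x + c, where m is the slope and c is the intercept (the value of y when x is 0), using the standard least-squares formulas. The function name `fit_simple_linear` and the data are hypothetical and chosen only for illustration.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = m*x + c for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Made-up data lying exactly on the line y = 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, c = fit_simple_linear(xs, ys)
print(m, c)  # 2.0 1.0
```

With real, noisy data the points will not lie exactly on the line, and m and c describe the line that minimises the squared distance to the points.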
When there are multiple independent variables, it is known as multiple linear regression, which results in a hyperplane.
Multiple Linear Regression resulting in a hyperplane 
In linear regression, the model finds the coefficients of the equation. For example, consider the equation: y = b0 + b1*x1 + b2*x2 + b3*x3. Here the model will estimate the values of b0, b1, b2 and b3 from the training data.
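The coefficient search can be sketched in plain Python. The article does not name a method, so this sketch assumes one common choice: solving the normal equations (X^T X) b = X^T y with a small hand-written linear solver. The function names and the data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_linear(rows, ys):
    """Return [b0, b1, ..., bk] minimising the squared error."""
    # Prepend a constant 1 to every row so that b0 acts as the intercept.
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated from y = 1 + 2*x1 - 3*x2 + 0.5*x3 (no noise).
rows = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 2), (2, 1, 1)]
ys = [1 + 2 * x1 - 3 * x2 + 0.5 * x3 for x1, x2, x3 in rows]

coefficients = fit_multiple_linear(rows, ys)
# The recovered b0..b3 are approximately 1, 2, -3 and 0.5.
print([round(b, 6) for b in coefficients])
```

In practice a library routine would normally be used instead of a hand-written solver; the sketch only shows what "finding the coefficients" means.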
Pros:
- Easy to use
- Useful for smaller datasets and when the relationship to be modeled is not extremely complex
Cons:
- Not suitable for non-linear or complex relationships
- Not suitable for huge data sets
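In polynomial regression, the model assumes that the independent variables are polynomially related to the dependent variable. Some example polynomials are quadratic, cubic, etc. (functions such as sin and cos are non-linear, but they are not polynomials). The main problem here is the need to understand the shape of the relationship in the data beforehand, for example the degree of the polynomial.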
Polynomial Regression 
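A minimal sketch of polynomial regression in Python: each x is expanded into the features 1, x, x^2, ..., and the coefficients are then found in the same way as in linear regression. Solving the normal equations is an assumption of this sketch, and the function names, the degree and the data are hypothetical.

```python
def solve(a, b):
    """Solve the small linear system a*x = b by Gaussian elimination."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree]."""
    # Expand each x into the features 1, x, x^2, ..., x^degree.
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data on the curve y = 2 - x + 0.5*x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [2 - x + 0.5 * x ** 2 for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)
# The recovered coefficients are approximately 2, -1 and 0.5.
print([round(b, 6) for b in coeffs])
```

The degree has to be chosen before fitting, which is exactly the "relationship must be known beforehand" limitation of this technique.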
Pros:
- Can model non-linear relationships
Cons:
- The relationship between the features and the output variable should be known beforehand
Random Forest (Regression Trees)
A random forest is a forest (set) of multiple decision trees. Random Forest is used in both classification and regression problems. Each decision tree in the forest is built on a random subset of features from the original feature set. Each data point goes through every decision tree, and each tree predicts a value for the output.
The final result is the average of the results from all the trees. (In a Random Forest Classifier, each decision tree predicts a class, and the class is decided by a majority vote.) Because the result is averaged over many different trees, overfitting is less likely to happen in random forests than in a single decision tree. (Decision trees are explained a bit more after the pros and cons.)
Random Forest Regression 
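The averaging idea can be sketched in plain Python. This is a simplified illustration, not a production implementation: every tree is limited to a random subset of features as described above, and, as an extra assumption of this sketch, every tree is also trained on a random sample of the rows drawn with replacement. All function names, parameters and data are hypothetical.

```python
import random


def sse(values):
    """Sum of squared errors around the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, ys, features, max_depth):
    """Regression tree that may only split on the given feature indices."""
    if max_depth == 0 or len(set(ys)) == 1:
        return {"value": sum(ys) / len(ys)}
    best = None  # (error, feature index, threshold)
    for f in features:
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, ys) if r[f] < t]
            right = [y for r, y in zip(rows, ys) if r[f] >= t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:  # no split possible: all rows look the same
        return {"value": sum(ys) / len(ys)}
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, ys) if r[f] < t]
    hi = [(r, y) for r, y in zip(rows, ys) if r[f] >= t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in lo], [y for _, y in lo],
                           features, max_depth - 1),
        "right": build_tree([r for r, _ in hi], [y for _, y in hi],
                            features, max_depth - 1),
    }


def predict_tree(tree, row):
    while "value" not in tree:
        go_left = row[tree["feature"]] < tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]


def build_forest(rows, ys, n_trees, n_features, max_depth, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Assumption of this sketch: sample the rows with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Each tree only sees a random subset of the features.
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(build_tree([rows[i] for i in idx],
                                 [ys[i] for i in idx],
                                 features, max_depth))
    return forest


def predict_forest(forest, row):
    """The forest's answer is the average of the trees' answers."""
    predictions = [predict_tree(tree, row) for tree in forest]
    return sum(predictions) / len(predictions)


# Made-up data with three features per row.
rows = [(1, 5, 0), (2, 3, 1), (3, 8, 0), (4, 1, 1),
        (5, 9, 0), (6, 2, 1), (7, 7, 0), (8, 4, 1)]
ys = [10.0, 12.0, 15.0, 14.0, 21.0, 18.0, 25.0, 24.0]

forest = build_forest(rows, ys, n_trees=10, n_features=2, max_depth=3)
prediction = predict_forest(forest, (4.5, 5, 0))
print(prediction)
```

For classification, `predict_forest` would take a majority vote over the trees instead of the average.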
Pros:
- Suitable for capturing complex non-linear relationships
- Less prone to overfitting
Cons:
- A larger forest may require more memory
Decision trees are used in both classification and regression problems. In a classification problem, the decision tree predicts a class, and in a regression problem, it predicts a real value. Simply put, a decision tree is a set of connected nodes, wherein each node is responsible for taking a decision. The final outcome depends on the decisions taken at the nodes along the way.
Decision Tree 
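A minimal sketch of a regression decision tree in Python is given here. Each inner node takes one decision of the form "is feature f smaller than threshold t?", and each leaf predicts the mean of the training targets that reached it. Choosing splits by the lowest squared error is an assumption of this sketch, and all names and data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, ys, max_depth):
    """Grow a regression tree; each leaf stores the mean of its targets."""
    mean = sum(ys) / len(ys)
    if max_depth == 0 or len(set(ys)) == 1:
        return {"value": mean}
    best = None  # (error, feature index, threshold)
    for f in range(len(rows[0])):
        # Skip the smallest value so that both sides are never empty.
        for t in sorted({r[f] for r in rows})[1:]:
            left = [y for r, y in zip(rows, ys) if r[f] < t]
            right = [y for r, y in zip(rows, ys) if r[f] >= t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:  # every row is identical, nothing to split on
        return {"value": mean}
    _, f, t = best
    left_idx = [i for i, r in enumerate(rows) if r[f] < t]
    right_idx = [i for i, r in enumerate(rows) if r[f] >= t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left_idx],
                           [ys[i] for i in left_idx], max_depth - 1),
        "right": build_tree([rows[i] for i in right_idx],
                            [ys[i] for i in right_idx], max_depth - 1),
    }


def predict(tree, row):
    """Follow the decisions from the root down to a leaf."""
    while "value" not in tree:
        go_left = row[tree["feature"]] < tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["value"]


# Made-up data: the target jumps from 5 to 20 once the first feature
# reaches 10; the second feature carries no useful information.
rows = [(1, 0), (2, 1), (3, 0), (10, 1), (11, 0), (12, 1)]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]

tree = build_tree(rows, ys, max_depth=2)
print(tree["feature"], tree["threshold"])  # 0 10
print(predict(tree, (2.5, 0)))             # 5.0
print(predict(tree, (11.5, 1)))            # 20.0
```

A classification tree works the same way, except that each leaf stores a class instead of a mean value.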
Overfitting happens when the model tries to capture the behavior of each and every data point in the training sample. This results in a complex model, which cannot capture the behavior of an unseen data point. In decision trees, this is highly likely to happen if the number of nodes is high, because the model then takes more decisions to describe the dataset. In Random Forest, since each tree considers only a subset of features and the results of all the trees are averaged, overfitting is much less likely.
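The effect can be illustrated with a small, deliberately artificial Python sketch. A one-feature regression tree is grown twice on the same noisy training data: once with a single decision and once until every training point has its own leaf. The data, the alternating +1/-1 "noise" and all names are made up for illustration.

```python
def sse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(points, max_depth):
    """points is a list of (x, y) pairs; returns a nested dict."""
    ys = [y for _, y in points]
    if max_depth == 0 or len(set(ys)) == 1:
        return {"value": sum(ys) / len(ys)}
    best = None  # (error, threshold)
    for t in sorted({x for x, _ in points})[1:]:
        err = (sse([y for x, y in points if x < t])
               + sse([y for x, y in points if x >= t]))
        if best is None or err < best[0]:
            best = (err, t)
    if best is None:
        return {"value": sum(ys) / len(ys)}
    t = best[1]
    return {
        "threshold": t,
        "left": build_tree([p for p in points if p[0] < t], max_depth - 1),
        "right": build_tree([p for p in points if p[0] >= t], max_depth - 1),
    }


def predict(tree, x):
    while "value" not in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["value"]


def mse(tree, points):
    """Mean squared error of the tree on the given (x, y) pairs."""
    return sum((predict(tree, x) - y) ** 2 for x, y in points) / len(points)


def count_leaves(tree):
    if "value" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


# True behavior: y is 0 for x < 4 and 10 otherwise.
# Training data: the true value plus an alternating +1 / -1 disturbance.
train = [(0, 1.0), (1, -1.0), (2, 1.0), (3, -1.0),
         (4, 11.0), (5, 9.0), (6, 11.0), (7, 9.0)]
# Unseen data: the same inputs, but with the undisturbed true values.
unseen = [(0, 0.0), (1, 0.0), (2, 0.0), (3, 0.0),
          (4, 10.0), (5, 10.0), (6, 10.0), (7, 10.0)]

shallow = build_tree(train, max_depth=1)  # a single decision
deep = build_tree(train, max_depth=8)     # grown until leaves are pure

print(count_leaves(shallow), mse(shallow, train), mse(shallow, unseen))
# 2 1.0 0.0
print(count_leaves(deep), mse(deep, train), mse(deep, unseen))
# 8 0.0 1.0
```

The deep tree describes the training sample perfectly because it has memorised the disturbance, and it pays for that on the unseen data, while the tree with fewer nodes does the opposite.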