
Non-technical intro to Random Forest and Gradient Boosting in machine learning

A collective wisdom of many is likely more accurate than any one. – Wisdom of the crowd, Aristotle, c. 300 BC

The concept of an Ensemble is fundamental to many areas of our lives. A choir of singers is an ensemble. A band of instrumentalists is an ensemble. A group of vocalists singing different parts (bass, alto, tenor, soprano) is an ensemble. A group of kids singing melodious a cappella is an ensemble. You can already see the trend, right?

In all the examples above, each performer may individually be good; however, when they perform together, they can render an exceptionally beautiful performance.

This is the same concept behind Ensemble modeling in machine learning, where we pool or chain a collection of learning models together to build a generalized, robust model that predicts significantly better when new data is provided.

A model is any of the supervised or unsupervised algorithms listed below, each of which has been proven to perform excellently in predictive modeling. However, when a collection of the same (or different) models is aggregated, we can get an even better, more accurate predictive performance.

  • Supervised Learning: The classifications (or labels) of the sample dataset are known and predetermined.
  • Unsupervised Learning: We have little to no prior knowledge of the results or data grouping. We need to examine the relationships in the data to determine an appropriate clustering.
Types of Machine Learning Algorithms:

| Supervised Learning ("Y" is known) | Unsupervised Learning ("Y" is unknown) | Semi-Supervised Learning (Sometimes we know "Y") |
| --- | --- | --- |
| Regression (Lasso, Ridge, Logistic) | Clustering (K-means clustering, Mean shift clustering, Spectral Clustering) | Prediction & Classification |
| Decision Tree (Gradient Boosting & Random Forest) | Apriori Rule | Clustering |
| Neural Network | Kernel Density Estimation | Expectation-Maximization (EM) |
| Support Vector Machine (SVM) | Principal Component Analysis (PCA) (Kernel PCA, Sparse PCA) | Transductive Support Vector Machine (TSVM) |
| Naive Bayes | Singular Value Decomposition (SVD) | Manifold Regularization |
| K-Nearest Neighbor (KNN) | Self-Organizing Map (SOM) | Auto-encoder (Multilayer Perceptron, Restricted Boltzmann Machine (RBM)) |

Single Model

[Figure: underfitting vs. overfitting]

A single model representing the trends in a dataset: a linear model underfits and sacrifices accuracy, while a more complex, high-dimensional model may overfit, memorizing the data instead of learning the trends in it.

Ensemble Model

In ensemble modeling, we use small samples of the whole dataset to 'train' (or fit) different models or algorithms. Each model produces an outcome. We can then use an averaging method or majority voting to determine the final prediction.

In general, an Ensemble model is all about using many different models in collaboration to combine their strengths, compensate for their weaknesses, and make the resulting model generalize better to future data.

Ensemble models have significant applications in several business domains: movie recommendation engines, aircraft maintenance forecasting, business/sales forecasting, election forecasting using polls from various markets and demographics, and weather forecasting. Ensemble modeling is undoubtedly a fantastic way to ensure better accuracy and reliability of predictions wherever machine learning is applied.


How do we combine RESULTS of multiple models?

Here we are concerned with how we treat the outputs coming from the multiple models. If we are using 7 different models and each of them generates an output, how do we make the final prediction based on the outputs from these 7 models? Remember, a model may be any of the supervised or unsupervised algorithms listed above.

  • Interval Targets: We find the average of the outputs coming from each of the different models. Perfect for numeric, continuous data.
  • Categorical Targets: We simply count the results from each of the models and use the result with the majority vote as the final prediction. Great for results that give True/False or Yes/No. Both rules are sketched in the code below.
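
To make these two rules concrete, here is a minimal sketch in plain Python (with NumPy), using made-up outputs from 7 hypothetical models:

```python
import numpy as np
from collections import Counter

# Hypothetical outputs from 7 different models for a single new observation.

# Interval (numeric) target: average the outputs of all models.
numeric_preds = [12.4, 11.9, 13.1, 12.7, 12.0, 12.6, 12.3]
final_numeric = np.mean(numeric_preds)

# Categorical target: count the votes and keep the majority class.
class_preds = ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"]
final_class, votes = Counter(class_preds).most_common(1)[0]

print(final_numeric)       # ~12.43
print(final_class, votes)  # Yes 5
```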

How do we combine PROCESSES of multiple models?

Here we discuss the techniques we can use to connect multiple models together, i.e., the connection process that occurs before each model generates a result. There are four ways we can effectively join multiple models together:

  1. One Algorithm, (trained with) Different Data Samples
  2. One Algorithm, (tuned with) Different Configuration Options
  3. Different Algorithms (Chaining)
  4. Expert Knowledge (Rule-Based Systems, joined with heuristic ideas)

1.) One Algorithm, (trained with) Different Data Samples

  • Bagging (Bootstrap Aggregating)

This method is formally referred to as Bagging (Bootstrap Aggregating): random samples drawn from the available full dataset are divided into different subsets, each of which is fed into a model/algorithm, and the results are fed into an Ensembler. In the diagram below, the two parallel blocks in the middle are the modeling processes, while the last block, the Ensembler, is the result combiner.


[Figure: Bagging (Bootstrap Aggregating)]

Notice that in this example we have used a Decision Tree, which is one of the models/algorithms listed above; however, any of the other methods could have been used as well. Random Forest is a particularly popular, high-performance algorithm built around the concept of Bagging.
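
If you would like to try Bagging yourself, here is a hedged sketch using scikit-learn's `BaggingClassifier` (the toy dataset and parameters are purely illustrative). The bootstrap sampling, per-model training, and vote combining all happen inside `fit` and `predict`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# A toy dataset standing in for the "full dataset" in the diagram.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 50 decision trees, each fit on a bootstrap sample of the training data;
# predictions are combined by majority vote (the Ensembler block).
bagger = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                           max_samples=0.8, bootstrap=True, random_state=42)
bagger.fit(X_train, y_train)
print("Bagged accuracy:", bagger.score(X_test, y_test))
```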

Random Forest: A random forest is an ensemble of decision trees. In a decision tree, the training (sample) dataset is recursively partitioned into sections along different branches such that similar observations are grouped together at the terminal leaves of the tree.

To provide more robustness, an ensemble (collection) of decision trees can be trained to become a Forest, where new observations are scored by majority vote or averaging. Each Decision Tree is trained on a different random sample of the full dataset.

[Figure: A Decision Tree]

[Figure: An Ensemble of Decision Trees]

[Figure: Each Decision Tree is trained on a random sample of the full dataset]

Random Forest is not just about bagging; it adds some specific enhancements:

For example, the individual Decision Trees in a Random Forest are deliberately kept simple rather than individually fine-tuned, and each tree (and each split) sees only a random subset of the data and features. Averaging a large number of such trees results in a more robust prediction than using a single, highly fine-tuned decision tree.

Also, Random Forest can be used to determine the variables (or features) with the most significant impact. Hence, RF can help in pruning the not-so-influential variables if there is a need for such a reduction.
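
Here is a minimal sketch of both points, assuming scikit-learn's `RandomForestClassifier` and a made-up toy dataset; `feature_importances_` provides the variable-impact ranking mentioned above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy dataset: 10 features, only 3 of which actually carry signal.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=3, random_state=0)

# A forest of 200 trees; new observations are scored by majority vote.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank variables by impact; the least influential are pruning candidates.
ranking = np.argsort(forest.feature_importances_)[::-1]
print("Features ranked by importance:", ranking)
```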

  • Boosting

Similar to Bagging, Boosting uses a single algorithm trained on different data samples, but with a twist: Boosting starts by training a single model on a sample of the dataset.

The data points that are misclassified are assigned heavier weights in the next iteration so that the focus falls on the misclassified points. This creates an adaptive learning approach that improves the performance of the algorithm.


Obviously, this technique is an expensive operation due to the overhead of sequential training. Boosting can be used with any model, not just decision trees; however, an example of Boosting that uses Decision Trees is Gradient Boosting.

Gradient Boosting:
[Figure: Boosting]

Boosting effectively transforms a weak model into a stronger, more powerful model. However, Boosting degrades with noisy data because the assigned weights may not be appropriate in the next iteration.

In comparison with Bagging, Boosting uses a feedback-control technique: Boosting is adaptive. Bagging can work in parallel, while Boosting requires sequential processing.
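
As a minimal sketch of Gradient Boosting in code (assuming scikit-learn's `GradientBoostingClassifier` and an illustrative toy dataset), the trees are built sequentially, each new tree correcting the errors left by the ensemble so far:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# 100 shallow ("weak") trees, trained one after another; learning_rate
# controls how strongly each new tree corrects the previous ensemble.
booster = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                     learning_rate=0.1, random_state=1)
booster.fit(X_train, y_train)
print("Boosted accuracy:", booster.score(X_test, y_test))
```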

2.) One Algorithm, (tuned with) Different Configuration Options

In this approach, we vary the configuration options of the SAME algorithm on the SAME dataset. This helps determine the tweaks, parameters, or options that best optimize the performance of the model or algorithm. For example, multiple configurations of a Neural Network model can be trained and evaluated with different tuning parameters to determine the number of Hidden Units (HU) that works best for the model.

[Figure: Ensemble Neural Network (NN) with multiple Hidden Units (HU)]

Notice how the four NNs are grouped into three (3) ensembles and then compared at the final output: Ensemble #1 (all NNs: 3HU, 10HU, 30HU, 50HU), Ensemble #2 (10HU, 30HU, 50HU), and Ensemble #3 (30HU & 50HU).
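
A rough sketch of this approach, assuming scikit-learn's `MLPClassifier` as the Neural Network and mirroring the hidden-unit counts in the figure (this builds Ensemble #1, which averages all four networks):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

# Same algorithm, same data, four configurations: 3, 10, 30, and 50 HU.
nets = [MLPClassifier(hidden_layer_sizes=(h,), max_iter=1000,
                      random_state=7).fit(X_train, y_train)
        for h in (3, 10, 30, 50)]

# Ensemble #1: average the predicted probabilities of all four NNs.
avg_proba = np.mean([net.predict_proba(X_test) for net in nets], axis=0)
y_pred = avg_proba.argmax(axis=1)
print("Ensemble accuracy:", (y_pred == y_test).mean())
```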

3.) Chaining Different Algorithms

Here we build an ensemble using multiples of completely different algorithms. Each algorithm or model assumes some form of relationship between the inputs (features, variables) and the targets. For example, Linear Regression assumes a linear relation, a Decision Tree assumes a constant relation within ranges of the inputs, while a Neural Network assumes a nonlinear relation that depends on its architecture.

[Figure: Ensemble using Multiples of Completely Different Algorithms]
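
One hedged way to wire this up is scikit-learn's `VotingClassifier` (the toy dataset and the three algorithms chosen here are illustrative), which lets completely different algorithms vote on the final prediction:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# Three completely different algorithms, each with its own assumptions.
ensemble = VotingClassifier(estimators=[
    ("logreg", LogisticRegression(max_iter=1000)),  # linear relation
    ("tree", DecisionTreeClassifier()),             # piecewise-constant
    ("knn", KNeighborsClassifier()),                # local similarity
], voting="hard")                                   # majority vote

ensemble.fit(X_train, y_train)
print("Voting-ensemble accuracy:", ensemble.score(X_test, y_test))
```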

In Summary

  • Ensemble models are great for producing robust, highly optimized, and improved models.
  • Random Forest and Gradient Boosting are ensemble-based algorithms.
  • Random Forest uses the Bagging technique while Gradient Boosting uses the Boosting technique.
  • Bagging uses multiple random data samples for modeling while Boosting uses iterative refinement for modeling.
  • Ensemble models are not easy to interpret; they often work like a little black box.
  • Multiple algorithms should be used minimally so that the prediction system remains reasonably tractable.

I hope you find this useful. If you think I’ve missed something important, please use the comment box below to suggest changes or discuss your implementations. Cheers!

By @RichardAfolabi

I'm a thinker, teacher, writer, Python enthusiast, Wireless Engineer, Web geek and a solid Chelsea FC Fan. I'm interested in data science, analytics, visualization and data intelligence. Feel free to get in touch.