Decision Trees & Ensemble Methods
How decision trees and ensemble methods model non-linear relationships and address overfitting through techniques like bagging and boosting.
Transcript
Let's talk about decision trees and ensemble methods. Linear regressions, even advanced models like the Lasso, Ridge, and Elastic Net, still suffer from a major flaw, which is that linear algorithms are not very effective at modeling non-linear relationships. Luckily, there's another category of algorithm that excels at modeling non-linear relationships: decision trees. You're already familiar with decision trees, even if you don't know it; you use them every day. For example, when you're thinking about what to wear today, you might ask yourself, is it raining outside? If the answer is yes, then you're gonna grab your rain jacket and you've made your decision. If it's not raining, then you might ask yourself, what's the temperature? If it's really cold, you might grab your coat. If it's a little bit cool, you might grab your sweater. If it's warm, you're gonna put on a T-shirt, and if it's really hot, then you're gonna put on a T-shirt and shorts.
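To make that branching concrete, here's a minimal sketch of the "what should I wear?" tree written as plain Python conditionals. The function name and the temperature thresholds are illustrative assumptions, not values from the lesson.

```python
# A hypothetical "what should I wear?" decision tree written as plain
# conditionals. Thresholds are made up for illustration.
def what_to_wear(is_raining: bool, temperature_f: float) -> str:
    if is_raining:                  # first split: the most informative question
        return "rain jacket"
    if temperature_f < 40:          # really cold
        return "coat"
    if temperature_f < 60:          # a little bit cool
        return "sweater"
    if temperature_f < 80:          # warm
        return "t-shirt"
    return "t-shirt and shorts"     # really hot

print(what_to_wear(is_raining=False, temperature_f=72))  # -> "t-shirt"
```

Each `if` plays the role of a branch, and each `return` is a leaf holding the prediction.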
Now, imagine some of the finance applications. For example, you might be asking, should I buy, hold, or sell the security? The prediction you're trying to make is the projected return of that security. If the return is expected to be negative, then obviously you wanna sell that security. If it's positive, then you might ask yourself, hmm, what's the return? If you predict the return to be less than 8%, then maybe you hold onto it; you don't wanna sell it, but you don't wanna buy more. If the return is between 8% and 15%, you might wanna make a buy recommendation. And if you expect the return to be greater than 15%, you might make a strong buy recommendation.

Decision trees, combined with a technique called ensembling, will be the most used tool in your machine learning toolbox due to their ease of use, variety of applications, and outstanding performance. Decision trees model data as a tree of hierarchical branches. The model creates branches until reaching leaves that represent predictions such as buy, hold, or sell. The first branch splits your dataset based on the feature that provides the most information, and then lower branches become more and more specific. For example, if you were playing 20 questions and had to guess an unknown object, you wouldn't start by asking, is it Jamie Dimon? It would be more useful to start with a question like, is it alive? And then, is it a person?

By splitting data in this way, decision trees can model non-linear relationships. Imagine that you're modeling the annual income of individuals. The income of professional football players is positively correlated with their height and weight, but the income of professional jockeys is negatively correlated with height and weight. A linear regression has a hard time naturally expressing this relationship, but a decision tree model could easily do this by splitting individuals by occupation before splitting them by height and weight.

However, despite the benefits, decision trees do have a big weakness. Similar to how linear models can overfit when given too many input features, unconstrained decision trees with too many branches can overfit by completely memorizing your training data. Think about it: if your data has at least a few features, each data point represents a unique combination of those feature values. An unconstrained decision tree could just create one different leaf for each data point. So, like you see in the graph, the model's accuracy on the training data would be perfect, but the model would be overfit, not generalizable, and completely useless in the real world.

Just as you counteract overfitting in linear regression models with regularization, you can also counteract overfitting in decision tree models with ensemble methods. One ensemble method used to limit decision tree overfitting is called bagging. Using the bagging method, a large number of strong learners are trained simultaneously, and then all of the strong learners are combined together to smooth out their predictions. Now, a strong learner is a decision tree that's intentionally permitted to overfit, so you can think about bagging like a group of highly intelligent and opinionated scientists arguing with each other to determine the right answer. Their predictions are too complex individually, but together they're just right. A second ensemble method is boosting. The boosting method trains a large number of weak learners in sequence, with each subsequent weak learner correcting the mistakes of the one before it.
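Before looking at those two ensemble approaches in code, here's a rough sketch, not from the lesson, of the overfitting problem itself: an unconstrained tree can essentially memorize its training data, while even a simple depth limit generalizes better. The synthetic dataset and the scikit-learn calls are illustrative assumptions.

```python
# Illustrative comparison: an unconstrained decision tree vs. a depth-limited one
# on synthetic, noisy, non-linear data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=400)  # non-linear signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unconstrained = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
depth_limited = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

# The unconstrained tree scores ~1.0 on the training data (it has memorized it)
# but noticeably worse on the held-out test data.
print("unconstrained train R^2:", unconstrained.score(X_train, y_train))
print("unconstrained test  R^2:", unconstrained.score(X_test, y_test))
print("depth-limited train R^2:", depth_limited.score(X_train, y_train))
print("depth-limited test  R^2:", depth_limited.score(X_test, y_test))
```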
Weak learners are not very smart alone, but together, they're very effective. You can think of them like the little minions: each of them by themselves is not super effective, but when they work together, they can accomplish amazing things. You're going to use two types of decision tree ensemble algorithms in this course. The first is the random forest regressor, which combines many strong learners (the bagging method), and the second is the gradient boosting regressor, which combines many weak learners (the boosting method).
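To close the loop, here's a minimal sketch, assuming scikit-learn and some synthetic non-linear data, of the two ensemble regressors named above: a random forest (bagging many strong learners) and gradient boosting (a sequence of weak learners). The dataset and hyperparameters are illustrative assumptions, not values from the course.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic non-linear data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=400)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: a forest of deep (strong) trees trained independently; their
# predictions are averaged to smooth out each tree's individual overfitting.
forest = RandomForestRegressor(n_estimators=200, random_state=0)

# Boosting: shallow (weak) trees trained in sequence, each one correcting
# the residual errors of the trees before it.
boosted = GradientBoostingRegressor(n_estimators=200, max_depth=2, random_state=0)

for name, model in [("random forest", forest), ("gradient boosting", boosted)]:
    model.fit(X_train, y_train)
    print(f"{name:>17} test R^2: {model.score(X_test, y_test):.3f}")
```

Both models should generalize better on the held-out data than the single unconstrained tree from the earlier sketch, which is exactly the point of ensembling.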