Hyperparameter Tuning
How hyperparameter tuning in machine learning models works.
Transcript
Now we're going to talk about something called hyperparameter tuning. Tuning machine learning hyperparameters is kind of like tuning a race car. In the case of the liquidity regressor that you're building right now, you have 5 classes of race cars, your 5 classes of regression models: lasso, ridge, elastic net, random forest, and gradient booster. You don't just want to find the highest performing vehicle or the highest performing model. You also want to find the highest performing tuning for that vehicle. A hyperparameter of a machine learning model is simply a characteristic of that model class that you can tweak in a way that impacts performance, kind of like the suspension of a race car. In order to identify the highest performing regression model and the highest performing tuning for that model, you must define alternative values to test for each hyperparameter. Each potential combination of hyperparameter values is called a hyperparameter configuration. In this race car example, each model has three hyperparameters: suspension, engine, and tires. Each hyperparameter is given three alternative values to test. So suspension has one, two, and three; engine has A, B, and C; and tires have Roman numerals one, two, and three. In this simple example, this means that there are 27 potential hyperparameter configurations for each model class: three suspension values multiplied by three engine values multiplied by three tire values. That results in 135 competing models, which is five classes for the five cars multiplied by 27 hyperparameter configurations. During the cross-validation process, each one of those 135 competing models is evaluated, which is to say its R-squared is calculated, and then the highest performing model class is identified along with its optimal hyperparameter configuration. That is the winning model you'll use to advise your client.
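To make that counting concrete, here is a tiny sketch that enumerates the race car configurations; the variable names and values are illustrative, not taken from the course code.

```python
from itertools import product

# Toy values from the race car analogy (illustrative only)
suspensions = [1, 2, 3]
engines = ["A", "B", "C"]
tires = ["I", "II", "III"]

# Every combination of hyperparameter values is one configuration
configurations = list(product(suspensions, engines, tires))
print(len(configurations))      # 27 configurations per model class
print(5 * len(configurations))  # 135 competing models across 5 model classes
```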
Different classes of models have different kinds of hyperparameters. The lasso model has a hyperparameter called alpha, which measures the strength of the regularization penalty factor. The ridge model also has alpha, and elastic net has alpha, but it also has something called the L1 ratio, which is the ratio of L1 regularization to L2 regularization. That is to say, it's the measurement of the blend between the lasso-style regularization and the ridge-style regularization. The random forest model has the number of strong learners that you want to use, and then the maximum number of input features. And the gradient booster has the number of weak learners, the learning rate, and the maximum depth of the decision trees, which is to say the complexity limit of each weak learner. What we want to do is not only find the highest performing model class, but also find the highest performing hyperparameter tuning for that model class, and we're going to do that through trial and error.
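For reference, the hyperparameters just listed map onto these scikit-learn estimator arguments. This is only a sketch with placeholder values, not the tuned settings used in the course.

```python
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Placeholder values; the course defines its own candidate values for each of these.
models = {
    "lasso": Lasso(alpha=1.0),        # alpha: strength of the regularization penalty
    "ridge": Ridge(alpha=1.0),        # alpha: strength of the regularization penalty
    "elastic_net": ElasticNet(
        alpha=1.0,
        l1_ratio=0.5,                 # blend of lasso-style (L1) and ridge-style (L2) regularization
    ),
    "random_forest": RandomForestRegressor(
        n_estimators=100,             # number of strong learners (trees)
        max_features="sqrt",          # maximum number of input features considered per split
    ),
    "gradient_boosting": GradientBoostingRegressor(
        n_estimators=100,             # number of weak learners
        learning_rate=0.1,            # how much each weak learner contributes
        max_depth=3,                  # complexity limit of each weak learner
    ),
}
```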
In order to test the various hyperparameters to find the optimal tuning, you first need to create lists of values that you want to test for each hyperparameter. Because elastic net, random forest, and gradient booster have multiple hyperparameters, you're going to group these lists of hyperparameter values into one dictionary for each model class. Each dictionary is called a hyperparameter grid. Follow along with the code that you see here to create a hyperparameter grid for lasso. Here I'm naming this dictionary lasso_hyperparameters, and then I open a curly brace to start my dictionary. The key is important because it's a specific name that scikit-learn already recognizes: it must begin with the model class followed by two underscores, and then the specific name of the hyperparameter. You cannot name this key just anything that you want; it has to follow this specific format. Then I have my colon to separate my key from my value, and my value is a list of the values that I want to try for this alpha hyperparameter. Scikit-learn's default value for lasso alpha is 1.0, so here I've just added a couple of values below that and a couple of values above it to see which works best.
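The on-screen code isn't reproduced in the transcript, but a minimal sketch of that lasso grid might look like the following, assuming the lasso step in the scikit-learn pipeline is literally named lasso; the exact alpha candidates shown are illustrative, not necessarily the course's list.

```python
# Key format: <pipeline step name> + two underscores + <hyperparameter name>.
# The alpha candidates are illustrative: a couple of values under and over
# scikit-learn's default of 1.0, not necessarily the course's chosen list.
lasso_hyperparameters = {
    "lasso__alpha": [0.01, 0.1, 1.0, 5.0, 10.0],
}
```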