Model Pipelines
- 04:38
How to create and evaluate multiple regression models, including Lasso, Ridge, Elastic Net, Random Forest, and Gradient Booster.
Transcript
Later on in this lesson, you're going to create several competing versions of the lasso, ridge, elastic net, random forest, and gradient booster regression models. You're going to evaluate each version against the others to find the model with the best performance. Prior to this model competition, it's necessary to define the process you'll repeat for each model class. This process is called the model pipeline. For this liquidity regressor algorithm, the pipeline for each model class will be a simple two-step process. First, standardize the training data to a common scale, and then second, apply the model class to the training data with a given random state. Standardizing the data to a common scale prevents your machine learning algorithms from overemphasizing input features with a larger scale. That is to say, just because input features have larger values doesn't mean that they should have a greater influence on your model. But if you don't standardize, that's what's going to happen. The scikit-learn package includes a function for creating model pipelines called make_pipeline and a standardizer called StandardScaler, whose name is case sensitive.
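To make the point about scale concrete, here is a minimal sketch of what StandardScaler does. The toy array X_toy is hypothetical and is not part of the lesson's dataset; it just has two features whose values differ by a factor of 1,000.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (not the lesson's dataset):
# two features on very different scales.
X_toy = np.array([
    [1.0, 1000.0],
    [2.0, 2000.0],
    [3.0, 3000.0],
])

# Fit the scaler on the data and transform it in one step.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_toy)

# After scaling, each column has mean 0 and standard deviation 1,
# so neither feature dominates just because its raw values are larger.
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
print(col_means, col_stds)
```

In this toy case the two scaled columns come out identical, because the second feature is just the first one multiplied by 1,000.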
You can indicate the model class of each pipeline with the format that you see here, and that accepts the argument random_state equal to whatever number you choose. So just as we did with train_test_split, make sure to set random_state to 1 when you have the chance so that your results match the results that you'll see in this lesson.
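The on-screen code isn't part of this transcript, so the following is only a sketch of the format being described, using Lasso as the example model class. The variable name example_pipeline is an assumption made for illustration.

```python
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: StandardScaler(); step 2: the model class with random_state=1.
# "example_pipeline" is a hypothetical name used only for this sketch.
example_pipeline = make_pipeline(StandardScaler(), Lasso(random_state=1))

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in example_pipeline.steps]
print(step_names)
```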
You're going to store your model pipelines in a dictionary object named pipelines. Follow along with the code here to import the necessary functions from the scikit-learn package and also create model pipelines for the lasso and ridge model classes. Here we're going into the scikit-learn linear_model module, and we're importing the Lasso machine learning model and the Ridge machine learning model. We're going into the pipeline module to import the make_pipeline function, and then we're going into the preprocessing module to import StandardScaler. Now we're creating a new dictionary called pipelines, so pipelines, an equals sign, and then an open curly brace. And then each of these items is going to represent one of the models that we're going to use. The key that I'm using is the string 'lasso', and then, remember, a colon. For the value we're using the make_pipeline function, and we're passing it two arguments. First, StandardScaler, to tell the pipeline that the first step is to put all of our data on a standard scale. And then we're telling it that the model associated with this item, 'lasso', is the Lasso model that we imported from scikit-learn. And then inside that Lasso model, we're passing it a random_state of 1. That's so your results match the ones that you're going to see in this video. And then remember to put the comma after the end of that item.
And then the next item we're calling 'ridge', again using the make_pipeline function, with StandardScaler as your first argument and then a comma. And for the second argument, the Ridge model from scikit-learn, again with a random_state of 1, and then a close curly brace. Execute that cell, and now you have pipelines for the lasso model and the ridge model.
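Written out, the cell described above would look roughly like this. It's a sketch reconstructed from the narration, so the exact layout of the on-screen code may differ.

```python
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One pipeline per model class: standardize first, then fit the model.
pipelines = {
    'lasso': make_pipeline(StandardScaler(), Lasso(random_state=1)),
    'ridge': make_pipeline(StandardScaler(), Ridge(random_state=1)),
}

# Confirm which pipelines exist so far.
for name, pipeline in pipelines.items():
    print(name, type(pipeline.steps[-1][1]).__name__)
```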
In the next exercise, you're going to add a couple of new pipelines to this pipelines dictionary. And if you remember back to our dictionary lesson, the way that you do that is by starting with the dictionary name, then in square brackets the key name of the new item you want to create, and then setting that equal to whatever the value is for that item, which in this case is a call to the make_pipeline function. Here's an example, and go ahead and copy this into your code because you're going to need this pipeline as well. So follow along with me here. From the linear_model module of scikit-learn, we're going to import the ElasticNet model.
Then we're adding a new item to pipelines. The key for that item is going to be 'enet', short for elastic net, and the value for that item is a call to the make_pipeline function, with StandardScaler as the first argument and the ElasticNet model as the second argument. And we're giving that ElasticNet model a random_state of 1 so that your results match the results in this video. When you execute that cell, the elastic net pipeline is added to your pipelines dictionary. Make sure that you followed along and copied all of the code that you see here because you're going to need it later in the lesson.
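As a sketch of that last step, the block below rebuilds the lasso and ridge pipelines so it runs on its own, then adds the elastic net pipeline by key assignment. In your notebook, the pipelines dictionary would already exist from the earlier cell, so only the import and the assignment would be new.

```python
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Recreated here only so this sketch is self-contained.
pipelines = {
    'lasso': make_pipeline(StandardScaler(), Lasso(random_state=1)),
    'ridge': make_pipeline(StandardScaler(), Ridge(random_state=1)),
}

# Add a new item: dictionary name, key in square brackets, then the value.
pipelines['enet'] = make_pipeline(StandardScaler(), ElasticNet(random_state=1))

# Dictionaries keep insertion order, so 'enet' is listed last.
pipeline_names = list(pipelines.keys())
print(pipeline_names)
```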