Train Test Split
- 02:46
How to use the train test split function in Python to divide a dataset into training and testing sets.
Glossary: Machine Learning, Python, Train Test
Transcript
Our next step is to use the train_test_split function, like you see in this cell right here. You'll want to pass train_test_split four arguments in total: first, your inputs data frame; second, your target series; third, the proportion of the data to set aside for testing, which in this case is 20%; and finally, the random state. Because the function splits observations using random selection, fixing the random state lets you reproduce the results that you see in this video. The train_test_split function returns a list containing, first, your training inputs; second, your testing inputs; third, your training target values; and fourth, your testing target values.
Follow along in the code that I have right here. I'm creating a new object called results, which is equal to train_test_split with my inputs data frame, my target series, a test_size argument of 0.2, and a random_state of 1. Make sure you use the same random state so that your results match mine. When I execute that cell, train_test_split creates the results object. That object is a list, and we want to check that it contains the objects we expect.
So I want you to copy this as well, and when you execute the cell, make sure that your answers match mine. First, print the object type of results; it should be a list. Then print the length of results; you should see that it contains four objects. Here I have a little spacer just for readability. Then we create a quick for loop that says, for every item in the list results, print the shape of that item.
When I execute that, I can see that the class of results is list and that it contains four objects. The first object contains my training inputs, and you should see that it has 641 rows and 7 features.
Second, I have my testing inputs, which contain the 20% that I'm setting aside: 161 rows and 7 features.
Third is my training target values. So these are the available liquidity values that I'm using to train my model, and there should be 641 of those.
And then finally, I have my testing target values. Those are the available liquidity values that I'm going to use to evaluate my model after it's trained, and there should be 161 of those. If your answers do not match this, go back and double-check your code. If your answers do match this, then let's go ahead and move on.
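For readers following along without the course notebook, here is a minimal sketch of the split and the checks described above. It assumes scikit-learn's train_test_split from sklearn.model_selection, and it uses a made-up data frame of 802 rows (641 + 161) and 7 columns in place of the course data; the column names, the values, and the target name available_liquidity are hypothetical. The last few lines show where 161 comes from: for a float test_size, scikit-learn rounds the number of test rows up and gives the remaining rows to the training set.

```python
import math

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the course data: 802 rows (641 + 161) and
# 7 feature columns. Column names and values are made up.
rng = np.random.default_rng(0)
inputs = pd.DataFrame(
    rng.normal(size=(802, 7)),
    columns=[f"feature_{i}" for i in range(7)],
)
target = pd.Series(rng.normal(size=802), name="available_liquidity")

# Inputs, target, 20% held out for testing, and a fixed random state.
results = train_test_split(inputs, target, test_size=0.2, random_state=1)

# The same checks as in the video: type, length, then each item's shape.
print(type(results))  # <class 'list'>
print(len(results))   # 4
print()
for item in results:
    print(item.shape)
# (641, 7)
# (161, 7)
# (641,)
# (161,)

# Where 161 comes from: the float test_size is multiplied by the row
# count and rounded up; the training set gets the remaining rows.
n_test = math.ceil(0.2 * len(inputs))
n_train = len(inputs) - n_test
print(n_test, n_train)  # 161 641
```

With your real inputs data frame and target series in place of the stand-ins, the printed shapes should match the ones read out in the video.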
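Because the four items always come back in the same order (training inputs, testing inputs, training targets, testing targets), a common alternative to indexing into a results list is to unpack them into four names directly. The sketch below assumes scikit-learn's train_test_split and the same kind of made-up 802-row data frame; the names X_train, X_test, y_train, and y_test are a widespread convention, not names used in the video. It also checks that each target series has one value per input row and that reusing random_state=1 selects the same training rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with the same shape as the course dataset.
rng = np.random.default_rng(0)
inputs = pd.DataFrame(
    rng.normal(size=(802, 7)),
    columns=[f"feature_{i}" for i in range(7)],
)
target = pd.Series(rng.normal(size=802), name="available_liquidity")

# Unpack the four items in the order train_test_split returns them.
X_train, X_test, y_train, y_test = train_test_split(
    inputs, target, test_size=0.2, random_state=1
)

# Each target series has one value per row of its matching inputs.
print(len(X_train) == len(y_train), len(X_test) == len(y_test))  # True True

# Re-running the split with the same random_state selects the same rows.
X_train_again, _, _, _ = train_test_split(
    inputs, target, test_size=0.2, random_state=1
)
print(X_train.index.equals(X_train_again.index))  # True
```

Changing random_state, or leaving it unset, would generally select a different set of rows, which is why matching the value used in the video matters for comparing results.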