Cross Validation
Cross-validation in machine learning.
Transcript
Now we're going to talk about cross-validation. Cross-validation allows hyperparameter tuning without using your testing data. You already split your data into training and testing sets; cross-validation repeats that splitting process multiple times within only your training data. Hyperparameter tuning happens via trial and error, meaning that every candidate combination of hyperparameter values is tested and compared against the rest. In order to compare alternative hyperparameter configurations, you must have a training set to train the competing models and a testing set to evaluate their predictions. However, you must save your original testing dataset to evaluate your final tuned models. If you tune your hyperparameters with your final testing dataset, you will be vulnerable to overfitting and poor performance on new, unseen data.

The example you see here illustrates what's called 4-fold cross-validation. First, the training data is split into cross-validation training and cross-validation testing sets. Second, each hyperparameter configuration is trained on the cross-validation training set. Third, each hyperparameter configuration is tested with the cross-validation testing set. Fourth, the entire process is repeated 4 times, which is why it's called 4-fold cross-validation. And then finally, each hyperparameter configuration is evaluated based on the average R squared across the 4 folds.

After this cross-validation process, the hyperparameter configuration with the highest average cross-validated R squared for each model class is chosen as the optimal hyperparameter configuration, and that is what's tested using the final testing dataset. Because the winning models have never seen the final testing dataset, even during hyperparameter tuning, this is a valid measure of the tuned models' actual performance.
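As a minimal sketch of the workflow described above, the following Python example uses scikit-learn's GridSearchCV to run 4-fold cross-validation scored by average R squared, then evaluates only the winning configuration on the held-out final testing set. The model class (Ridge) and the alpha grid are illustrative assumptions, not taken from the video.

```python
# A minimal sketch, assuming scikit-learn is installed; Ridge regression
# and the alpha grid below are illustrative choices, not from the lesson.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Create toy data and hold out a final testing set that tuning never sees.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Every hyperparameter combination in the grid is trained and scored on
# each of the 4 folds; scoring="r2" averages R squared across the folds.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(Ridge(), param_grid, cv=4, scoring="r2")
search.fit(X_train, y_train)

print("Best hyperparameters:", search.best_params_)
print("Average cross-validated R^2:", search.best_score_)

# Only the winning configuration ever touches the final testing set,
# so this score is a valid measure of the tuned model's performance.
print("Final test R^2:", search.score(X_test, y_test))
```

Note that GridSearchCV refits the best configuration on the full training set by default, so the final call scores the tuned model exactly as the transcript describes: on data it never saw during tuning.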