AUROC
- 01:27
Understand the area under the ROC curve (AUROC) as a metric for evaluating classification models.
Transcript
To evaluate your classification models, you're going to need a different metric, because R squared is no longer a good indicator. To illustrate what I mean, imagine that a model predicts "commit" for every single observation, because only 20% of the observations are "decline". This model would have a reasonably high R squared, suggesting it explains 80% of the variation in the target variable, but it would predict the wrong answer for every single "decline" observation. If you were modeling a scenario with even more imbalanced classes, like credit card fraud, where 99% of the transactions are valid, R squared becomes even less useful as a success metric, because your model could miss every fraudulent transaction and still have a nearly perfect R squared.

For classification, especially with imbalanced classes like you have in your investor classifier, the most reliable metric is the area under the ROC curve, or AUROC. Essentially, AUROC asks the question: if the model compares one observation from the positive class and one observation from the negative class, what's the probability that it can distinguish the two? For this reason, the AUROC metric is indifferent to the balance between classes and eliminates the bias of the R squared metric that you just saw. In order to develop a better understanding of what the AUROC metric actually is, we must first discuss the confusion matrix.
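To make these two points concrete, here is a minimal sketch using synthetic data (not the course's investor dataset) and scikit-learn's roc_auc_score, assuming numpy and scikit-learn are installed. It shows that a predictor which always outputs the majority class can look strong on a naive metric while scoring 0.5 AUROC (no discrimination), and that AUROC matches its definition as the probability of ranking a random positive above a random negative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical imbalanced labels: 80% "commit" (0), 20% "decline" (1),
# mirroring the split described above.
y_true = rng.binomial(1, 0.2, size=1000)

# A "model" that predicts the majority class for every observation.
majority_score = np.zeros_like(y_true, dtype=float)
print("Majority-class predictor")
print("  accuracy:", accuracy_score(y_true, majority_score > 0.5))  # ~0.80, looks fine
print("  AUROC:   ", roc_auc_score(y_true, majority_score))         # 0.5, no discrimination

# An imagined model with some real signal: "decline" observations
# receive slightly higher scores on average than "commit" observations.
y_score = rng.normal(loc=0.5 * y_true, scale=1.0)
auroc = roc_auc_score(y_true, y_score)

# The same number estimated straight from the definition: the probability
# that a randomly chosen positive is ranked above a randomly chosen
# negative (ties count as one half).
pos, neg = y_score[y_true == 1], y_score[y_true == 0]
pairwise = ((pos[:, None] > neg[None, :]).mean()
            + 0.5 * (pos[:, None] == neg[None, :]).mean())

print("Discriminative model")
print(f"  roc_auc_score:        {auroc:.4f}")
print(f"  pairwise probability: {pairwise:.4f}")  # matches the AUROC value
```

Because the pairwise comparison only depends on how positives are ranked relative to negatives, the value is unchanged if you add more majority-class observations, which is why AUROC is indifferent to class balance.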