Confusion Matrix
- 02:52
Understand binary classification in predictive modeling, including true positives, true negatives, false positives, and false negatives.
Downloads
No associated resources to download.
Transcript
In binary classification, such as your investor classifier, where you're trying to distinguish between two classes, there are four possible results of any prediction. You can correctly predict a positive, which in our case would be predicting a decline; that's called a true positive. Or you can correctly predict a negative, which in our case would be correctly predicting a commit; that's called a true negative. Then there are two possible results that are mistakes. You could incorrectly predict a decline when the investor is actually going to commit; that's called a false positive. Or you could incorrectly predict a commit when the investor is actually going to decline; that's called a false negative. For a perfect model, every decline prediction would be a true positive, and every commit prediction would be a true negative. Anything else is an error.

Let's take a look at the confusion matrix for one of our classifier models. Follow along with the code that you see here. We're importing the confusion matrix function, and we're making predictions with our L1 logistic regression classifier model. Then we're printing the confusion matrix, comparing our predictions to the actual test results, and below that you see the output, which is our confusion matrix.
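The on-screen code isn't reproduced in the transcript, so here is a minimal sketch of the step being described, assuming scikit-learn and placeholder names lr_l1 (the fitted L1 logistic regression model), X_test, and y_test:

from sklearn.metrics import confusion_matrix

# Make predictions with the already-fitted L1 logistic regression model,
# then compare them to the actual test labels.
predictions = lr_l1.predict(X_test)
print(confusion_matrix(y_test, predictions))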
What that means is that out of 1,146 actual commits, which is our negative class in the test set, the model correctly identified 1,124. Those are true negatives; the remaining 22 commits were incorrectly predicted as declines, so they're false positives. Then out of 301 actual declines, which is our positive class in the test set, the model correctly identified 278. Those are our true positives; the remaining 23 declines were incorrectly predicted as commits, so they're false negatives.
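Given those counts, the printed output would look like the sketch below, assuming commit is encoded as 0 and decline as 1 (scikit-learn orders rows and columns by class label, with actual classes on the rows and predicted classes on the columns):

[[1124   22]
 [  23  278]]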
The true positive rate is the number of true positive predictions divided by all actual positive observations.
And then the false positive rate is the number of false positive predictions divided by all actual negative observations.
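Using the counts from this model, those rates work out to:

true positive rate = 278 / 301 ≈ 0.92
false positive rate = 22 / 1,146 ≈ 0.02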
There's an important relationship between the true positive rate, the false positive rate, and the probability threshold of your model. As you know, classification models can produce predictions that give the probability that an observation is positive or negative.
To make a prediction, your model compares the probability of a positive observation with a probability threshold. The default threshold for the predict function is 0.5, meaning that the model will make a positive prediction if the probability is greater than 50%. As you lower the threshold, your model will make a greater number of positive predictions, increasing both the true positive rate and the false positive rate. Your model will get more of the actual positives right, and it will also get more of the actual negatives wrong.
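If you want to experiment with the threshold yourself, one common approach is a sketch like the one below, assuming scikit-learn and the same placeholder names as before (lr_l1, X_test); the 0.3 threshold is just an illustrative value:

# predict_proba returns one column per class; column 1 holds the
# predicted probability of the positive class (decline).
probabilities = lr_l1.predict_proba(X_test)[:, 1]

# Lowering the threshold from the default 0.5 to 0.3 produces more
# positive (decline) predictions, raising both the true positive rate
# and the false positive rate.
threshold = 0.3
predictions = (probabilities >= threshold).astype(int)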