Spotting Outliers Workout 3
- 01:15
Practice removing outliers from a stock data set to improve the clarity of histograms for continuous features.
Glossary
Histograms, Machine Learning, Python, Stock Data
Transcript
Next, display histograms for all of the continuous features in stock data after filtering out the outlier with a Boolean mask. Having Berkshire Hathaway in our dataset kind of skewed our histograms, so that all we could see was this big block of most of our observations and then that one outlier. So let's remove Berkshire Hathaway and take another look at our histograms so that we can get a little bit more detail. To accomplish that, I'm creating a new data frame called stock data sans berk, stock data without Berkshire Hathaway, and I'm using the same Boolean mask that we had before. Then I'm just displaying histograms as we did before, using the hist function and the argument figsize equals (5, 7), a width of 5 and a height of 7, just to make those graphs easier to read. And then I'm using pyplot and the show function to display them.
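The steps just described might look roughly like the sketch below. This is not the course's actual notebook: the column names, the toy values, and the way the mask is built (comparing a hypothetical company column against the name) are all assumptions made for illustration, and the variable name stock_data_sans_berk is simply the spoken name written as a Python identifier.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the course's stock data. The column names and all values
# are made up for illustration; they are not real market data.
stock_data = pd.DataFrame({
    "company": ["Alpha", "Beta", "Gamma", "Delta", "Berkshire Hathaway"],
    "current_price": [45.0, 120.5, 980.0, 1950.0, 300000.0],
    "price_increase": [1.2, -0.8, 15.0, 22.5, 4100.0],
    "price_target": [50.0, 130.0, 1000.0, 2000.0, 320000.0],
})

# Boolean mask: True for every row except the outlier.
# Assumption: the outlier is identified by name; the original mask may differ.
not_berkshire = stock_data["company"] != "Berkshire Hathaway"

# New data frame without Berkshire Hathaway.
stock_data_sans_berk = stock_data[not_berkshire]

# One histogram per numeric column; the text column is skipped by hist.
stock_data_sans_berk.hist(figsize=(5, 7))
plt.show()
```

Because the mask is just a series of True and False values, the same filtering idea would work with a price threshold instead of a name, if that is how the outlier was flagged earlier.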
And now I can get a little bit more information out of current price, price increase, and price target because I don't have Berkshire Hathaway skewing the picture. And I can see that there are still a couple of outliers at the 1,000 mark and near the 2,000 mark.