Spotting Outliers Workout 2
- 01:23
Practice identifying and analysing an outlier in stock data.
Glossary
Histograms, Machine Learning, Python, Stock Data
Transcript
The stock data contains 100 observations with 9 features. When you look at the histograms, do you notice anything that stands out? There's one extreme outlier distorting all the price-related features. It must be an error. In your Jupyter Notebook, filter your stock data DataFrame with a Boolean mask that only includes the outlier.
So here I'm calling the stock data DataFrame, but I only want to show the observations where the current price feature has a value greater than 300,000. That's just an eyeball number, because I can see up here that this outlier seems to sit just above 300,000. When I take this Boolean mask and apply it to my stock data DataFrame, I can see that the outlier is Berkshire Hathaway. That makes sense, because I know Berkshire Hathaway does not do stock splits, so its price is outrageously high compared to most stocks; Warren Buffett's philosophy is not to split the stock. So now we know this is a legitimate outlier, not an error. And now we can decide whether to include it in our data and analysis, or whether it's something we might want to exclude.
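The filtering step described above might look roughly like the sketch below. The transcript does not show the actual code, so the variable name `stock_data`, the column names `name` and `current_price`, and every value in the small table are assumptions made up for illustration; they are not real quotes or the lesson's dataset.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's stock data. Column names and all
# values are invented placeholders, not real prices.
stock_data = pd.DataFrame(
    {
        "name": ["Company A", "Company B", "Berkshire Hathaway", "Company C"],
        "current_price": [52.10, 148.75, 310000.00, 23.40],
    }
)

# Boolean mask: True only where the price exceeds the eyeballed threshold.
threshold = 300_000
is_outlier = stock_data["current_price"] > threshold

# Applying the mask keeps only the row(s) where the mask is True.
outlier = stock_data[is_outlier]
print(outlier)

# If the outlier should be left out of later analysis, invert the mask.
without_outlier = stock_data[~is_outlier]
print(len(without_outlier))
```

Keeping the mask in its own variable means the same condition can be reused to inspect the outlier and, with `~`, to drop it, depending on which decision fits the analysis.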