Spotting Outliers Workout 4
- 01:38
Glossary: Machine Learning, Python, Stock Data
Transcript
At this point, you should see that there are a couple more outliers in the price-related features. Display all observations with a current price exceeding $1,000 to make sure that they aren't errors. Then display histograms for all observations with a current price of less than $1,000.
I'm going to start by applying a Boolean mask to my stock data dataset that will display all of the observations with a current price greater than $1,000. So here I get four observations: Berkshire Hathaway, which I already knew about, and then Amazon, Alphabet, which is Google's parent company, and Booking Holdings. I can double-check these now by just looking up these companies to verify that these current prices are correct, and they are.

Even though these prices are correct, I still might want to remove them as outliers if I believe that they're not adding any useful information for my machine learning algorithm to learn from. If I wanted to do that, it would look something like this. I'm creating a new dataframe called stock data sans outliers, meaning without outliers, and then I'm passing this Boolean mask so that it will only include observations where the current price is less than $1,000. Then I'm displaying histograms for that new dataframe using pyplot and the show function to display it.

And so now I can see more normal-looking distributions for current price, price increase and price target. It looks like I have a decent grouping for all of those, and nothing is too extreme.
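The transcript describes the code shown on screen without reproducing it, so here is a minimal sketch of what those steps might look like in pandas. The dataframe contents are invented toy values, not the course's stock data, and the column names `company`, `current_price`, `price_increase` and `price_target` are assumptions based on the feature names mentioned in the narration. The variable names `stock_data` and `stock_data_sans_outliers` follow the spoken names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the course's stock data set.
# Every company label and number below is invented for illustration.
stock_data = pd.DataFrame(
    {
        "company": ["A", "B", "C", "D", "E", "F"],
        "current_price": [45.0, 120.5, 310.0, 1850.0, 75.25, 2400.0],
        "price_increase": [2.1, -0.5, 4.3, 12.0, 0.8, -30.0],
        "price_target": [50.0, 130.0, 325.0, 2000.0, 80.0, 2500.0],
    }
)

# Boolean mask: True for rows whose current price exceeds $1,000.
high_price = stock_data["current_price"] > 1000

# Display the high-priced observations so they can be checked by hand.
print(stock_data[high_price])

# New dataframe that keeps only observations priced under $1,000.
stock_data_sans_outliers = stock_data[stock_data["current_price"] < 1000]

# One histogram per numeric column; the text column is skipped.
stock_data_sans_outliers.hist(figsize=(10, 6))
plt.show()
```

Note that a row priced at exactly $1,000 would fall into neither group with these two strict comparisons. Using `<=` in the second mask would keep it.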