Spotting Outliers Workout 1
- 01:31
Practice displaying the dimensions of a stock data dataframe and creating histograms for all of its continuous features.
Glossary
Histograms, Machine Learning, Python, Stock Data
Transcript
Open up your Jupyter Notebook, display the dimensions of the stock data dataframe, and then display histograms for all of the continuous features in your dataframe.
I'm going to start by displaying the dimensions using the shape attribute, so stock data dot shape, and that shows us that we have 100 observations and nine different features in our dataframe.
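As a minimal sketch of that first step, the cell might look like the following. The course's actual file isn't shown in the transcript, so the dataframe here is a hypothetical stand-in built from random numbers, with made-up column names, sized to match the 100 observations and nine features mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's stock data (the real data is not shown):
# 100 observations and 9 features, filled with random numbers.
rng = np.random.default_rng(0)
stock_data = pd.DataFrame(
    rng.normal(size=(100, 9)),
    columns=[f"feature_{i}" for i in range(1, 10)],  # made-up column names
)

# shape is an attribute, not a method, so there are no parentheses.
# It returns a (rows, columns) tuple.
n_observations, n_features = stock_data.shape
print(stock_data.shape)  # (100, 9)
```

In a notebook you can also leave `stock_data.shape` as the last line of a cell and the tuple is displayed without calling `print`.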
Then I'm calling the stock data dataframe and using the hist function to create histograms. The figsize argument changes the size of the figure that we're printing out, and so the size of the individual histograms, and then I'm using the pyplot show function to display it.
When I execute that cell, I get a histogram for each of my numerical features, so the features that have numbers instead of just classes, and I can see the distribution of each one. That's really useful for getting to know my dataset, because now I can see that current price, market cap, price increase, and price target have non-normal distributions. Specifically, in the price-related features, I can see that there seems to be one big outlier.
That's something I would want to explore further as I'm getting to know my dataset, because I want to know whether it's an error, a legitimate outlier that I should consider in my analysis, or a legitimate outlier that I want to exclude from my analysis.
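The histogram cell described above might be sketched as follows. This is an illustration under stated assumptions, not the course's notebook: the data is synthetic, the column names (`current_price`, `market_cap`, `price_increase`, `price_target`, `sector`) are guesses based on the features named in the transcript, the `figsize` value is arbitrary, and one extreme price is injected on purpose so an outlier shows up. The last lines are one possible way to start exploring that outlier, which the transcript only mentions as a next step.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data; column names are assumptions, not the course's.
rng = np.random.default_rng(42)
stock_data = pd.DataFrame({
    "current_price": rng.lognormal(mean=4.0, sigma=0.5, size=100),
    "market_cap": rng.lognormal(mean=10.0, sigma=1.0, size=100),
    "price_increase": rng.normal(loc=5.0, scale=10.0, size=100),
    "price_target": rng.lognormal(mean=4.1, sigma=0.5, size=100),
    "sector": rng.choice(["tech", "energy", "health"], size=100),  # a class feature
})

# Inject one deliberately extreme price so the histogram shows a big outlier.
stock_data.loc[0, "current_price"] = 5000.0

# DataFrame.hist draws one histogram per numeric column and skips class
# columns such as "sector". figsize sets the overall figure size in inches.
axes = stock_data.hist(figsize=(12, 8))
plt.show()

# One simple way to start investigating: look at the rows with the largest prices.
largest_prices = stock_data.nlargest(3, "current_price")
print(largest_prices[["current_price", "price_target"]])
```

Looking at the full rows for the extreme values is usually enough to judge whether a value looks like a data-entry error or a real but unusual observation.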