Spotting Outliers
- 01:08
This video covers the importance of identifying and making informed decisions about outliers in machine learning projects, and introduces the use of histograms to understand the distribution of continuous variables and identify outliers.
Glossary
Histograms Machine Learning Python Stock Data
Transcript
You'll often run across outliers in your machine learning projects. Sometimes outliers indicate errors in your data, but other times they're real but unusual observations. In these cases, you must use good judgment to determine whether to include them in your dataset or remove them.

When you're familiarizing yourself with the numeric variables in your dataset, also called continuous features, a histogram is a good place to start. By looking at the histogram for each continuous variable, you can get a sense of the variable's distribution and identify any outliers.

You can display histograms for each continuous variable using the matplotlib hist function that you see right here. After calling that hist function, use the pyplot show function to display those histograms below your cell.

If you want to change the size of your histograms from the default, you can use the figsize argument. So right here I'm passing figsize with a width of seven and a height of three, and when I execute that cell, you can see that the size of my histograms has changed and now they're a little bit more readable.
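The cell shown in the video isn't captured in the transcript, so here is a minimal sketch of what those steps might look like. It assumes the data lives in a pandas DataFrame and uses the DataFrame's hist method, which draws with matplotlib; the column names, the synthetic values, and the planted outlier are all hypothetical stand-ins, not the course's stock data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data: two continuous features with made-up values.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "close": rng.normal(loc=100.0, scale=5.0, size=500),
    "volume": rng.lognormal(mean=10.0, sigma=0.5, size=500),
})

# Plant one extreme value so there is an outlier to spot in the "close" histogram.
df.loc[0, "close"] = 250.0

# One histogram per numeric column; figsize is (width, height) in inches.
axes = df.hist(figsize=(7, 3))

# Display the histograms (in a notebook they appear below the cell).
plt.show()
```

With data like this, the "close" histogram should show most of its bars bunched together and a single isolated bar far to the right, which is the kind of pattern worth investigating before deciding whether to keep or remove the observation.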