Summary Statistics
- 01:47
How to use pandas functions for summary statistics on data sets, highlighting the application of minimum, maximum, and describe functions to both numeric and categorical features in a pandas DataFrame.
Transcript
As you're exploring a new data set, you'll find it useful to use the pandas functions for summary statistics. For example, right here we're looking at the minimum and maximum functions applied to the pandas data frame that we've named CSV data frame. In the current price column, they apply to the numeric values and give us the minimum, which is 113, and the maximum, which is 1,970. When they're applied to categorical features such as name and ticker, they list the minimum and the maximum in terms of alphabetical order.
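The transcript doesn't show the on-screen code, so here is a minimal sketch of what calling the minimum and maximum functions on a whole data frame might look like. The variable name `csv_df`, the column names, and every row of data are assumptions made up for illustration. Only the 113 and 1,970 prices and the Microsoft name come from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's data frame; the real CSV file
# is not shown in the transcript, so these rows are invented.
csv_df = pd.DataFrame({
    "name": ["Apple", "Alphabet", "Microsoft"],
    "ticker": ["AAPL", "GOOGL", "MSFT"],
    "current_price": [113.0, 1970.0, 210.0],
})

# min() and max() run column by column across the whole data frame:
# numeric columns are compared by value, string columns alphabetically.
col_min = csv_df.min()
col_max = csv_df.max()

print(col_min)
print(col_max)
```

With this made-up data, the minimum row holds the lowest price plus the alphabetically first name and ticker, and the maximum row holds the highest price plus the alphabetically last name and ticker.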
So when I execute that code cell, you're gonna see that I get my minimum current price, the first name by alphabetical order, and the first ticker by alphabetical order. And then in that second section, the maximum current price is 1,970, the last name by alphabetical order is Microsoft, and the last ticker by alphabetical order is Microsoft's. Using these types of summary statistics functions in pandas can save you a lot of time, since each one applies to the entire data frame instead of you having to execute a different function for each column.
Pandas also includes the useful describe function shown here, which will display a collection of summary statistics for all of the numerical features in the data frame. In this case, there's only one numerical feature, current price, but if we had more, then you would see all of them displayed here. I'm gonna execute that code cell, and you can see all kinds of different summary statistics displayed for current price. This is a great way to wrap your head around your data when you're still exploring a data set and getting to know the different characteristics of the data that you're analyzing.
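As a rough sketch of the describe step, assuming the same kind of data frame, the call might look like the following. Again, `csv_df`, its column names, and its rows are hypothetical and not taken from the lesson's actual file. The `include="all"` call at the end is an optional extra that the transcript does not demonstrate.

```python
import pandas as pd

# Hypothetical stand-in data; the lesson's real CSV is not shown.
csv_df = pd.DataFrame({
    "name": ["Apple", "Alphabet", "Microsoft"],
    "ticker": ["AAPL", "GOOGL", "MSFT"],
    "current_price": [113.0, 1970.0, 210.0],
})

# By default, describe() summarizes only the numeric columns:
# count, mean, std, min, 25%, 50%, 75%, and max.
summary = csv_df.describe()
print(summary)

# Optional: include="all" also reports on the non-numeric columns.
full_summary = csv_df.describe(include="all")
print(full_summary)
```

Here `summary` has a single column, `current_price`, because that is the only numeric feature in the sketch.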