Segmenting with .groupby
- 01:56
How to use the pandas groupby function to segment and summarize data across different categories
Glossary
groupby Pandas Python Segmenting
Transcript
One final piece of pandas functionality that you're going to find useful is the groupby function, which allows you to segment and summarize data across different categories or classes. The data set that we're looking at right here has two different classes: football players and ballet dancers. To understand this data, you have the option of using aggregation functions like mean to get summary statistics across the entire data set, like you see here. The problem is that, since the classes are so distinct, applying aggregation functions to the entire data set is not very helpful. There isn't anybody in this data frame who is anywhere close to 180 pounds, so that statistic is misleading and doesn't help us understand our data.
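To make the problem concrete, here is a minimal sketch of taking the mean across an entire data set with two very different classes. The variable name data_frame, the column names height and weight, and all of the values are assumptions made up for illustration; they are not the lesson's actual data and do not reproduce its figures.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's data set (values are invented).
data_frame = pd.DataFrame({
    "occupation": ["football player", "football player",
                   "ballet dancer", "ballet dancer"],
    "height": [76, 78, 67, 68],       # inches (assumed unit)
    "weight": [240, 250, 115, 121],   # pounds (assumed unit)
})

# Mean of each numeric column across the whole data set, ignoring class.
overall_mean = data_frame[["height", "weight"]].mean()
print(overall_mean)

# Distance from the overall mean weight to the closest individual weight.
closest_gap = (data_frame["weight"] - overall_mean["weight"]).abs().min()
print(closest_gap)
```

In this made-up sample the overall mean weight falls between the two classes, and the closest individual is still dozens of pounds away from it, which is the kind of misleading summary described above.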
To solve this problem, we have the groupby function. To use it, you first write the name of your data frame, which in this case we've just named data frame, then the groupby function, and in parentheses the name of the series by which you want to group your data frame. Here we're grouping by occupation, which is this feature, and it has two classes: football player and ballet dancer. After the groupby call, we have a dot and then the aggregation function that we want to use.
When I execute that cell, it applies that aggregation function, but it applies it to each class separately. So here I have the ballet dancer occupation, with a mean height of 67.6 inches and a mean weight of 118 pounds. Then I have the football player, with a mean height of 77 inches and a mean weight of 242 pounds. When you're trying to wrap your head around a data set with thousands of observations and multiple different classes, the groupby function can be an invaluable asset.
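The pattern described above (data frame name, then groupby with the column to group on, then a dot and an aggregation function) might look roughly like the following sketch. The grouping column occupation comes from the lesson; the variable name data_frame, the height and weight column names, and the values are assumptions for illustration only, so the output will not match the numbers quoted in the video.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's data set (values are invented).
data_frame = pd.DataFrame({
    "occupation": ["football player", "football player",
                   "ballet dancer", "ballet dancer"],
    "height": [76, 78, 67, 68],       # inches (assumed unit)
    "weight": [240, 250, 115, 121],   # pounds (assumed unit)
})

# Group rows by the occupation column, then take the mean within each group.
# Every remaining column here is numeric, so .mean() applies to all of them.
group_means = data_frame.groupby("occupation").mean()
print(group_means)
```

The result has one row per class, indexed by occupation, so each mean describes only the people in that class. If a data frame also holds non-numeric columns, selecting the numeric ones after the groupby call (for example with [["height", "weight"]]) before calling mean is one way to keep the aggregation well defined.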