Sparse Classes Workout
- 02:14
Practice using sparse classes in Python.
Transcript
You'll notice that there are not very many observations in the materials, telecommunication services, utilities, and real estate sectors. So let's go ahead and add materials to the industrials sector and then combine telecom, utilities, and real estate into one "other" bucket. Finally, when you're finished with that, display a countplot with your results to verify that everything worked correctly.
Here we're going to address the issue of sparse classes. Sparse classes are classes in a categorical feature, like we have in sector in this example, where there are very few observations. Machine learning algorithms need as many observations as possible in order to make accurate predictions, because they have to learn patterns from your data. So a sparse class that only has a few observations is not useful for your machine learning algorithm and might actually be detrimental. We can still get some useful information out of these classes by combining them together or merging them into another class that's similar. So in this case, we're going to add materials to the industrials sector, and then we're going to combine telecom, utilities, and real estate into one "other" bucket.
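Before merging anything, it can help to see which classes are actually sparse. The sketch below is only an illustration: the toy data, the exact class labels, and the `min_obs` threshold are assumptions, since the lesson's real stock_data dataframe is not shown in the transcript.

```python
import pandas as pd

# Hypothetical toy data standing in for the lesson's stock_data dataframe.
stock_data = pd.DataFrame({
    "sector": (
        ["Industrials"] * 6
        + ["Financials"] * 5
        + ["Materials"] * 2
        + ["Telecommunication Services"]
        + ["Utilities"] * 2
        + ["Real Estate"]
    )
})

# Count the observations in each class of the categorical feature.
counts = stock_data["sector"].value_counts()

# Arbitrary cutoff chosen for this toy example, not a rule from the lesson.
min_obs = 3
sparse = sorted(counts[counts < min_obs].index)

print(counts)
print(sparse)
```

Any class that falls under the cutoff is a candidate for merging into a similar class or into a catch-all bucket.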
The process here is exactly the same as the last exercise. So stock_data is our dataframe, sector is our feature, and then we call the replace function. The first argument is the class that we want to replace, the second argument is the new value that we want to replace it with, and then we're using the inplace argument here. For the second replacement, we're replacing telecommunication services, utilities, and real estate, so those are all together in the first argument inside a list. The second argument is what we want to replace them with, which is the other class.
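The two replacements described above might look roughly like the following. This is a minimal sketch: the toy data and the exact label strings are assumptions. The lesson passes the inplace argument; this version assigns the result back to the column instead, which gives the same outcome and behaves consistently across pandas versions.

```python
import pandas as pd

# Hypothetical toy data; the label strings are assumed, not taken from the lesson's dataset.
stock_data = pd.DataFrame({
    "sector": [
        "Industrials", "Industrials", "Materials", "Financials",
        "Telecommunication Services", "Utilities", "Real Estate", "Financials",
    ]
})

# First replacement: a single class folded into a similar, larger class.
stock_data["sector"] = stock_data["sector"].replace("Materials", "Industrials")

# Second replacement: several sparse classes, passed as a list, mapped to one bucket.
stock_data["sector"] = stock_data["sector"].replace(
    ["Telecommunication Services", "Utilities", "Real Estate"], "Other"
)

result = stock_data["sector"].value_counts().to_dict()
print(result)
```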
And when I execute that cell, it's going to make those changes in our stock_data dataframe, and I can see that when I execute this last cell with our countplot function. So you can see here that the sparse classes have been aggregated into the industrials and other buckets. So now we have fewer classes with more observations, and that's going to be more useful for our machine learning algorithm to learn from.
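The verification step could be sketched as below, assuming the countplot in the lesson is seaborn's `countplot`. The toy data represents the dataframe after merging, and the `Agg` backend line is only there so the snippet runs without a display; in a notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import pandas as pd
import seaborn as sns

# Hypothetical toy data representing the sector feature after merging.
stock_data = pd.DataFrame({
    "sector": ["Industrials"] * 4 + ["Financials"] * 3 + ["Other"] * 3
})

# One bar per remaining class; the bar length is the number of observations.
ax = sns.countplot(y="sector", data=stock_data)

# A quick numeric check alongside the plot.
remaining_classes = stock_data["sector"].nunique()
print(remaining_classes)
```

If the merge worked, the plot shows no bars for the original sparse classes, and the bars for the merged classes are correspondingly longer.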