Using Sets to Remove Duplicates
- 02:09
Learn how sets in Python are used to remove duplicate values from a dataset, and how a set can automatically extract the unique values from a list of data, which is particularly useful for handling large datasets.
Glossary
duplicates, Python sets
Transcript
Being able to remove duplicate values is very useful in practice. Say you have a portfolio of companies, and you want to know which industries are represented across all of the companies in that portfolio. Well, you could take a list of those industries, feed it into a set, and the set would remove all of the duplicate values so that you can see which unique industries are represented in your dataset.
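The portfolio idea above can be sketched in a few lines. This is only an illustration: the industry labels and the variable names `list_1` and `set_1` are assumptions, since the transcript does not show the actual data or code used in the video.

```python
# Hypothetical industry labels for a five-company portfolio (illustrative data only).
list_1 = ["Technology", "Energy", "Technology", "Healthcare", "Energy"]

# A set literal built from the same five values keeps each distinct value once.
set_1 = {"Technology", "Energy", "Technology", "Healthcare", "Energy"}

print(list_1)  # the list keeps all five entries, duplicates included
print(set_1)   # the set holds three unique industries; display order is not guaranteed
```

The list preserves every entry in the order it was written, while the set collapses repeated values automatically.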
Let's take that data and plug it into a Python list and set. So I'm gonna execute this code to define list one and set one. When I print list one, it's gonna look exactly like the data that we put into it, and when I print set one, you'll see that it's removed those duplicate values.
Now, this is a dataset with only five different observations, so it's not that hard for us to just look at it and see what's unique. But if you're talking about a dataset with hundreds or thousands or millions of observations, being able to automatically remove those duplicates and look only at the unique values is super, super useful. So that's one of the major uses of sets.
So what if you already have a list and you wanna convert it into a set to remove duplicate values? Well, that's easy to do with the set function. Here I've created X, Y, Z list, which is the letters X, Y, and Z with duplicate values. I'm gonna execute that to define that list, and if I print it out, it's gonna look just like the data that I put into it.
Let's say that I wanna convert that to a set. So now I'm gonna create a new variable called X, Y, Z set.
And all I'm gonna do is use the set function. So set, open parentheses, and then I'm gonna refer back to the list that I have, X, Y, Z list.
So I execute that, and now I'm gonna print X, Y, Z set, and you'll see that my duplicate values have been removed and the values are no longer in the same order. I have a new set object that contains the values of X, Y, Z list, but with all the duplicates removed.
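The conversion described above might look like the following minimal sketch. The names `xyz_list` and `xyz_set` are a guess at how the spoken "X, Y, Z list" and "X, Y, Z set" were written on screen, and the exact letters in the list are assumed for illustration.

```python
# Assumed contents: the letters x, y, and z with some repeats.
xyz_list = ["x", "y", "z", "x", "y", "z", "z"]

# set() accepts any iterable and returns a new set containing its unique values.
xyz_set = set(xyz_list)

print(xyz_list)          # the original list is unchanged, duplicates included
print(xyz_set)           # unique values only; the display order may differ from the list
print(sorted(xyz_set))   # sorted() returns a list in a predictable order: ['x', 'y', 'z']
```

Because a set is unordered, the printed order of `xyz_set` should not be relied on; wrapping it in `sorted()` is one way to get a stable display when order matters.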