Boolean Masks
- 03:47
An overview of the Boolean object type in Python, demonstrating how to create and apply Boolean masks to filter DataFrames based on specific conditions.
Transcript
In the second lesson, you learned about the Boolean object type, which encompasses True, False, and any code you write that returns one of those values. Here I just have 1 > 0, and then we're checking the type of 1 > 0. When I execute that cell, you're going to see that 1 > 0 returns True, and that its type is bool, Python's Boolean type. That's an example of the Boolean object type.
You can filter your DataFrames with something called a Boolean mask. A Boolean mask is a pandas Series containing a sequence of True and False values, just like you see here in Excel. If you wanted to filter this table using the equivalent of a Boolean mask, you could create a column of True and False values that tests a condition, like you see right here, where we're testing whether the values in series one are greater than two. Once you've tested that condition and have True and False values, you could filter out all of the rows with False values. That's essentially the function performed by a Boolean mask in Python.
In Python, the process of creating a Boolean mask is really similar to what you just saw. It just looks a little bit different. Here we've created a new DataFrame called series df, and down below we're creating the Boolean mask. I've named it Boolean Mask, and I'm defining it as, first, a pointer to my series, and then a condition to test. So I'm testing whether the values in my series, series one, are greater than the value two, and then I'm just displaying my Boolean mask. You can see that my Boolean mask looks a lot like what we saw in Excel. I have two False values because one and two are not greater than two, and three, four, and five are greater than two, so they get True values.
To apply that Boolean mask to our DataFrame, we write the name of our DataFrame, and then in square brackets the Boolean mask.
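The steps described so far can be sketched roughly as follows. This is a minimal reconstruction, not the notebook from the video: the identifiers `series_df`, `series_1`, and `boolean_mask` are assumed spellings of the names spoken in the transcript, and the values 1 through 5 are taken from the narration.

```python
import pandas as pd

# A comparison evaluates to a Boolean object.
print(1 > 0)        # True
print(type(1 > 0))  # <class 'bool'>

# Assumed names: the transcript only says "series df" and "series one".
series_df = pd.DataFrame({"series_1": [1, 2, 3, 4, 5]})

# Point to the series, then state the condition to test.
# The result is a pandas Series of True/False values, one per row.
boolean_mask = series_df["series_1"] > 2
print(boolean_mask)

# Apply the mask: DataFrame name, then the mask in square brackets.
# Only rows where the mask is True are kept.
filtered = series_df[boolean_mask]
print(filtered)
```

Under these assumptions, the mask holds False for the values 1 and 2 and True for 3, 4, and 5, so `filtered` keeps only the last three rows.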
Putting the Boolean mask in square brackets tells Python that we want to show only the rows of the series DataFrame where the Boolean mask holds a True value. So when I execute that cell, I'm going to see only rows three, four, and five.
If you want to invert the filter so that False values are shown and True values are hidden, use the tilde symbol (~), also called the invert operator, in front of the name of your Boolean mask, just like this. When I execute that code cell, I'm going to get one and two, and three, four, and five will be filtered out.
There's actually a shortcut that can save you the step of creating your Boolean mask, that separate pandas Series containing the True and False values. If I use my Boolean mask like you saw before, by writing the name of my DataFrame and then in square brackets the name of my Boolean mask, I'm filtering out rows one and two, and the result is rows three, four, and five. I can get the exact same result if, instead of creating the Boolean mask separately, I pass my condition directly in the square brackets. So I write the name of my DataFrame, and then in square brackets the series that I want to test and the condition that I'm testing. Python will automatically interpret that as a Boolean mask and give me the exact same output.
If I want to invert that, I can put my series and my condition in parentheses like this, and then put a tilde, the invert symbol, in front of the opening parenthesis. When I execute it, I'm going to get the inverted result.
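Here is a short sketch of the inversion and the inline shortcut described above. As before, `series_df`, `series_1`, and `boolean_mask` are assumed spellings of the names spoken in the transcript, and the result variable names are hypothetical, added only for illustration.

```python
import pandas as pd

# Assumed names and data, matching the values mentioned in the narration.
series_df = pd.DataFrame({"series_1": [1, 2, 3, 4, 5]})
boolean_mask = series_df["series_1"] > 2

# Invert the filter with the tilde: rows where the mask is False are shown.
inverted = series_df[~boolean_mask]
print(inverted)

# Shortcut: pass the condition directly instead of a separately named mask.
inline = series_df[series_df["series_1"] > 2]
print(inline)

# Inverted shortcut: wrap the condition in parentheses, then prefix the tilde.
inline_inverted = series_df[~(series_df["series_1"] > 2)]
print(inline_inverted)
```

The parentheses matter in the last line: the tilde has to apply to the result of the whole comparison, not to the series alone before it is compared.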