Removing Columns
- 04:46
Using query editor to remove rows/columns and promote headers.
Transcript
Removing empty columns from a dataset is one of the first things that we need to do when we are cleansing or transforming our data.
An empty column is easily identified in query editor. It will just contain lots of null values, and it won't have any specified column heading. We'll just see something like Column 9, Column 10, Column 11. It's actually really important that we do get rid of these columns. Although it might seem as if they're not doing any harm, when we bring that dataset through into our report they will be listed as fields in the field list, taking up a lot more space than is necessary.
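To make the idea concrete outside of query editor, here is a minimal Python sketch of what "an empty column" means. It is only an illustration, not how Power BI does it internally: the sample rows, the column names, and the helpers `empty_columns` and `remove_columns` are all made up for this example, and `None` stands in for the null values we see in the preview.

```python
# Hypothetical sample rows; None stands in for the nulls shown in query editor.
rows = [
    {"Product": "Bike", "Sales": 120, "Column 9": None, "Column 10": None},
    {"Product": "Helmet", "Sales": 35, "Column 9": None, "Column 10": None},
]


def empty_columns(rows):
    """Return the names of columns whose value is None in every row."""
    if not rows:
        return []
    return [
        name
        for name in rows[0]
        if all(row.get(name) is None for row in rows)
    ]


def remove_columns(rows, names):
    """Return new rows without the named columns (illustrative helper)."""
    drop = set(names)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


to_drop = empty_columns(rows)
cleaned = remove_columns(rows, to_drop)
print(to_drop)     # the columns that contain nothing but nulls
print(cleaned[0])  # the first row with those columns removed
```

In query editor itself, the same result comes from selecting those columns and choosing remove columns.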
Sometimes we have columns that contain partial values, and there it depends on how many values we have in that column as to whether it's going to be of any real use to us in our analysis. We have a tool in Power BI query editor called profiling, and that will let us see what percentage of the column contains empty or null values. So if we're looking at, say, 90-92% of the column containing null values, it's not really worth bringing that column through into the report.
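The percentage that profiling reports is simple arithmetic: the number of null values divided by the number of rows. A rough Python sketch of that calculation follows. The `null_percentage` helper, the sample values, and the 90% cut-off are assumptions chosen for illustration; the right threshold depends on the analysis.

```python
def null_percentage(values):
    """Percentage of entries that are None (0.0 for an empty column)."""
    if not values:
        return 0.0
    nulls = sum(1 for v in values if v is None)
    return 100.0 * nulls / len(values)


# Hypothetical column: 2 real values and 23 nulls out of 25 rows.
location = ["Dublin", "Cork"] + [None] * 23

# Assumed cut-off for this example only.
THRESHOLD = 90.0

pct = null_percentage(location)
mostly_empty = pct >= THRESHOLD
print(f"{pct:.0f}% null, remove column: {mostly_empty}")
```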
Sometimes a column will be completely filled with values, but they're really irrelevant in the report, and we can use a tool such as the filter tool to help us identify those columns. So we might see a column where it's exactly the same value for every single row. Again, there's really not much point in bringing that through into the report, so that's another column that we would remove.
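Checking the filter list for a single entry is the same as counting the distinct values in the column. As a small sketch of that check in Python (the `is_constant` helper and the sample columns are hypothetical):

```python
def is_constant(values):
    """True when a non-empty column holds exactly one distinct value."""
    return len(set(values)) == 1


# Hypothetical columns: no nulls in either, but one never varies.
month = ["January"] * 5
sales = [120, 35, 60, 35, 18]

print(is_constant(month))  # same value in every row
print(is_constant(sales))  # several different values
```

A column like `month` here has no nulls, so a null-percentage check alone would not flag it.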
Let's do a walkthrough and see how to remove columns in Power BI query editor.
So the first thing I'm going to do here is just go to Get Data and connect to an Excel workbook, Module 3 Lesson 2, and when I open that I'll select a worksheet here called Remove Columns.
I can see even in the navigator preview that over here I have a lot of extra columns that I won't need, so rather than load straight through into the report, I'm going to click on Transform Data to go into query editor.
So once I'm in query editor, I can start removing these extra columns. I'm going to scroll across here and find some of them. There we go, Column 11. I can take out one column at a time quite simply: just select it, right click over it and choose remove.
But if I have a lot of columns, I'll probably want to select them all in one go and then remove them quickly. I can use my Control key, which will allow me to select columns that are non-adjacent. But if I have a block of columns like this, I'll select the first one, scroll across to the very last one, and then hold my Shift key down when I click on it, and that will allow me to select the block. So I'm just going to right click here over one of them and choose remove columns.
There we go, that's a lot tidier already.
Now if I look at the location column, I do have some values but I also have some nulls, so it'll be really helpful to see what percentage of nulls are in this column. So I'm just going to go to my View tab and click on column quality.
And so I can see right away that 92% of this column contains null values. There's really no point in me bringing that through because it's not going to help in my analysis. So I'm going to right click over it and just remove it.
Let's just have a look here at one last column.
So the first column, the month column. I'm going to use the filter here because the profiling is telling me that I don't have any null values. But if I go into the filter I can see there's actually only one single value contained. I'm just going to cancel out of that and right click over the column and remove it as well.
Once I'm happy with my dataset, I go to the Home tab and click on close and apply.
And that will now load that cleansed dataset into my report. Having a quick glance at my field list over here, I can see I don't have those empty columns, only the columns that I had chosen to keep in my dataset.
If I wish to, at any point in time I can go to Transform Data again, click in here, and it will switch me back through to the query editor. So if I do decide, for example, that there's another column that I'd like to remove, I can just right click here and remove it that way.
And once I've made that additional change, I'll close and apply and switch through so that I have my completed dataset.