Data Extraction in Excel – Felix Live
- 29:08
A Felix Live webinar on Excel - Data Extraction.
Glossary
Transcript
Welcome to this session, as you can see, hopefully, from the shared screen. My name's Phil Sparkes, and I'm a full-time tutor at Financial Edge. And this session is all about Excel, and it's about using Excel data extraction tools. So, the purpose of this is basically to use some of the Excel data extraction tools. So we're not going to look at notes.
We only have 30 minutes to go, so we're going to dive straight into Excel, and I'm just going to show you how some of these data extraction tools actually work. So if I can escape out of that.
And here we go. So we're straight into this.
As you can see in the title of the spreadsheet, this is the data extraction workout one empty.
So I'm going to start with some very simple examples.
We're going to use VLOOKUP. Now, VLOOKUP has been around in Excel for years, ever since the earliest days of Excel.
And, oh, excuse me, ever since the very early days of Excel, and I guess it's always been that sort of boundary between very simple users who just effectively use Excel just for keeping track of things, just adding a few things up, and people that start using Excel in a slightly more meaningful and more sophisticated way. So, without further ado, we're going to dive straight into this.
Now, what does this little example here say? What's it asking us to do? It's got an example. It's very simple, only a handful of lines, and it's basically saying, if your wage was 25,000, what would be the tax rate? Assuming you've got tax bands that go something like this.
And you can kind of see in here, your wage of 25,000 is somewhere between 10 and 30, and therefore, you presumably get taxed at 30%. Now, this is obviously very straightforward, but what if you had a list of 10,000 lines? How would you use VLOOKUP? So what VLOOKUP does is, the clue's in the name. It's V for vertical lookup.
It looks up and down a list. So what it does, and as you start typing VLOOKUP, it gives you the syntax of the formula, the syntax of the function, and it says, first of all, what do I want to look up? What's my lookup value? And the answer is, it's the 25,000 just above. And then it says, where is the table of data that you're going to look for? And the answer is it's there.
It's the income on the left-hand side and the tax rate on the right-hand side.
And one of the limitations of VLOOKUP is you always have to have the thing you're looking for in the left-hand column.
Something like this.
So it then says, I'm going to look in here, I'm going to try and find the 25,000 on the left-hand side, and then VLOOKUP says, what do you want to bring back? And it gives you a number.
You put a number in here that represents the column number of the data you want to bring back. And this could be 10 columns wide, and you want column three. In this case, it's the second column.
We want the tax rate. I hit Return, and there we go.
It's successfully brought out this 30% because my income is above the 10,000 but below the 30,000. If I change my income to, say, 35,000, then it changes to 34%. Interestingly, if I change it to 29,999, it's still not the 30,000, so it goes back to the 30%.
It's only when it changes to the 30,000 that it jumps up to the next line. So the standard default method operation of VLOOKUP is to go to the last one before the next boundary.
It calls itself an approximate, but it always kind of looks below.
It always looks underneath. What if you want an exact match? Well, here we've got another example, perhaps a little bit more obvious. We've got a great big, long list of products, and we want to find some information about one particular product.
We can use VLOOKUP again. But the problem here is that we don't have numbers, so we can't say approximately A134 or a bit lower than A134 or a bit higher than A134. We need an exact match.
So what we do, same format, we point to the thing we're going to look up, the A134. We then point to the whole table.
You always have to highlight the whole table, not just the thing on the left, not just the column on the left. And then you want the second column again. But this time, if you look carefully at the syntax, it says range lookup. And if I leave that as a default, it defaults to true, which is the approximate match that we had in the last one. But this time, I want an exact match, so I need to make this false. So I type in false. I could also type zero for false and one for true, and that would work as well.
I hit Return, and now it's come up with exactly that number there, the 3.5, which corresponds to that product ID of 134. So you have to put a zero or a false at the end if you want it to get an exact match.
Okay.
It's okay, but it's a bit clunky. It's a little bit limited.
Hard coding these numbers in for the number of columns is not very sophisticated and not very flexible.
So how can we make VLOOKUP a little bit more flexible? And what we can do is we can use VLOOKUP but also use a MATCH function. I'll just show you briefly what a MATCH function does.
So if I say I want to match revenue, that number there, and I want to find where revenue is in this range of data, I then get an option under the MATCH function, and it basically says, what sort of match do you want? So you can see it's a little bit more sophisticated.
Less than, exact, or greater than.
This case, I'm going to go with exact, which is zero.
I hit Return, and it comes back with a four because the revenue column is the fourth column.
So if I embed a MATCH function within a VLOOKUP, I can start to create a little bit more flexibility.
So I'm going to go into here, into my function.
I'm going to hit F2. I'm going to basically embed this MATCH function within a VLOOKUP.
So F2 again. So I'm going to say VLOOKUPAnd what do I want to look up? I want to look up the A134, just as we did before.
I want to look it up in my whole set of data, which is there.
But then the column number is basically going to be driven by that match function that I've already implemented, I've already put in, and that comes back with the four.
I can go to the very end, and my VLOOKUP, remember, I need that false or true. I want an exact match, so I put a zero on the end. I hit Return, and it basically gives me that revenue figure there. Now, just to demonstrate it works, what if I change this to Price? You can see it now comes up with the three and a half.
So the match function is incrementing or changing that column count or that column reference.
So it's starting to get a little bit more sophisticated.
But again, one of the downsides of VLOOKUP is still it's relatively inflexible. It always has to have the data you're looking for in the left-hand side. It can only ever step to the right.
It can't step to the left and so on. And it's not very flexible.
If you don't get a match, then what happens, and so on.
So you can start to be a little bit more sophisticated by joining a number of functions together. And in my very old Excel lessons, this was kind of step two. This was lesson two.
Learn VLOOKUP in lesson one, and this is lesson two, what can we do that works like a VLOOKUP but gives us more flexibility? And it's a combination of using something called INDEX and using something called MATCH as well. Just before we move on to this, I just want to mention something called HLOOKUP. If you've ever come across HLOOKUP, HLOOKUP is exactly the same as VLOOKUP, but it's just basically things flipped around.
V for vertical, H for horizontal. So in an HLOOKUP function, you have your data on the top, the thing you're looking for, the company name, the asset name, something like that along the top, and then you step down a number of rows, rather than finding it in the left column and stepping to the right. That's the only difference with HLOOKUP.
So we're going to use INDEX and MATCH.
And what INDEX does, INDEX points to a particular array, and that allows you to select a row or a column from that array.
It enables you to drive that separately.
So I'm going to start with my INDEX function.
So my INDEX function says I want to find information in this array here. Okay? And then you can see the top item here, it says then what row number do we need and what column number do we need? So I'm going to go with row first.
So I'm going to say MATCH again. And what I want to do with my row numbers is to find A134.
Where do I want to find it? I want to find it in that first column on the left-hand side. What sort of match do I need? I need an exact match, so I put zero in here.
And then which column do I need? I do exactly the same, but this time for the revenue. So I'm going to say MATCH again.
What do I want to find? I want to find revenue.
Where do I want to find revenue? I want to find it in that top row of headings there. Again, what sort of match? A perfect match, an exact match with zero. And now I'm back to my INDEX.
I've completed the row number. I've completed the column number.
I know my array of data. I hit Return, and I get the correct answer. And again, let's just try this. Let's just change it.
Change it to Price, and I get 3.5. You can see, interestingly, very sort of subtle point here, it's case agnostic.
So I've got Price with a lowercase P.
You can see that Price in the heading is uppercase P, but it's still okay.
It still does that match. Okay.
We're going to use something called OFFSET and MATCH.
And OFFSET and MATCH works very similarly to INDEX and MATCH, but all OFFSET does is it points at one particular cell, not an array of cells, not a range of cells, one particular cell, and then moves up or down, left or right away from that one cell. But we're going to use OFFSET and MATCH just the same as an alternative. Equals OFFSET. So where's my OFFSET? I'm basically going to go to here. And this is often the case with OFFSET.
When you look at an OFFSET function, you are often one cell away, one row above, one column to the left of where all of your data is. And the reason for that is because if you start on that particular cell that I've just highlighted here, then when I step one row downwards or one column to the right, because that's what my MATCH function comes back with, then I often basically step to the right or down from that original starting cell, that original reference cell.
So I start here, and then I need a MATCH.
And again, you can see it says which row do I need.
So I'm going to use a MATCH function.
So I'm going to say I want to find the product ID, and I want to find the product ID within that range here. And you can see there's a subtle difference here.
With INDEX, all of our ranges overlapped.
Here, they're actually separate.
I'm not starting with the product ID cell.
I'm starting one cell below that. Okay? And again, zero at the end because I want an exact match.
What about my columns? MATCH again. What do I want? I want the price in this case.
Where do I want to find it? Again, step one away. No overlaps with OFFSET.
So those three column headings. What sort of match do I need? An exact match.
I hit Return for both my-- close brackets for both my MATCH and my OFFSET function. Hit ReturnAnd it works.
And it works exactly the same as INDEX and MATCH as well.
Okay, two more to do.
XLOOKUP. I'm going to start with a nice, simple XLOOKUP.
One of the problems, as I've said, with VLOOKUP and HLOOKUP, very well known, but a little bit clunky, a little bit straightforward, not very flexible, always have to have the data on the left-hand side of our table of data. XLOOKUP only arrived in Excel 2019 and looks to solve some of those problems. So here we go. Let's go and find-- Let's use XLOOKUP.
And in here, we've got a really simple example.
I'm going to start typing XLOOKUP.
I want to find A134, and that's my lookup.
But you can see XLOOKUP has got lots and lots of other options, lots and lots of other parameters that we can put into the syntax here, for things like what happens if you don't get a match, what sort of search mode have you got, those sort of things.
So lots and lots of flexibility in here. We're going to use it very simply.
So it says XLOOKUP, I want to find A134, and then it says lookup array. Where do you want to find that? I want to find it in that list of data.
And here's the real difference between XLOOKUP and VLOOKUP.
The two areas, the lookup array and the return array, can be in different places. Different places on your spreadsheet.
The return array could be on the left, not on the right.
It could even be on a different sheet somewhere else. Okay? So we're going to look in this product ID list, and then we put a comma, and then we come back.
Our return array is this section here.
They have to be the same size, or in this case, the same number of rows.
I hit Return, and it comes back with a 3.5. So very straightforward.
Pretty similar to HLOOKUP and VLOOKUP, but a little bit more flexible.
Now we're going to sort of really knock your socks off here a little bit.
We're going to use a nested XLOOKUP.
We're going to do two XLOOKUPs, one inside another, just to show you how sophisticated this thing is.
So first thing we're going to do, I'm going to build this up in two stages.
The first thing we're going to do is we're actually going to identify the price column. It's pricing this, and then we'll look to see if it changes when we change it to, say, revenue.
So I'm going to do an XLOOKUP.
I'm going to point to price.
Where do I want to find price? I want to find price in this set of headings here. But when it's found it, where is my return array? Now you'd imagine that what I would do is I'd just sort of do this, a single line, and it would come back with a 3.5.
But if I select the entire block of data for underneath the price and the quantity and the revenue column headings, look what happens.
I hit Return, and it comes back with the entire column, the entire set of data. And even though I've only put in the column the formula wants in the top, it's basically spilled down.
You can see because all of the rest of the data is highlighted with a little box around it. You can see because I've got a gray reference, the formula has gone gray in those underneath items.
So effectively, it's coming back with an array of data.
So I can now use this array of data to perform another XLOOKUP on.
So what I do is I hit F2 for edit. I go back to the beginning of my function, and I say XLOOKUP, and I hit F2 again. So now what I'm going to do is I'm going to select the product ID from that column of data that the first XLOOKUP has already come back with.
So I say I want to find A134.
And where do I want to find it? I want to find it in that list of product IDs. But then I don't need to tell it where my return array is because the second, or in this case, the first XLOOKUP that I did, has generated that array. So all I need is to go, oops.
F2. All I need is to go back right to the end, put a bracket on the end, and it now comes up with that 3.5.
So I've got this nested XLOOKUP. Super complex.
I'm just going to change that just to show you, change it to revenue, and it does the same thing. So really sophisticated.
The fact that it can bring back not just a single item of data, but XLOOKUP can bring back an array of data, really powerful.
And you potentially even could have three of these nested XLOOKUPs and work almost in 3D. Okay. So that's LOOKUPs, and we've got 10 minutes left, so we're about on time.
So next thing I'm going to do is something called INDIRECT. So I'm going to go to the Multiple Sheets tab, and I'm just going to demonstrate this very briefly, a way of extracting data from multiple sheets. So first thing I'm going to do is I'm just going to remind you of how a normal reference works.
So if I basically said, "I want to go," and you can see I've got three tabs here, and I've got a reference here, North for the tab and B7 for the reference.
Now, if I wanted to go to the North tab and find what was in B7, I do it very easily. I hit Equals, I do Control Page Down, and I go to B7, which is that one there.
It's the product ID, and it comes back with A.
Let's have a look at the formula, how it does that.
It's got the sheet name first, then an exclamation mark, and then the cell reference. Now, I'm going to actually just use something called INDIRECT. So I'm going to say =INDIRECT. And what INDIRECT does is it enables you to build up a complex reference, which can include a sheet name, and a cell reference in text, and then INDIRECT will evaluate that text, see where it's pointed to, and go and get the information from there. And that actually gives you a great deal of flexibility.
So let's do this. So you can see I've got North just above. So I'm going to say INDIRECT, and I'm going to go North.
Now I want to lock the row in here, so I hit F4So until I get a dollar in front of the 11. And then I need to build this thing up in text. So this says north, what I now need is an exclamation mark.
And I use the ampersand to join together pieces of text.
So I've got north, and then I need to put an exclamation mark in brackets, sorry, in quotes, in double quotes, and then another ampersand, and now what I need is that B7 over there on the left-hand side.
Again, I want to lock it, but this time I want to lock the column reference. So I hit F4 until I get a dollar in front of the B.
I close my indirect, and basically I get the same answer.
Let me just, just so you can see that. There we are. There we are.
So I've got an indirect formula, and it's coming back with the same answer.
Now the real sort of clever thing here is if I change what's in the drivers to these indirect formulas, then it'll point somewhere else.
So you can see underneath, I'm pointing now at the C7 rather than the B7. So one column to the right in the North sheet, and that's where the data is. So I'm just going to copy that, paste it up here, and now you can see it's pointing somewhere else.
It's coming back with the three, which is in the very top element there. Now here's the sexy bit. If I copy this to the right, then basically what it does is each of those is now pointing to different sheets and picking up the contents of C7 in every one of those three sheets.
Similarly, if I copy it down, then you can see it's basically looking down one row at a time in each of those three sheets.
Now, final thing I'm going to do with this is basically just do a little sort of integrity check, and I'm going to say, "What does that all add up to?" And if it all adds up, if I add it all up, I get 36. Now, I could do a little check on each of these. I could basically go through and add up manually. But you can do sums, sum formulas in three dimensions.
So what I'm going to do here is I'm going to point at C7, and I'm going to look through all three of the sheets.
So I'm going to say equals sum.
Going to go to North, hold down Shift, go to West, hold down Shift, click Return. And I basically I've done a little check here and checked that basically I'm adding up the same numbers, getting to the same answer. But I'm doing a sum calculation in three dimensions, going into the spreadsheet. Now, indirect's incredibly useful, but just a word of warning, it works really well, but you have to maintain absolute consistency between your various sheets. You could see in these very simple three sheets, same data in the same places, in the same cells, and therefore it works.
But if you go to one of your other sheets, if I went to North or to East and inserted a row, the indirect formula doesn't know I've done that, and it doesn't automatically update references.
Okay, my final bit. I've got five minutes or so to go.
What if you get some data and it's okay, but it's the wrong way around? You've got a dump of some data from a database and it's the wrong way around.
How do we transpose that? So we've got a few examples here. We've basically got, excuse me, we've got some original data. It's all hard-coded in, and I've got countries on the left-hand side, and I've got bananas over on the row headings.
So what if I want it the other way around? Well, there are a number of ways of achieving this.
The first thing I can do is just copy the data like this, Control + C, and I go down to where I want to put that data, and I paste it in. But there's an option if I go into Paste Special, so you can use it from the menu or alternatively Control + Alt + V comes up with Paste Special. And you can see over on the bottom right-hand side, there's a box for transpose. Now, I can either click on this or you can also see there's a little line under the E.
So if I click E on my keyboard, it automatically ticks that.
I hit Return, and you can see that what it's done is it's taken that original data and transposed it. So, if I look at the bananas, it goes 3.6, 4.1, and 4.
Now the same numbers, this has shifted the numbers that were going down to going from the left to the right.
There's also a number of other ways of doing this.
There's a transpose function. So I hit equals, transpose, and what I do is I go up to my original data up here and I just highlight that array, I hit Return, and it's nicely transposed all of that.
I'll just put the formula over here, equals formula text, and there it is, transpose. And the great thing about this is if I change one of the pieces of data, so I change the 3.6 to 100, it changes in the transposed data. Whereas this one, it's just copied and pasted the values and there's no live link anymore. Okay, just going to show you one more option here, and it's using a mix of offsets, but also a row and a column counter. Now, I'll just show you what a column is.
So if I say column here, and I just point to this row, it's just pointing to the row underneath.
If I increment this, if I basically copy it to the right, you can see that the column counter increments.So I could actually use this, the column counter, to drive the rows within an offset function. And that mix of driving rows with a column counter and driving columns with a row counter is going to accomplish my transposition. So I'll show you how this works.
I say equals offset, and I'm going to go up to that top left-hand corner of my data. I'm going to lock that with an F4. There I am.
So I stay there. And then basically what I'm going to do now, when it asks for the row, I'm going to say I want the column counter, and I'm going to column count from A1, and that's because A1 starts with column number one and row number one.
And as I gradually copy to the right, it'll go one, two, three, four.
Copy it down, it'll go one, two, three, four.
So I'm going to count the rows, but with an incrementing column counter.
I'm then going to count the columns, and I'm going to use the row counter, A1. And it's this mix of putting the row function in the column counter and the column counter in the row function that's going to accomplish the transposition.
So I hit Return, and I've actually done that wrong.
I should have started there in A4 rather than B5.
Just going to manually change that.
Dollar A, dollar four.
Hit Return. And now if I copy this to the right and copy this down, two, three, four, five, six.
There I am. And you can see that basically I've accomplished that transposition.
I'm sorry about this, I'm making a mess of this. Equals formula text.
I'll just do-- Sorry. I'll just do this just to that one there. There we are.
There I've got my column counter. There I've got things there. Okay.
So I've accomplished this transposition by using a mix of the offset and the rows and the columns, but in the wrong place.
Okay, that's just about my time up, so I'm just about to stop.
I'm just going to mention these two here, offset and match and index and match. Now, what we've got here to drive these two transposition options is we already have the row and the column headings, but they're the opposite direction.
They're the opposite way around to the original data.
So what these basically do is they just do an offset and they do an index, just like we did on the first sheet, but they use a match function to say, if you find the USA as a country within the columns in the first column, then that's what you need to pick up. If you find the UK, you need to pick up the UK row, and basically you've got a match in that offset function within the column and the row.
I'd encourage you to go and have a look at the solution to this.
I think this is another good way of working out how those offsets and index functions work. Phew.
And as if by magic, it's exactly half past. So that's my time up.
So if you haven't done it yet, I'm just going to suggest that-- Let me just get the links. If you haven't done it yet, I'm just going to grab, copy, and paste. I'm just going to grab and copy and paste the web links. And if you haven't done it yet, use the chat box, find the web links, and download both the original empty versions of these files and also the solution files as well. Phew. That's it. Everyone still there? I can see everyone's still there. That's great.
So thanks for sticking with me on this.
Hope that was useful. I know it was a very quick run-through, and hope to see you on one of these sometime very soon. Okay, take care.
Have a great weekend, and hope to see you on one of these soon. Okay, thanks a lot.
Bye bye.