Data Extraction in Excel - Felix Live
- 29:01
Felix Live webinar on Data Extraction in Excel.
Transcript
Welcome to this, uh, live, uh, webinar session. Uh, my name's Phil Sparks, as you can see, hopefully on the screen. And I'm a trainer at, uh, financial Edge. Um, and we've got a session, uh, planned together for, uh, 30 minutes on using Excel tools to extract data. Uh, so the intention, uh, what we're gonna do today is we're gonna look at a number of techniques for pulling out data from Excel. Now, as we all know, hopefully, uh, lots of systems, databases, whether they're HR or financial logistics, uh, will dump data in an Excel format, um, uh, in a great big long list. Um, and then your task often as a sort of an Excel expert as the analyst is to sift through that information and extract information, uh, from that. Uh, and we're gonna use, uh, a number of techniques to do that. I'm just gonna show you things like vlookup, um, match and index and match and offset. Um, also X lookup. And then finally, then just a couple of extra little bits for the last sort of, um, uh, five or six minutes or so on transpose and indirect. So basically, what can we do if we need to look further fields, then our lists, and also what happens if the list arrives in the wrong orientation and the wrong sort of, uh, shape? What can we do to, to, to fix that? What, what can we do to get it into what is normally a long vertical, uh, list? Uh, if you've got, if you've got those six files, um, and downloaded them, uh, what we're gonna look at, we're actually gonna spend our time just looking at workout one. Um, I'm gonna look at quite a lot of workout one. So if you can open up workout one if you want to follow with me. And you can either follow the empty version if you want to actually have a go at completing some of this, uh, along with me or, or alternatively open up the full version and that's got the solutions in. Um, and if you want the solutions, then you can, uh, then you can, you can just follow, if you want to follow through what I'm doing, you can see the solutions as well. And I'll explain how. Um, I'll show you how to actually, uh, get there. And again, I can, where's my chat box? My chat box just gone. I can just see another couple of people have just joined. So again, I'm just gonna pop the URL the link to our, um, website with all of the material, um, into the chat box. So if you want to grab that from that chat box and open up out one, uh, you'll see the same thing on the, on your screen as on mine. So we're gonna dive straight in. We're gonna look at what, um, the first tab we're gonna look at work, look, uh, at, uh, lookups at one. Um, just lemme make that a little bit bigger. Um, excuse me. Um, we're not gonna look at workouts two 'cause it's really pretty much the same. It's more examples, it's more, um, uh, uh, more opportunity for you. Try for you to try out some of that, uh, that some of the techniques we are just gonna look at and work out. One, um, uh, we, we will look towards the end of the file at the, uh, multiple sheets and the transposing tabs, um, as well. So, uh, work out, uh, what lookups one on workout one that's, uh, where you need to be. And I can see just one more person's just to join. So again, I know I keep on popping the, um, uh, the same link in the chat box, but if you've just joined, uh, that link in the chat box will grab you the, um, material that we're looking at. Okay, so we're gonna start right at the top, um, with this, the simplest and oldest of the, uh, the lookup techniques, and that's v lookup, basically V for vertical looks down a list. A, there's a thing called H lookup, which looks, um, horizontally, but V lookup is much more commonly used and has been around in Excel for decades, for at least 20, 25 years, something like that. I'm actually gonna look at the second, uh, workout. First. I'm gonna look at this one that says workout tech, VLOOKUP text. And what we're gonna do is we're gonna try, here's a typical example. It's obviously very small. It's only got five lines as opposed to 50 or 500 or 5,000, but the, the technique is still, uh, valid. So what happens here is we want to find an item, a 1, 3 4 here, um, which is in this long list. And that could be a student number or it could be a products id or, um, it could be, uh, something in, in your financial ledgers. Um, and we want to find the price associated with a 1 34. And we're gonna use a thing called vlookup. So I'm gonna type if you lookup.
And the great thing about Excel is it gives me the syntax for the function at the very top. So you can see it says V lookup lookup value. So this is what we're looking for. So we just point at the line just above the A 1, 3 4, and then it says words, you want to find this. So you're gonna have to then go and, uh, hold down shift, use your arrow keys to select that range, that array. And it's important that the thing you are looking for in the great big long list is in the left hand most column. Uh, that's one of the limitations of, uh, vlookup. So, uh, left hand column is the products id, and then the rest of the array has to cover everything that you want, you might want to be looking for. Then it says, um, which ar array, which column do I want to come back? Well, the price is in the second column, so I just put a number two in here. But that could be, if I had an array that was 10 columns wide, I could choose column number seven and it would just pull what was in that seventh column, uh, to the right hand side. And then the final thing is it says, what sort of match do you want to make? So when I'm looking for a 1 34, do I need an exact match? And if that's the case, I need to say false or zero, which is what I'm gonna do if I want an approximate match. So it's a numerical search, um, and I'll show you how to do this in the first example. Then we make it false, we make it true. Or one I hit, um, close my parentheses and then I get, uh, that number there, 3.5. So you can see it successfully grabbed that three price of 3.5. And just to demonstrate that it really does work, if I change the, uh, product ID to X 2 1 2, then it comes back with 4.8, the third one in the list. Okay, um, downside of this, is it a bit clunky? Um, that column number, we're hard having to hard key in, and it only works if the layout of the table has the thing you're looking for in the left hand, most column and the things that you are looking for, the data about that in the right most columns. Okay, so let's go and have a look at this one here. What about if we're looking for numbers? So we're gonna use exactly the same, uh, structure equals the lookup.
We're looking for 25,000 in this case. Um, but in this case, what we've got is a set of wage bands and the 25,000, it isn't exactly one or the other. So we're gonna set, instead of saying, um, we're going to look for everything and we want an exact match, we still want column two, what we're gonna say is, instead of an exact match, we want an approximate match. Um, so we either hit true or one at the end. So I'm gonna just put a one in here and that gives me 30%. So you can see what's happened is it's look for 25,000, which is somewhere in between these two. So it's defaulted to the first one. It's stuck on 30% until our wage gets to 30,000.
And when I do that, then the, uh, thing, then it looks one line below it grabs the 30,000 and makes the tax rate 34%. So pretty useful. Um, and that, uh, and you can see that, um, approximate, um, uh, uh, match is, is really, really quite, really quite useful. But it doesn't tell you exactly how it's gonna approximate. Um, oh, I can see one more person's just joined. Let me just pop the, um, uh, the URL the link to the site in there. So if you've just joined and you want to, that you want the copies of the, uh, spreadsheets, go to that URL in the chat box. Okay, so what can we do? Well, it's a little bit clunky, it's a little bit dyna. So we can start to make our v lookup a little bit more flexible, a little bit more sophisticated. So if we now have a bunch of data here, um, and we have both price, quantity, and revenue, what we can do is instead of having to key in column two or column three, we can use a match function to basically find which column. The thing we're looking for is in, so I'm gonna say equals match. And the match function basically finds something, in this case it finds price. And where does it find price? It basically finds it in this column here, in this column here. Now I'm gonna basically, oops, I need to actually change that. I need to make that, um, price, sorry, I don't, I need to make it those four there. So I need to make, I need to link it to all of that in all of those headings, including the original product ID column here. Um, and then it says, uh, in my match function, again, what's the option? And it's a bit more sophisticated here. It says less than, greater than or exact. I'm gonna go with an exact, and it comes back with two. It comes up with number two because price is in the second column here, price is in the second column here. So now what I can do is do a V lookup again equals V lookup, and I'm still looking for a 1 34 and I'm gonna define all of my, um, array like this, um, including the product ID in the first column of course. And then instead of saying just key number two, I'm now gonna make it equal to the result of that match function and again, zero for an exact match hit return. And it gives me the same answer as before. But the clever thing, now if I change it to X two, it still changes the price. But now instead of just li being limited to price, I can type in revenue. And you can see two things have happened. First thing, the match function has said the revenue column is now the fourth column along, and then the VLOOKUP is basically grabbing the revenue for that X 2 1 2. So it's starting to be a little bit more sophisticated. Let's keep on going. Now we can use something called an index, um, and we're gonna look at two alternatives here, index and offset. And both of them are ways that traditionally we could perform the same sort of thing as a vlookup, but in a more flexible way. We could go left, we could go right, we can change the way that it looks for things. We can be a little bit more, uh, sophisticated. So I'm gonna build this up slowly. The first thing I'm gonna do is I'm gonna use two match functions, and I'm gonna say equals match. And basically I'm gonna look for the A 1, 3, 4, and I'm looking for that within that list of products. And I want my, um, zero, I want to an exact match. So onto to zero at the end, I hit, uh, return and it says a 1, 3, 4 is the first thing in the list. Then gonna do the same thing with the column headings. I'm gonna say equals match, and I need to find price. And where do I need to find price? I need to find price in that list of headings. Now, I don't need the product ID column now 'cause I'm just gonna point towards the data in, uh, the array, the range of, uh, data. Okay? And again, zero for an exact match, hit return. And um, I then get, uh, number one, um, and let's now use an index function. So I'm gonna say equals index. So what does index do? Index, basically, you can see it points at an array, it points at a range of data, a block of data, and then it, it, it extracts from that block, the intersection of a certain row and a certain column. Now, I've already, I've already used the match function to tell, it's where the intersection is. It's the first row and it's the first column. So all I need to do is to define the array, which is that little block of data there with all of the data in it. And then I need it to be what row, it's the, um, product ID match and what column, it's the price, uh, column. I hit return and I get 3.5 again. Um, but this is starting to be much more sophisticated. So I can say X 2 1 2, and you can see that match function selects the third one down. I could change revenue and then that match function chooses the third column across. So I can get anywhere within this little block. What I can do now is do the same thing with an offset function. Now, what's the difference between an in an index and an offset index points towards a fixed array, a fixed shape, a fixed matrix of, um, of data. And if you point, try and point outside that range, it'll basically come back with an na, it'll come back and say error. You can't find that we're outside our, um, array of data offset. All offset does is point to a single cell and it basically steps right and down or up and down if you use negative numbers from that particular cell. So it's, it's agnostic. It doesn't occur whether you are, um, pointing at your array of data or somewhere else. So in some respects it's more flexible, but it's also easier to make mistakes with this. So, um, different different techniques, uh, di different, different functions for slightly different uses. So what I'm gonna do now though is I'm gonna use the offset function, um, and I'm gonna point towards a particular array, uh, sorry, a particular point. And then this is very common with offset. What I'm gonna do is I'm gonna point to one cell outside the array of data where I need to look at, because when I come back with a match function and it says step one to the right for price two to the right for quantity, uh, or one, one down for a 1 34, then I basically need therefore to start one outside of that array of, uh, data. And that's very common with offset. Um, but then I'm gonna wrap up the match functions within the offsets, which of course creates a slightly more complex formula. Um, but it's all in one place. So here I go, I go offsets, I'm gonna start at C 45. And then how many, how many rows do I want to step down? I'm gonna put a match function, and what am I looking for? I'm looking for a 1 34 within that range of data and I want an exact match. And now how many columns do I want to step ahead, um, step to the right, I need another match function, and I need to say I'm looking for price within that range of, uh, that array of, uh, headings. Um, and again, what sort of match do I need? I need a perfect match. I need an exact match, close parentheses twice. And I come back with, um, uh, 3.5 and it's all now wrapped within one. Um, just one, uh, function. One, one calculation, one function. Uh, and again, just to demonstrate I can change it to revenue and it comes back with four, 3.4 30.5. Okay? So we're getting pretty sophisticated now. Um, now we move on to the real sort of clever stuff where you've, we'll move on to something called X lookup. Now, X lookup is the sort of modern version of V lookup. V lookup, um, been around for 20 years or so. X lookup has only been in the last couple of versions of Excel over the last four or five years. So you need to be a little bit careful of this. Um, depending on which version of Excel you've got. And perhaps more importantly, the person that you are sending your spreadsheet to, you need to make sure that they've got a relatively recent version of X lookup. Um, so that, um, if you, if you're relying on this, uh, working, so let's have a look. First of all, X lookup is actually pretty similar to V lookup and H lookup. The syntax is quite similar. So we're gonna say equals X lookup.
And now it looks pretty similar. So let's look at the syntax. First of all, it says, what am I looking for? A 1, 3, 4? And then it says, where are you looking? Get it, looking for this. And here's the clever bit. Here's where x lookup is different. Basically it says, I'm gonna look for a particular piece of data in a particular array, but then where I basically extract the answer from the other information can be somewhere very different. Could even be on another tab as well. Could be left, could be on another tab. Um, uh, x lookup doesn't care. And also if you look at the end of X lookup, it's got lots of extra optional features. You know, what happens if it doesn't find it, what do you want it to come back with and so on. So I'm gonna say look for a 1 34 within that product ID list. And then where do I want you to return? I want you to find the answer in this array at the side, but of course this would work with V lookup. But what if, if this was on a, on another tab or on the left hand side, only X lookup would work bracket, and I end up with the same sort of, uh, same, uh, solution 3.5. Now this is where X lookup gets really pretty clever. 'cause what happens now is we're going to nest an X lookup inside another X lookup. So, uh, concentrate hard on this. So the first thing we're going to do is I'm gonna just search for price first. And what it's gonna do, what I want it to do is I want it to select a column of data that represents price. Or if I change price to revenue, I want it to come back with this column of data instead. And that's gonna be the inside bit of my X lookup. So I'm gonna start with that. I'm gonna say equals X lookup, and I'm looking for price. And where do I find that? I basically, um, I find that within this range of headings, so it's the first one along, and of course, X lookup fine with it being horizontal or vertical. It's, it's, it's fine with that. And now here's the clever bit, rather than just se uh, selecting another row the same, which would just come back with, uh, say 4.2 in this case. Uh, what I'm gonna do is I'm gonna select a whole range like this. I hit return. What does it do? And here's the clever thing. What it's done is it's basically come back with that whole column, even though I only put the formula in the top. And what it's doing in Excel language is it's spilling the amounts out from the original cell into the cells underneath. And you can see it's put a little sort of highlight box around it. Also, if I go to the formula bar, the very top one where I type the original formula, you can see it's black. If I go into the lower ones, it's gray, and I can't actually delete one of these items. It says you can't delete that. I hit the delete key. Okay? Uh, so what it's got is it's created a range. It's created an array. Now what I'm gonna do, here's the clever bit, is around this array. I'm gonna ca I'm gonna do another X lookup, which is gonna select the right row from this array. Uh, so I'm gonna say equals X lookup, and it would help if I typed it correctly.
And so what I need to do now is I'm gonna look for, um, it's easier to do my mouse with this because I'm editing something. So I'm gonna look for a 1 3 4 within the first column. And then the return array is basically gonna be the X lookup, the first X lookup function I came back with. So the return array is that column of, of, um, information under the price. Let's put out parentheses around the end, see if it works. And it does, it comes up with 3.5. And again, let's just check that it works X 2 1 2 and revenue and it's coming back with the right answers. So that's getting pretty sophisticated and nested X lookup, pretty clever. Hey, um, and as I say, be careful with this, it's, um, it only works on more recent versions of Excel, so you have to be a little bit careful with, um, how you, uh, use this. Um, but it's really very sophisticated and as I said, lots and lots of other, um, alternatives. Lots and lots of options at the end of that X look up formula that says, um, what happens if it doesn't find, um, and so on. Okay, the next thing we're gonna do and just mention, uh, lookups two has got another set of examples. You can have a try. So make sure you've got both the, um, original empty version and also the full version if you want to try out some of these techniques on, uh, some different examples. Now we're gonna go to this bit here, multiple sheets. We're gonna look at multiple sheets now, um, what I've got here in uh, Northeastern West is I've got, um, a various, um, uh, a bit of data and it's got product A, B, C, D and quantity sold. Um, and then in the east region I've got something that looks very similar with these products. Um, and uh, so on, uh, you'll see between these three, uh, tabs, they're all identical in structure. Now I'm just gonna show you very briefly something called indirect, which is really super clever, really, really sophisticated. So here it goes. So the first thing I'm going to do is I'm just basically going to point at one of the cells, and I'm just gonna show you basically how this uh, works. So I'm just gonna do this manual. I'm gonna go to, I want to go in this intersection to North B seven. So I'm gonna go equals I'm gonna use my mouse and I'm just gonna say north, and then I'm gonna click on B seven.
And it comes up with the num, with the letter A. Okay, just put the formula alongside that, but we remember, we know how that works. It says E north equals north exclamation mark B seven. So north is the tab, and B seven is the reference. Now, I'm just gonna do something slightly odd down here. I'm gonna say equals indirect. And then indirect basically takes a piece of text, and then the indirect, indirect function basically converts that text into a reference, and then, uh, basically goes to that destination and comes back with the, with, with, with the content. So I'm just gonna do something odd, I'm just gonna type in quotes, uh, north exclamation B seven quotes, and it comes up with the same, exactly the same answer. Now, you might sort of wonder what on earth the point, what on earth is the point in doing that? Um, so obviously just a more complex way of making a reference, but I'm trying to demonstrate that the indirect function, uh, handles this reference despite it being in text and basically converts it into a real reference from text and then goes and looks. So what I'm gonna do now is I'm gonna say equals indirect, and I'm going to basically construct that reference from the headings and the, the, the column headings and the row headings. So I'm gonna say equals indirect. So the sheet name is in the row just above. So I'm gonna say, uh, reference there C 19. Now as I copy this to the right and copy it down, I want to lock into that row. So I need function F four and I need to lock into dollar 19, row 19. And then to join pieces of text, I use this and or amand symbol. Um, and basically what I need now is an exclamation mark. So I put, um, quotes, exclamation mark quotes, and then another and symbol. And I'm now gonna point at the B seven on the left hand side. Um, and again, I want that to lock, but I want it now to lock with the column. So I'm gonna hit F four until I get a dollar in front of the B. There we are, I hit close bracket, hit return, and now it's basically pointing at exactly the same place. But here's the clever bit. If I basically copy across to the right control R and control D for copying down, then each of these is basically pointing to different tabs. And so that's a really sophisticated way of referencing within Excel. Be careful. It only works if the, um, the sheets that you are referencing to have a totally consistent layout. But if you, for instance, have, um, information about one of your divisions on one tab and a, and the second division on the the next tab and a third division on the next tab, and you want to, uh, some sort of dashboard that looks across at the various divisions, then the indirect formula is gonna enable you to look at different tabs very easily just by basically having some sort of reference, some sort of link to which tab is, uh, you need to go and look at. So really powerful a little bit, you gotta be very careful with this. It doesn't deal with things like inserting rows and columns. It doesn't deal, uh, with tabs that have got slightly different layouts. Um, but if that's the case, if you are dumping data, uh, consistently into Excel, this is a really powerful function. And the final thing that I'm gonna show you is, uh, something about transposing. So we're gonna transpose, uh, data. So what happens if, um, what happens, excuse me, if, um, you basically get some data a little bit like, uh, this and you decide that it's in the wrong order. So what I'd really like is I'd like the countries along the top and I'd like the fruit on the left hand side. And of course, you know, if it's a relatively small amount like this, you could just basically print it out and re-key it, and that would be fine. But of course, um, Excel is much more sophisticated than that. So the first thing we could do is we could do a one time transpose of the data. So I sit there and so I go control C Controle for copy, and then what I want to do below is I want to paste special. So you can either go into the menus and hit paste special, um, up, uh, on the top left hand side or alternatively control Alt V, and that'll give you the paste special dialogue box. So here we go. I'm gonna paste special, and you'll see there's lots of good stuff in here, lots of clever stuff in here. But what we're gonna do is we're gonna click this one down at the bottom, transpose and transpose will automatically just shuffle everything round, turn it all round from columns into rows and vice versa. So here we go. There we are. It's all, uh, being rearranged. Um, now if I go into any of these amounts, you can see that they've been hard coded. So if I go to bananas up here, and instead of making it thir 3.6, I made it 36, that one changes, but the one underneath doesn't. It's just pasted those values in, not linked in any, uh, way.
So there's a clever function within Excel that says transpose. So I'm gonna say equals transpose. And you can see all transpose says is point to an array. So I'm gonna go up to the array up here, including the headings, highlight everything, um, close bracket at the end, hit return, and boom, there it is. It's basically rearranged everything. It's rearranged everything. Now, you might recognize this when we used X lookup, it's got a little highlighted box around it, so it's the same thing. What it's doing is it's saying you put a a, a function in the very top left hand side, and I'm spilling the data into the next, um, into, into the next, um, next columns and the next, uh, rows. So if I go into these other ones, you can basically see that it's a pale gray color. Now, of course, I can link to these, so I can say equals 13.2, and it comes up with that, uh, reference. No problem at all. But what I can't do is I can't delete individual items. I can't sort of do, if I hit delete, nothing happens. And that's because it's treating that entire block of data as a single variable, as a single thing. 'cause, and the second, and the final thing with this is it's also linked to the original data. So if I scroll to the top, I make my bananas, um, uh, instead of 36, let's make it 63, Actually click on the right cell. 63. There it is. So what happens, of course, the one that we paste special values, that doesn't change, but the one down here does change. It's a live link from the original data into this transposed data. So really useful function. Um, there's some great stuff underneath, which takes really quite a long time to get through with, uh, using offsets with rows and columns and matches and stuff like that. Absolutely have a, have a look at, uh, how, how we've done those. But, uh, in reality, I think this transpose function is incredibly useful, very, very valuable. Um, and the fact that it just locks onto the original data doesn't let you amend that I think is actually a really good sort of safety net stopping. So to stop you doing old things, uh, with it. Okay, there we go. Uh, it's if by magic, it's exactly 29 minutes past. Um, so, um, I'm just gonna finish off at that point unless there's any questions in the chat box. Um, hope that's been, uh, useful. Remember, this session is recorded. Um, so if you want to, uh, have a look through this again, if you want to follow it through, then you can stop and start it, follow it through, um, perhaps a little more slowly. Um, that will, uh, that will work. Um, the recording should be there, perhaps not today, but um, um, at some point over the weekend, the recording should be there and definitely by Monday morning, the source of business hours. Um, it'll definitely, uh, be there. Uh, before you go, remember to download the, uh, the, uh, spreadsheets. There are six spreadsheets, um, basically three empty and three full with lots of examples, lots of, um, and both empty versions and solutions for all of those. Um, so you've got lots of chance to, uh, try, uh, some more examples of these. But, uh, we've been through all the techniques really, really quickly. Um, hope that's been useful. Okay, that's me done. Um, look, uh, look, look forward to seeing you on another one of these, uh, very soon. And I hope you have a great, uh, weekend. Okay, thanks a lot. Bye-bye. Take care.