Importing Data
- 02:49
Learn how to create data frames in Python using pandas by importing data from Excel files.
Transcript
Luckily, it's not necessary to create data frames by hand. You can import Excel, CSV, and other types of files to automatically populate a data frame. Let's say that I'm starting from this Excel table and it's saved under this file path on my computer.
I can create a new data frame directly from the Excel file. Here I've named it Excel data frame and set it equal to pandas dot read Excel. This read Excel function is what pandas uses to go into an Excel file, extract the data, and convert it into a data frame in your Python code. To use it, you take your file path and make it a string by putting quotes around it.
However, when I execute this, I'm going to get an error. Why do you think I'm getting this problem? The answer is that my file path contains backslashes. If you remember back to the lesson where we talked about strings, if I want to print, for example, the contraction it's, as in it is, I run into an issue, because the apostrophe is a single quote and it ends my string. Now I have a hanging s, and if I execute that, I get an error. The way around that is to put a backslash inside my string before the apostrophe. That tells Python to take the next quote literally, not as the end of my string. Then I can put the closing quote after the s, and my string will print correctly.
Python is interpreting the file path in exactly the same way. It sees these backslashes and it's being told to use a secondary value for the U, a secondary value for the Z, and so on. That doesn't make sense, so you get an error.
There are two ways around this. First, you can backslash your backslash. I put a backslash before each of these, and now I'm telling Python to take this backslash literally. I'm literally backslashing my backslash and saying, don't interpret this backslash as a need to find a secondary value for this U. This is a literal backslash from a file path, and I want you to take it literally and not do anything special with it.
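As a minimal sketch of the escaping described above: the file path here is hypothetical (the actual path from the video is not in the transcript), and `compile` is used only so the failing version can be shown without stopping the script.

```python
# A backslash before the apostrophe tells Python to treat it as a literal
# character rather than the end of the single-quoted string.
possessive = 'it\'s'
print(possessive)

# Hypothetical Windows path, for illustration only. Each doubled backslash
# in the source produces one literal backslash in the resulting string.
escaped_path = 'C:\\Users\\me\\Documents\\data.xlsx'
print(escaped_path)

# The same assignment written with single backslashes does not even compile:
# Python reads \U as the start of an escape sequence and rejects it.
unescaped_source = r"path = 'C:\Users\me\Documents\data.xlsx'"
try:
    compile(unescaped_source, "<example>", "exec")
    unescaped_fails = False
except SyntaxError as err:
    unescaped_fails = True
    print("SyntaxError:", err.msg)
```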
This is the method that I like to use. I think it's the easiest and most obvious. The second solution is to replace all of your backslashes with forward slashes, just like this. Either of these methods will work equally well. So now my data frame has been created, and when I call it again, Excel data frame, it displays just as if I had created it by hand.
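The two path styles can be compared side by side in a short sketch. The path and the `load_excel` helper are hypothetical names introduced for illustration. `pandas.read_excel` is the function named in the lesson, and reading `.xlsx` files typically also requires an engine such as openpyxl to be installed.

```python
from pathlib import PureWindowsPath


def load_excel(path):
    """Read an Excel file into a data frame (hypothetical helper)."""
    # Imported here so the path comparison below runs even without pandas.
    import pandas as pd
    return pd.read_excel(path)


# Option 1: escape every backslash with another backslash.
backslash_path = 'C:\\Users\\me\\Documents\\data.xlsx'

# Option 2: replace the backslashes with forward slashes.
forward_path = 'C:/Users/me/Documents/data.xlsx'

# On Windows, both spellings refer to the same file.
print(PureWindowsPath(backslash_path) == PureWindowsPath(forward_path))

# With a real file at that location, either path could be passed in:
# excel_data_frame = load_excel(forward_path)
```

The call to `load_excel` is left commented out because the example path does not point at a real file.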