Legacy – Options when Loading a Dataset Workout
- 02:08
Workout demonstrating the different options available when loading a dataset into Python using the Pandas Library and the PyCharm IDE.
Transcript
When importing a data set into Python, there are two other important options to consider. The first is the header. The header is the title of each column in the data set. For example, for the S&P 500 companies data, if we print the data set we can see that the header holds the column names, which technically sit in row zero. So if we set header equals zero, pandas takes the first row of the CSV file as the header. If we change the header to one, it takes the second row. Remember that Python indexing starts from zero rather than one, so zero means the first row and one means the second row. The same applies to other values: if we say header equals two, it takes the third row as the header, the title of each column. The default is header equals zero, which is perfect.

Another important aspect of importing a data set into Python using the pandas library is the index column. By default, pandas indexes the rows as 0, 1, 2, and so on. If we set the index column to zero, it takes the first column as the index for the rows. Similarly, if we set it to one, it takes the second column as the index for the rows. Generally, when importing a data set, the ideal thing is to take the first column as the index for the rows. After printing, the output looks as follows. Here we've set the header to zero, meaning the header is the first row, and the index for the rows is the first column, which is really ideal.
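The two options described in the transcript correspond to the `header` and `index_col` parameters of `pandas.read_csv`. The sketch below is a rough illustration only: the CSV contents, the column names, and the `load` helper are hypothetical stand-ins, not the S&P 500 file or the code used in the video.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the CSV file used in the video.
CSV_TEXT = """Name,Sector,Price
AAA,Tech,10.5
BBB,Energy,20.0
CCC,Health,30.25
"""


def load(header=0, index_col=None):
    """Hypothetical helper: read the sample CSV with the given options."""
    return pd.read_csv(io.StringIO(CSV_TEXT), header=header, index_col=index_col)


# Defaults: the first row is the header and rows are numbered 0, 1, 2, ...
df_default = load()

# header=1: the second row of the file supplies the column titles,
# and the row above it is skipped.
df_header1 = load(header=1)

# index_col=0: the first column ("Name") is used to label the rows.
df_indexed = load(header=0, index_col=0)

print(df_default)
print(df_header1)
print(df_indexed)
```

Comparing the three printed frames shows why the transcript treats header zero with the first column as the index as the convenient choice: the column titles come from the first row, and each row is labelled by a meaningful value rather than a running number.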