Pandas Introduction
- 02:14
What Pandas is and how it is used to create and manipulate data frames to handle large datasets efficiently.
Transcript
Like NumPy, Pandas is an external package that you import into your Python code to provide new capabilities. Pandas is useful for exploring, summarizing, and editing large datasets, and it is a tool that you're going to use constantly in your machine learning projects. In the first cell of your Jupyter Notebook, make sure you've imported the Pandas package. Just as we use the alias np for NumPy, it's common to use the alias pd for Pandas.

The Pandas package provides a new type of object that will be indispensable in your machine learning work. It's called the DataFrame. DataFrames are a lot like Excel spreadsheets, except they're optimized for large amounts of data and can be fed directly into your machine learning algorithms.

You can create a new DataFrame by passing a dictionary of lists into the DataFrame function. Each dictionary key will become a column header, and each list supplies the values for that column, so the lists need to be the same length. Be careful here: names in Python are case sensitive, so calling the DataFrame function requires a capital D and a capital F. Also remember that you have to point back to the Pandas package in order to use the DataFrame function, so with the pd alias you write pd.DataFrame. You can't use it just by itself.

So here I'm creating this small manual DataFrame, and I'm doing that by first pointing toward my alias for Pandas, and then calling the DataFrame function with a capital D and a capital F, followed by an open parenthesis. Inside those parentheses, I'm creating a new dictionary. So here's my open curly brace, and then my first key is current price, and its value is a list of current prices. The next key is name, and it's followed by a list of the names of these companies. The final key is ticker, followed by its value, which is a list of the tickers of these companies. Then I close my curly brace to finish the dictionary and close the parenthesis to finish the DataFrame call. When I display the new DataFrame I've created, you can see it comes out in this nice format that looks a lot like Excel.
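The walkthrough above can be sketched roughly as follows. The transcript does not state the actual values shown on screen, so the prices, company names, and tickers here are made-up placeholders, and the variable name stocks is also an assumption. Only the three keys and the pd alias come from the lesson.

```python
import pandas as pd  # pd is the conventional alias for pandas

# Placeholder data: hypothetical values, not the ones shown in the video.
# Each dictionary key becomes a column header; each list becomes that column.
stocks = pd.DataFrame({
    "current price": [101.50, 250.00, 42.75],
    "name": ["Company A", "Company B", "Company C"],
    "ticker": ["AAA", "BBB", "CCC"],
})

# In a Jupyter Notebook, ending a cell with just `stocks` renders the table.
# print() gives a plain-text version that works anywhere.
print(stocks)
```

Writing pd.dataframe in lowercase, or DataFrame on its own without the pd prefix, raises an error, which is the case-sensitivity point made in the lesson. In recent versions of Pandas, the columns appear in the same order as the keys are written in the dictionary.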