While you can think of the Series as a one-dimensional list of data, pandas' DataFrame is a two-dimensional table of data. You can think of each column in the table as being a Series.
from pandas import DataFrame
There are many ways of creating a DataFrame, but if you already have your data in Python then the simplest is to pass in a dictionary:
data = {
    "city": ["Paris", "Paris", "Paris", "Paris",
             "London", "London", "London", "London",
             "Rome", "Rome", "Rome", "Rome"],
    "year": [2001, 2008, 2009, 2010,
             2001, 2006, 2011, 2015,
             2001, 2006, 2009, 2012],
    "pop": [2.148, 2.211, 2.234, 2.244,
            7.322, 7.657, 8.174, 8.615,
            2.547, 2.627, 2.734, 2.627]
}
census = DataFrame(data)
This has created a DataFrame from the dictionary data. The keys of the dictionary become the column headers and the dictionary values become the values in each column. As with the Series, an index is created automatically.
census
Or, if you only want a peek at the data, you can grab the first few rows with:
census.head(3)
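To make the dictionary-to-table mapping concrete, here is a minimal sketch using a three-row subset of the census data. The names small_data and small are only illustrative; the checks show where the dictionary keys and the automatic index end up.

```python
import pandas as pd

# A three-row subset of the census data (illustrative names)
small_data = {
    "city": ["Paris", "London", "Rome"],
    "year": [2001, 2001, 2001],
    "pop": [2.148, 7.322, 2.547],
}
small = pd.DataFrame(small_data)

# Dictionary keys become the column labels, in insertion order
column_labels = list(small.columns)

# With no index given, pandas numbers the rows 0, 1, 2, ...
row_labels = list(small.index)

# shape is (number of rows, number of columns)
n_rows, n_cols = small.shape

print(column_labels)
print(row_labels)
print(n_rows, n_cols)
```

Checking shape and columns like this is a quick way to confirm that a DataFrame was built the way you intended before working with it.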
When we accessed elements from a Series object, it would select an element by row. However, by default DataFrames index primarily by column. You can access any column directly:
census["city"]
Accessing a column like this returns a Series
which will act in the same way as those we were using earlier which we can see by doing
type(census["city"])
Note that the output of census["city"] has one additional part: Name: city. Pandas has remembered that this Series was created from the 'city' column in the DataFrame.
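The relationship between a column and its parent table can be checked directly. The following is a small sketch on a cut-down table (small and cities are illustrative names), relying only on the standard name and index attributes of a Series.

```python
import pandas as pd

small = pd.DataFrame({
    "city": ["Paris", "London", "Rome"],
    "year": [2001, 2001, 2001],
})

cities = small["city"]

# Selecting one column gives a Series...
is_series = isinstance(cities, pd.Series)

# ...whose name is the column label it came from
series_name = cities.name

# The Series carries the same row index as the DataFrame
same_index = cities.index.equals(small.index)

print(is_series, series_name, same_index)
```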
We can start to ask questions of our data in the same way as we did with Series. If we grab a column from the DataFrame and do a comparison operation on it:
census["city"] == "Paris"
This has created a new Series which has True set where the city is Paris and False elsewhere.
We can use a boolean Series like this to filter the DataFrame as a whole. census['city'] == 'Paris' has returned a Series containing booleans. Passing it back into census as an indexing operation keeps only the rows where the Series is True, which here filters on the 'city' column.
census[census["city"] == "Paris"]
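It can help to split the filter into its two steps. This sketch uses a four-row subset of the data (small, is_paris and the other variable names are illustrative) and also shows one way of combining two conditions, which goes slightly beyond what the text above covers.

```python
import pandas as pd

small = pd.DataFrame({
    "city": ["Paris", "Paris", "London", "Rome"],
    "year": [2001, 2008, 2001, 2001],
    "pop": [2.148, 2.211, 7.322, 2.547],
})

# Step 1: the comparison gives one boolean per row
is_paris = small["city"] == "Paris"

# Step 2: indexing with the mask keeps only the rows marked True
paris_rows = small[is_paris]

# The surviving rows keep their original index labels
kept_labels = paris_rows.index.tolist()

# Masks can be combined with & (and) or | (or);
# each condition needs its own parentheses
paris_after_2005 = small[(small["city"] == "Paris") & (small["year"] > 2005)]

print(is_paris.tolist())
print(kept_labels)
print(paris_after_2005["year"].tolist())
```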
You can then carry on and grab another column after that filter:
census[census["city"] == "Paris"]["year"]
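As an alternative sketch, the same selection can be written as a single .loc call that takes the row filter and the column label together. Chaining two indexing steps is fine for reading data; the single-step form is the one the pandas documentation recommends when you later want to assign to the selection. The names below are illustrative.

```python
import pandas as pd

small = pd.DataFrame({
    "city": ["Paris", "Paris", "London", "Rome"],
    "year": [2001, 2008, 2001, 2001],
})
is_paris = small["city"] == "Paris"

# Two steps: filter the rows, then pick the column
chained = small[is_paris]["year"]

# One step: .loc takes a row selector and a column label together
single = small.loc[is_paris, "year"]

# Both approaches select the same values with the same index labels
same_result = chained.equals(single)

print(single.tolist(), same_result)
```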
If you want to select a row from a DataFrame then you can use the .loc attribute, which lets you pass index values:
census.loc[2]
census.loc[2]["city"]
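One point worth illustrating is that .loc looks rows up by index label rather than by position. The sketch below (illustrative names, three-row subset of the data) shows the row coming back as a Series, the combined row-and-column form, and the label surviving a filter.

```python
import pandas as pd

small = pd.DataFrame({
    "city": ["Paris", "London", "Rome"],
    "year": [2001, 2001, 2001],
})

# .loc looks a row up by its index label; the row comes back as a Series
row = small.loc[2]
row_is_series = isinstance(row, pd.Series)

# The column names label the values within that row
row_labels = list(row.index)

# A row label and a column label together give a single value
city = small.loc[2, "city"]

# Filtering keeps the original labels, so label 2 still finds Rome
# even though it is now the second row of the filtered table
not_paris = small[small["city"] != "Paris"]
still_rome = not_paris.loc[2, "city"]

print(row_labels, city, still_rome)
```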
New columns can be added to a DataFrame simply by assigning them by index (as you would for a Python dict) and can be deleted with the del keyword in the same way:
census["continental"] = census["city"] != "London"
census
del census["continental"]
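The add-and-delete cycle can be verified on a small table. This sketch (illustrative names) also shows drop, a standard DataFrame method that returns a new table rather than modifying the original, which may be preferable when you want to keep the source data intact.

```python
import pandas as pd

small = pd.DataFrame({
    "city": ["Paris", "London", "Rome"],
    "pop": [2.148, 7.322, 2.547],
})

# Assigning to a new label adds a column, as with a dict key
small["continental"] = small["city"] != "London"
columns_after_add = list(small.columns)

# del removes the column in place
del small["continental"]
columns_after_del = list(small.columns)

# drop() returns a new DataFrame and leaves the original untouched
without_pop = small.drop(columns=["pop"])
columns_without_pop = list(without_pop.columns)
original_still_has_pop = "pop" in small.columns

print(columns_after_add)
print(columns_after_del)
print(columns_without_pop, original_still_has_pop)
```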