While you can think of the Series as a one-dimensional list of data, pandas' DataFrame is a two-dimensional table of data (higher-dimensional data can be represented using a hierarchical index). You can think of each column in the table as being a Series.
from pandas import DataFrame
There are many ways of creating a DataFrame, but if you already have your data in Python, the simplest is to pass in a dictionary:
data = {
    "city": ["Paris", "Paris", "Paris", "Paris",
             "London", "London", "London", "London",
             "Rome", "Rome", "Rome", "Rome"],
    "year": [2001, 2008, 2009, 2010,
             2001, 2006, 2011, 2015,
             2001, 2006, 2009, 2012],
    "pop": [2.148, 2.211, 2.234, 2.244,
            7.322, 7.657, 8.174, 8.615,
            2.547, 2.627, 2.734, 2.627]
}
census = DataFrame(data)
This has created a DataFrame from the dictionary data. The keys of the dictionary become the column headers, and each dictionary value supplies the values in that column. As with the Series, an index is created automatically.
census
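To check what the constructor produced, you can inspect the column labels and the index directly. The following is a minimal sketch that uses a small hypothetical two-row dictionary (small_data is an illustrative name, not part of the census data above):

```python
from pandas import DataFrame

# Hypothetical two-row dictionary, used only for illustration
small_data = {"city": ["Paris", "London"], "year": [2001, 2001]}
small = DataFrame(small_data)

# The dictionary keys become the column labels
print(list(small.columns))  # ['city', 'year']

# With no index supplied, pandas numbers the rows from 0
print(list(small.index))  # [0, 1]

# shape is (number of rows, number of columns)
print(small.shape)  # (2, 2)
```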
Or, if you just want a peek at the data, you can grab the first few rows with:
census.head(3)
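As a rough illustration of how head behaves, the sketch below uses a hypothetical seven-row DataFrame (frame and its column n are made-up names). It also shows tail, the matching method for the last rows:

```python
from pandas import DataFrame

# Hypothetical seven-row frame, used only for illustration
frame = DataFrame({"n": [10, 20, 30, 40, 50, 60, 70]})

# head(3) returns a new DataFrame holding the first three rows
first_three = frame.head(3)
print(len(first_three))  # 3

# With no argument, head returns the first five rows
default_head = frame.head()
print(len(default_head))  # 5

# tail works the same way from the other end
last_two = frame.tail(2)
print(list(last_two["n"]))  # [60, 70]
```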
Indexing a Series with square brackets selects an element by its row label. Indexing a DataFrame with square brackets, by contrast, selects by column. You can access any column directly:
census["city"]
Accessing a column like this returns a Series, which behaves in the same way as those we were using earlier. We can confirm this by checking its type:
type(census["city"])
Note that the printed output of census["city"] has one additional part, Name: city. Pandas has remembered that this Series was created from the 'city' column in the DataFrame.
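The remembered name is also available programmatically through the Series' name attribute. A small sketch, assuming a hypothetical two-row DataFrame called frame:

```python
from pandas import DataFrame, Series

# Hypothetical two-row frame, used only for illustration
frame = DataFrame({"city": ["Paris", "London"], "pop": [2.148, 7.322]})

column = frame["city"]

# Selecting a single column gives back a Series
print(isinstance(column, Series))  # True

# The Series keeps the label of the column it came from
print(column.name)  # city

# It also keeps the row index of the DataFrame
print(list(column.index))  # [0, 1]
```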
We can start to ask questions of our data in the same way as we did with Series. If we grab a column from the DataFrame and do a comparison operation on it:
census["city"] == "Paris"
This has created a new Series which has True set where the city is Paris and False elsewhere.
We can use a boolean Series like this to filter the DataFrame as a whole. census['city'] == 'Paris' returns a Series of booleans. Passing it back into census as an indexing operation keeps only the rows where that Series is True, which here means the rows whose 'city' is Paris.
census[census["city"] == "Paris"]
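It can be easier to follow the filter if the boolean Series is given a name first. The sketch below does this on a hypothetical four-row DataFrame (frame and mask are illustrative names). It also shows one way to combine two conditions, using the & operator with each comparison wrapped in parentheses:

```python
from pandas import DataFrame

# Hypothetical four-row frame, used only for illustration
frame = DataFrame({
    "city": ["Paris", "Paris", "London", "Rome"],
    "year": [2001, 2008, 2001, 2001],
})

# Step 1: build the boolean Series, one True/False per row
mask = frame["city"] == "Paris"
print(mask.tolist())  # [True, True, False, False]

# Step 2: index the DataFrame with it to keep only the True rows
paris = frame[mask]
print(list(paris.index))  # [0, 1]

# Two conditions can be combined element-wise with & (and) or | (or);
# the parentheses around each comparison are required
paris_after_2005 = frame[(frame["city"] == "Paris") & (frame["year"] > 2005)]
print(list(paris_after_2005["year"]))  # [2008]
```

Note that the filtered rows keep their original index labels rather than being renumbered.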
You can then carry on and grab another column after that filter:
census[census["city"] == "Paris"]["year"]
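The same selection can also be written as a single .loc call that takes the row filter and the column label together. This is a sketch on a hypothetical four-row DataFrame, not a claim that one form is required. For reading values the two forms give the same result:

```python
from pandas import DataFrame

# Hypothetical four-row frame, used only for illustration
frame = DataFrame({
    "city": ["Paris", "Paris", "London", "Rome"],
    "year": [2001, 2008, 2001, 2001],
})

# Two steps: filter the rows, then pick a column from the result
chained = frame[frame["city"] == "Paris"]["year"]

# One step: give .loc the row filter and the column label together
single = frame.loc[frame["city"] == "Paris", "year"]

print(list(single))  # [2001, 2008]
print(chained.equals(single))  # True
```

The single-step form is generally the safer choice if you later want to assign new values to the selected cells, since assigning through a chained selection may not modify the original DataFrame.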
If you want to select a row from a DataFrame, you can use the .loc attribute, which accepts index labels:
census.loc[2]
census.loc[2]["city"]
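A row selected with .loc comes back as a Series whose index is the DataFrame's column labels, which is why a column name can be used to index into it. The sketch below, on a hypothetical three-row DataFrame, also shows that .loc accepts a row label and a column label together, and that .loc works with labels while .iloc works with positions:

```python
from pandas import DataFrame

# Hypothetical three-row frame, used only for illustration
frame = DataFrame({
    "city": ["Paris", "London", "Rome"],
    "year": [2001, 2001, 2001],
})

# A single row is returned as a Series indexed by the column labels
row = frame.loc[2]
print(list(row.index))  # ['city', 'year']

# Row label and column label can be passed in one call
print(frame.loc[2, "city"])  # Rome

# After filtering, the surviving rows keep their labels (1 and 2 here)
not_paris = frame[frame["city"] != "Paris"]

# .loc looks up by label, .iloc by position
print(not_paris.loc[2, "city"])   # Rome
print(not_paris.iloc[0]["city"])  # London
```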
New columns can be added to a DataFrame simply by assigning to a new column name (as you would assign to a key in a Python dict), and columns can be deleted with the del keyword in the same way:
census["continental"] = census["city"] != "London"
census
del census["continental"]
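The del keyword changes the DataFrame in place. If you would rather keep the original untouched, the drop method returns a new DataFrame without the named columns. A minimal sketch on a hypothetical three-row DataFrame:

```python
from pandas import DataFrame

# Hypothetical three-row frame, used only for illustration
frame = DataFrame({"city": ["Paris", "London", "Rome"]})

# Assigning to a new label adds a column
frame["continental"] = frame["city"] != "London"
print(list(frame.columns))  # ['city', 'continental']

# drop returns a new DataFrame and leaves frame unchanged
without = frame.drop(columns=["continental"])
print(list(without.columns))  # ['city']
print(list(frame.columns))    # ['city', 'continental']

# del removes the column from frame itself
del frame["continental"]
print(list(frame.columns))  # ['city']
```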