Introduction to Data Analysis in Python

In [1]:
from pandas import DataFrame
In [2]:
data = {
    "city": ["Paris", "Paris", "Paris", "Paris",
             "London", "London", "London", "London",
             "Rome", "Rome", "Rome", "Rome"],
    "year": [2001, 2008, 2009, 2010,
             2001, 2006, 2011, 2015,
             2001, 2006, 2009, 2012],
    "pop": [2.148, 2.211, 2.234, 2.244,
            7.322, 7.657, 8.174, 8.615,
            2.547, 2.627, 2.734, 2.627]
}
census = DataFrame(data)

We start by selecting the rows for the year that we care about:

In [3]:
census[census["year"] == 2001]
Out[3]:
     city  year    pop
0   Paris  2001  2.148
4  London  2001  7.322
8    Rome  2001  2.547

We can see that Paris had the smallest population that year, but let's try to extract that answer using pandas.

First, we get the population data:

In [4]:
pop = census[census["year"] == 2001]["pop"]
pop
Out[4]:
0    2.148
4    7.322
8    2.547
Name: pop, dtype: float64
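
As an aside, the same selection can be written as a single .loc call that takes the row mask and the column name together. The following is a minimal sketch on a cut-down copy of the table (only six of the rows above, so the index labels differ from the full census frame); the name pop_2001 is made up for illustration.

```python
from pandas import DataFrame

# Cut-down stand-in for the census table (assumption: six rows only,
# so the row labels here are 0..5 rather than 0..11).
census = DataFrame({
    "city": ["Paris", "Paris", "London", "London", "Rome", "Rome"],
    "year": [2001, 2008, 2001, 2006, 2001, 2006],
    "pop": [2.148, 2.211, 7.322, 7.657, 2.547, 2.627],
})

# Boolean mask picks the rows, "pop" picks the column, in one step.
pop_2001 = census.loc[census["year"] == 2001, "pop"]
print(pop_2001)
```

The result is still a Series that keeps the original row labels, which is what makes the idxmin lookup later on possible.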

If we call min on the Series we get back the smallest value:

In [5]:
pop.min()
Out[5]:
2.148

But what we actually want is the index label of the row on which the smallest value was found, not the value itself. For this we can use the Series method idxmin:

In [6]:
pop.idxmin()
Out[6]:
0
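
Note that idxmin returns an index label, not a position within the filtered Series. For 2001 the two happen to coincide because Paris is on row 0. The sketch below uses the two 2006 rows from the table (London on row 5, Rome on row 9) to show the difference; it assumes a reasonably recent pandas (1.0 or later), where Series.argmin returns the integer position, and the name pop_2006 is made up for illustration.

```python
from pandas import Series

# The 2006 populations, keeping the row labels they have in the census table.
pop_2006 = Series([7.657, 2.627], index=[5, 9], name="pop")

label = pop_2006.idxmin()     # index label of the smallest value
position = pop_2006.argmin()  # integer position within this Series

print(label)     # 9
print(position)  # 1
```

Because the label still refers to a row of the original table, it can be used to look up other columns for that row.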

We can then take that index label and look it up in the city column to find out which city is on row 0:

In [7]:
census["city"][pop.idxmin()]
Out[7]:
'Paris'

And indeed, the answer is Paris.
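
The steps above can be folded into one small function. This is only a sketch: the name smallest_city is hypothetical, and it assumes the table's index labels are unique, so that the label returned by idxmin identifies exactly one row.

```python
from pandas import DataFrame

census = DataFrame({
    "city": ["Paris", "Paris", "Paris", "Paris",
             "London", "London", "London", "London",
             "Rome", "Rome", "Rome", "Rome"],
    "year": [2001, 2008, 2009, 2010,
             2001, 2006, 2011, 2015,
             2001, 2006, 2009, 2012],
    "pop": [2.148, 2.211, 2.234, 2.244,
            7.322, 7.657, 8.174, 8.615,
            2.547, 2.627, 2.734, 2.627],
})

def smallest_city(frame, year):
    """Return the city with the smallest population in the given year."""
    # Population values for that year, still carrying their row labels.
    pop = frame.loc[frame["year"] == year, "pop"]
    # Look up the city on the row whose label idxmin reports.
    return frame.loc[pop.idxmin(), "city"]

print(smallest_city(census, 2001))  # Paris
print(smallest_city(census, 2006))  # Rome
```

Using .loc with a row label and a column name does the lookup in one call instead of chaining two sets of square brackets. A year with no rows in the table would make idxmin raise an error, so a real version would probably want to check for that first.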