Introduction to Data Analysis in Python

Improving the look of a plot

This aside shows how to improve the visuals of this graph. It draws on topics covered in later chapters, so you might want to come back to it once you've been through the material in the visualisation chapter.

In [1]:
import pandas as pd

city_pop_file = "https://milliams.com/courses/data_analysis_python/city_pop.csv"
census = pd.read_csv(
    city_pop_file,
    skiprows=5,
    sep=";",
    na_values="-1",
    index_col="year",
)
census
Out[1]:
      London  Paris   Rome
year
2001   7.322    NaN  2.547
2006   7.652  2.180  2.627
2008     NaN  2.211  2.720
2009     NaN  2.234  2.734
2011   8.174  2.250  2.760
2012   8.293  2.244  2.627
2015   8.615  2.210    NaN
2019     NaN    NaN    NaN
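To see what each read_csv option contributes, the same call can be tried on a small in-memory file. This is only a sketch: the text in raw is a made-up stand-in (five hypothetical preamble lines, a semicolon-separated header and two data rows), not the actual contents of city_pop.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: five preamble lines, then
# semicolon-separated data in which -1 marks a missing value.
raw = """preamble line 1
preamble line 2
preamble line 3
preamble line 4
preamble line 5
year;London;Paris;Rome
2001;7.322;-1;2.547
2006;7.652;2.180;2.627
"""

small_census = pd.read_csv(
    io.StringIO(raw),
    skiprows=5,         # ignore the five preamble lines
    sep=";",            # columns are separated by semicolons, not commas
    na_values="-1",     # treat -1 as a missing value (NaN)
    index_col="year",   # use the year column as the row labels
)
print(small_census)
```

The -1 in the Paris column for 2001 comes out as NaN, and year becomes the index rather than an ordinary column.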

The simplest thing you can do is plot the graph with no additional options:

In [2]:
census.plot()
Out[2]:
<Axes: xlabel='year'>

The label on the x-axis is taken directly from the name of the column that we made into the index, "year". Let's capitalise it by passing the xlabel argument to plot:

In [3]:
census.plot(
    xlabel="Year",
)
Out[3]:
<Axes: xlabel='Year'>

And then also set a y-axis label in a similar way:

In [4]:
census.plot(
    xlabel="Year",
    ylabel="Population (millions)",
)
Out[4]:
<Axes: xlabel='Year', ylabel='Population (millions)'>
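The Out lines show that plot returns a matplotlib Axes object. As an alternative sketch, the labels can also be set on that object after plotting. The small census table here is typed in by hand from a few rows of the output above so that the example runs without a download.

```python
import pandas as pd

# A few rows copied by hand from the table above (not the full data set)
census = pd.DataFrame(
    {
        "London": [7.322, 7.652, 8.174],
        "Paris": [float("nan"), 2.180, 2.250],
        "Rome": [2.547, 2.627, 2.760],
    },
    index=pd.Index([2001, 2006, 2011], name="year"),
)

# plot returns the Axes it drew on, so we can keep hold of it...
ax = census.plot()

# ...and adjust the labels afterwards
ax.set_xlabel("Year")
ax.set_ylabel("Population (millions)")
```

Both approaches give the same result here. Passing the arguments to plot is shorter, while holding on to the Axes is handy when you want to make further changes later.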

The y-axis currently starts at around 2, which makes the difference between London and the other cities look greater than it actually is. It's usually a good idea to start the y-axis at zero. We can pass the tuple (0, None) to the ylim argument: the 0 sets the lower bound, and the None leaves the upper bound at its default, automatically chosen value:

In [5]:
census.plot(
    xlabel="Year",
    ylabel="Population (millions)",
    ylim=(0, None),
)
Out[5]:
<Axes: xlabel='Year', ylabel='Population (millions)'>
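One way to check what ylim=(0, None) did is to ask the returned Axes for its limits. The sketch below uses a hand-typed copy of a few rows from the table above rather than the downloaded file.

```python
import pandas as pd

# A few rows copied by hand from the table above (not the full data set)
census = pd.DataFrame(
    {
        "London": [7.322, 7.652, 8.174],
        "Paris": [float("nan"), 2.180, 2.250],
        "Rome": [2.547, 2.627, 2.760],
    },
    index=pd.Index([2001, 2006, 2011], name="year"),
)

ax = census.plot(ylim=(0, None))

# get_ylim returns the (lower, upper) bounds currently in use
lower, upper = ax.get_ylim()
print(lower, upper)
```

The lower bound is the 0 we asked for, while the upper bound is whatever was chosen automatically to fit the data.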

This is now a perfectly functional graph. All that's left is to play with the aesthetics a little. seaborn provides a theme with nicer fonts and colours, which we can switch on with sns.set_theme. Once set, it applies to every plot made afterwards:

In [6]:
import seaborn as sns

sns.set_theme()

census.plot(
    xlabel="Year",
    ylabel="Population (millions)",
    ylim=(0, None),
)
Out[6]:
<Axes: xlabel='Year', ylabel='Population (millions)'>

If we want a white background again, we can specify the seaborn style with sns.set_style:

In [7]:
sns.set_style("white")

census.plot(
    xlabel="Year",
    ylabel="Population (millions)",
    ylim=(0, None),
)
Out[7]:
<Axes: xlabel='Year', ylabel='Population (millions)'>
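sns.set_style changes the style for every plot that follows. If the white background is only wanted for a single plot, one option is to use sns.axes_style in a with block, which applies the style temporarily. This is a sketch using a hand-typed copy of a few rows from the table above.

```python
import pandas as pd
import seaborn as sns

# A few rows copied by hand from the table above (not the full data set)
census = pd.DataFrame(
    {
        "London": [7.322, 7.652, 8.174],
        "Paris": [float("nan"), 2.180, 2.250],
        "Rome": [2.547, 2.627, 2.760],
    },
    index=pd.Index([2001, 2006, 2011], name="year"),
)

sns.set_theme()  # the default seaborn theme, with its grey background

# Inside the with block the "white" style is active;
# once the block ends, the previous settings are restored.
with sns.axes_style("white"):
    ax = census.plot(
        xlabel="Year",
        ylabel="Population (millions)",
        ylim=(0, None),
    )
```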

Alternatively, we can use seaborn itself as the plotting tool, via sns.relplot:

In [8]:
sns.relplot(data=census, kind="line").set(
    xlabel="Year",
    ylabel="Population (millions)",
    ylim=(0, None),
)
Out[8]:
<seaborn.axisgrid.FacetGrid at 0x7f4158faf050>