One of the most powerful features of NumPy is its ability to manipulate entire arrays of numbers in one go.
In Python, you can multiply a single number by another to get a new number:
single_number = 3.14
single_number * 2
However, if you try to multiply a list by a number it will give a perhaps strange result:
python_list = [3.14, 2.71, 1.18]
python_list * 2
This happens because Python lists are not restricted to holding numbers, nor to holding one consistent type, so they have no special logic for the case where they contain only numbers. The only safe interpretation of * that works for all Python lists is "repeat the list".
NumPy, however, is designed to deal with numerical data and so interprets the request differently:
import numpy as np
numpy_array = np.array([3.14, 2.71, 1.18])
numpy_array * 2
numpy_array | numpy_array * 2
3.14 | 6.28
2.71 | 5.42
1.18 | 2.36
Here, each number has been multiplied by 2 individually.
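To make the difference concrete, the sketch below runs both versions side by side and prints what comes back. The names repeated and doubled are made up for this illustration.

```python
import numpy as np

python_list = [3.14, 2.71, 1.18]
numpy_array = np.array(python_list)

# For a list, * means "repeat the list", so six items come back
repeated = python_list * 2
print(repeated)  # [3.14, 2.71, 1.18, 3.14, 2.71, 1.18]

# For an array, * is applied to each element, so three items come back
doubled = numpy_array * 2
print(doubled)  # [6.28 5.42 2.36]
```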
You can perform any standard numerical operation on NumPy arrays, including *, +, /, - and **. You can also use comparison operations like ==, > and <=. If your array contains booleans (True/False) then you can also use the binary logical operators | ("or") and & ("and"), as well as the unary logical operator ~ ("not").
In all of these cases, NumPy applies the operation to each element of the array individually and gives you back an array of the same size.
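As a rough illustration of the comparison and logical operators, the sketch below builds boolean arrays from a small array of numbers and then combines them. The variable names are invented for this example.

```python
import numpy as np

values = np.array([3.14, 2.71, 1.18])

# Comparisons give back an array of booleans, one per element
is_big = values > 2         # [ True  True False]
is_small = values <= 3      # [False  True  True]

# Boolean arrays can then be combined element by element
both = is_big & is_small    # "and": [False  True False]
either = is_big | is_small  # "or":  [ True  True  True]
neither = ~either           # "not": [False False False]
```

If you combine comparisons in a single expression, wrap each one in parentheses, e.g. (values > 2) & (values <= 3), because & binds more tightly than the comparison operators.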
One big benefit of this is an improvement in speed. To demonstrate this, let's try doubling all the values in a large list of 1 million values:
large_python_list = list(range(1_000_000))
large_numpy_array = np.arange(1_000_000)
Doing this with plain Python could be done with a list comprehension:
%%timeit
[i*2 for i in large_python_list]
But NumPy allows us to do:
%%timeit
large_numpy_array * 2
You might see different results on your computer, but speedups of anywhere from 10 to 100 times are common for an example like this. There are plenty of operations which might see speedups of 1000 times or more.
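The %%timeit lines above only work inside a Jupyter Notebook. If you are running a plain Python script, a similar comparison can be sketched with the standard library's timeit module. The choice of 10 repeats is arbitrary, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

large_python_list = list(range(1_000_000))
large_numpy_array = np.arange(1_000_000)

# Run each version 10 times and record the total time in seconds
list_time = timeit.timeit(lambda: [i * 2 for i in large_python_list], number=10)
array_time = timeit.timeit(lambda: large_numpy_array * 2, number=10)

print(f"list comprehension: {list_time / 10:.5f} s per run")
print(f"NumPy array:        {array_time / 10:.5f} s per run")
```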
As well as simple numerical operations, you will often want to perform more complex operations on your data, for example taking the cosine of a number. We can do this in plain Python with the math module:
import math
math.cos(single_number)
This works, but has the same problem as above in that it doesn't work as you want with a Python list:
math.cos(python_list)
To help with this, NumPy provides a large number of operations via the numpy namespace. They work the same way as the Python functions for single numbers:
np.cos(single_number)
But they also work with Python lists:
np.cos(python_list)
You see here that even though we passed it a Python list, it has returned the result as a NumPy array. We can also pass in a NumPy array directly:
np.cos(numpy_array)
numpy_array | np.cos(numpy_array)
3.14 | -1.000
2.71 | -0.908
1.18 | 0.381
There is a cost to passing in Python lists compared with using an array directly, as it has to convert it from one to the other. If you can, it's best to keep things as NumPy arrays throughout your computations.
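One way to follow this advice is to convert a list once, up front, and then reuse the resulting array for every later step. A minimal sketch, where as_array, cosines and sines are hypothetical names:

```python
import numpy as np

python_list = [3.14, 2.71, 1.18]

# Convert once, then reuse the array for every later calculation
as_array = np.asarray(python_list)

cosines = np.cos(as_array)
sines = np.sin(as_array)
```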
The ability for NumPy functions to work on plain numbers as well as NumPy arrays allows us to write code which works for both single values, as well as arrays of numbers. This avoids the need for type checks and makes our code more expressive.
Imagine we have a function, poly, as part of our code which does some maths to its input, e.g.
def poly(a):
    return a * 4 - a ** 4
We can of course call this function with a single number and it gives the result:
poly(single_number)
Staying in the world of pure Python, we can try to pass a list, but of course it doesn't work as lists cannot be raised to a power:
poly(python_list)
If we pass an array it works since NumPy arrays can apply operations to their elements automatically:
poly(numpy_array)
The key thing here is that one function can work on lots of different types of data. If you write your code to be able to deal with a single number, then NumPy will automatically make it able to do that same calculation to a whole load of numbers.
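The three cases can be gathered into one short script to see them together. The inputs here (2 and [0, 1, 2]) are chosen only to keep the arithmetic easy to check by hand.

```python
import numpy as np

def poly(a):
    return a * 4 - a ** 4

# A single number gives a single number back: 2*4 - 2**4 = 8 - 16
print(poly(2))  # -8

# An array gives an array back, one result per element
print(poly(np.array([0, 1, 2])))  # [ 0  3 -8]

# A plain list fails, because lists do not support **
try:
    poly([0, 1, 2])
except TypeError as error:
    print("list failed:", error)
```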
You do need to make sure that the code you write in your functions can work with NumPy arrays, though. So you should use the NumPy functions like np.cos rather than math.cos. So to calculate $\sin(a) - \cos(a)$ you should write:
def trig(a):
    return np.sin(a) - np.cos(a)
which works with single numbers:
trig(single_number)
and Python lists:
trig(python_list)
and, of course, NumPy arrays:
trig(numpy_array)
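To see why the choice of np.cos over math.cos matters, the sketch below compares trig with a hypothetical variant, trig_math, which is built on the math module instead. Both agree for a single number, but only the NumPy version accepts an array.

```python
import math

import numpy as np

def trig_math(a):
    # Hypothetical variant using the math module, for comparison only
    return math.sin(a) - math.cos(a)

def trig(a):
    return np.sin(a) - np.cos(a)

numpy_array = np.array([3.14, 2.71, 1.18])

# Both versions work for a single number
print(trig_math(3.14), trig(3.14))

# Only the NumPy version works for an array with several elements
try:
    trig_math(numpy_array)
except TypeError as error:
    print("math version failed:", error)

print(trig(numpy_array))
```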
As our arrays get longer and more complex, it's difficult to see what's going on by just looking at the numbers. Let's see how we can easily plot the data as a line graph. Let's make our data to be plotted:
# Numbers from 0 to 20. 100 of them.
x = np.linspace(0, 20, 100)
y = trig(x)
First, we need to import matplotlib, the de facto standard plotting tool for Python:
import matplotlib.pyplot as plt
Then we need to make a place for the plotting to happen, which we do with the plt.subplots() function. This returns two things: a Figure (the whole page, which may contain multiple plots) and an Axes (the space in which we will plot).
We then draw on the axes with ax.plot and pass it the $y$ values:
fig, ax = plt.subplots()
ax.plot(y)
The plot has been drawn and the $y$ values are correct, but the $x$ axis just shows the integer indexes of the array. If we want the $x$ axis to show our $x$ values, we can pass two arguments to plot:
fig, ax = plt.subplots()
ax.plot(x, y)
If you have more complex data than this and want to plot multiple traces on top of each other with a legend and axis labels, then it's a sign that you might be better off using pandas for your analysis.
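For a single trace like the one above, a pair of axis labels can still be added directly on the Axes. This is a minimal sketch using ax.set_xlabel and ax.set_ylabel. The label text is chosen just for this illustration, and y is computed inline so that the block runs on its own.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 20, 100)
y = np.sin(x) - np.cos(x)  # the same calculation as trig(x)

fig, ax = plt.subplots()
ax.plot(x, y)

# Label each axis so the plot can be read without the code
ax.set_xlabel("x")
ax.set_ylabel("sin(x) - cos(x)")
```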
There is a NumPy data file at the URL https://milliams.com/courses/intro_numpy/weather_data.npz which you should download into your current folder. You can do this either by clicking that link and downloading the file via your browser (make sure to copy it to the directory alongside your notebooks or scripts), or by running the following code in a Notebook cell:
import urllib.request
urllib.request.urlretrieve("https://milliams.com/courses/intro_numpy/weather_data.npz", "weather_data.npz")
You can then open the file (which contains multiple arrays, we just want the one called "rain_history" for now) with the following code, which will give you a NumPy array called data:
with np.load("weather_data.npz") as weather:
    data = weather["rain_history"]
data
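If you are not sure which arrays an .npz file contains, the object returned by np.load has a files attribute listing their names. The sketch below shows this on a small stand-in file created with np.savez so that it runs without a download. The file name example.npz and the array names and values in it are invented for illustration and are not the contents of weather_data.npz.

```python
import os
import tempfile

import numpy as np

# Build a small stand-in file so the example runs without a download.
# The array names and values here are invented for illustration.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example.npz")
np.savez(
    path,
    rain=np.array([0.0, 1.5, 3.2]),
    temperature=np.array([12.0, 14.5, 13.1]),
)

with np.load(path) as archive:
    names = archive.files   # names of the arrays stored in the file
    rain = archive["rain"]  # read one array out by name

print(names)
print(rain)
```

The same archive.files check should work on weather_data.npz once you have downloaded it.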