The standard package for numerical work in Python is numpy. For more information see the full course Introduction to NumPy.
If you have NumPy installed, you can import it with:
import numpy as np
Much like the core concept of pandas was the DataFrame, the core concept of numpy is the array. Arrays are like lists, but more powerful, faster and designed specifically for numbers.
There are lots of ways of creating arrays, but the simplest is to pass in an existing Python list:
my_list = [1, 2, 3]
my_array = np.array(my_list)
or you can pass it in directly:
my_array = np.array([1, 2, 3])
This gives us an object, my_array, which you can display:
my_array
Or, if you're working in a .py script, you can print it (note the slightly different output format):
print(my_array)
You can access the items in an array in much the same way as in a list. For example, to select the first element:
my_array[0]
To select the last element:
my_array[-1]
To select everything from index 1 to the end:
my_array[1:]
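The three selections above can be collected into one short sketch. The variable names first, last and rest are introduced here purely for illustration.

```python
import numpy as np

my_array = np.array([1, 2, 3])

first = my_array[0]   # first element: 1
last = my_array[-1]   # last element: 3
rest = my_array[1:]   # everything from index 1 onwards: the values 2 and 3

print(first, last, rest)
```

Note that a single index gives back one number, while a slice gives back another array.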
One of the very powerful features of numpy arrays is that they are good at handling multi-dimensional data.
If you pass in a list-of-lists (of equal lengths) it will create a two-dimensional array:
grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(grid)
You can ask any array how many dimensions it has:
grid.ndim
Or get more detail on exactly how large it is in each dimension with:
grid.shape
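As a quick check of what these two attributes report, the sketch below compares the two-dimensional grid with the one-dimensional my_array from earlier.

```python
import numpy as np

my_array = np.array([1, 2, 3])
grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(grid.ndim)       # 2: rows and columns
print(grid.shape)      # (3, 3): three rows, three columns
print(my_array.ndim)   # 1
print(my_array.shape)  # (3,): a one-element tuple
```

The shape is always a tuple with one entry per dimension, so its length equals ndim.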
If you access a multi-dimensional array using a single number in the square brackets then it will give you the data by row. For example, to pull out the first row:
grid[0]
or the second row:
grid[1]
Then you can get individual elements by specifying multiple indices inside the square brackets. For example, to get the first row and the second column:
grid[0, 1]
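A minimal sketch of row and element access on the same grid follows. The names first_row, second_row and element are illustrative only.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_row = grid[0]    # the values 1, 2, 3
second_row = grid[1]   # the values 4, 5, 6
element = grid[0, 1]   # first row, second column: 2

print(first_row, second_row, element)
```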
Multiple indices are also useful because they allow you to select the data by column, using : to mean "all". For example, to select the second column:
grid[:, 1]
Similarly, grid[:, 2] selects the third column, array([3, 6, 9]).
numpy arrays can handle any number of dimensions. So you can make a three-dimensional cube of numbers with:
cube = np.array([[[1,2], [3, 4]], [[5, 6], [7, 8]]])
print(cube)
Notice how the array is represented when it's printed. Each "row" of the cube is now a 2-D sub-array, and the sub-arrays are separated by a blank line.
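To make the structure of the cube concrete, the sketch below inspects its dimensions and pulls out one sub-array and one element. The names first_layer and value are introduced here for illustration.

```python
import numpy as np

cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print(cube.ndim)    # 3
print(cube.shape)   # (2, 2, 2)

first_layer = cube[0]   # the first 2-D sub-array: [[1, 2], [3, 4]]
value = cube[1, 0, 1]   # second sub-array, first row, second column: 6

print(first_layer)
print(value)
```

Indexing works as it does in two dimensions, with one extra index for the extra dimension.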
So far you have only defined the data in the array by passing in the data as a list at creation-time. numpy arrays are mutable, so you can edit the values within.
So, going back to our grid from before:
print(grid)
you can set the value of any cell by indexing it with [] on the left-hand side of the = and passing a value:
grid[0, 0] = 999
print(grid)
You can see that the top-left corner cell has changed from 1 to 999.
In a similar way, you can assign entire slices of the array to a single value:
grid[:, 1] = 10
print(grid)
or set a list of values:
grid[:, 2] = [-1, -2, -3]
print(grid)
This works even to set the entire array:
grid[:] = 42
print(grid)
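The sketch below replays these assignments on a fresh copy of the grid. It also illustrates one point the text does not state explicitly: when assigning a list to a slice, the list needs to match the length of the slice, otherwise NumPy raises a ValueError. The names after_edits and mismatch_rejected are illustrative only.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

grid[0, 0] = 999             # a single cell
grid[:, 1] = 10              # a whole column set to one value
grid[:, 2] = [-1, -2, -3]    # a whole column set from a list of matching length
after_edits = grid.tolist()  # snapshot as nested Python lists
print(grid)

# A list of the wrong length cannot be fitted into the slice.
mismatch_rejected = False
try:
    grid[:, 2] = [1, 2]
except ValueError:
    mismatch_rejected = True

grid[:] = 42                 # every cell in the array
print(grid)
```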
One important way in which numpy arrays differ from Python lists is that each array can only hold one "type" of data.
The main reason for this is that it lets numpy perform calculations quickly. If it knows in advance that all the items in an array are, for example, integers, then it can make assumptions which make operations on the array faster.
By default it will infer the type from the data you pass in, so in our case because you passed in a list of integers, the data type (or dtype) of the array is:
grid.dtype
The result is typically int64, a 64-bit integer. On your computer you might get int32 instead.
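To illustrate the "one type per array" rule, the sketch below passes in a list that mixes integers and a float. The names ints and mixed are introduced here for illustration. NumPy chooses a single dtype that can hold every value, which in this case is a float type.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])

print(ints.dtype)    # an integer dtype (int64 on most platforms, possibly int32)
print(mixed.dtype)   # float64: the integers were converted to floats
print(mixed)
```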
When you create the array you can specify the dtype:
my_int_array = np.array([1, 2, 3], dtype=int)
my_int_array.dtype
my_float_array = np.array([1, 2, 3], dtype=float)
my_float_array.dtype
Note that even though you passed in integer values when creating my_float_array, when you print the array, it shows them with a decimal point:
print(my_float_array)
Also note that if an array has an integer dtype and you try to set a value to a float, the decimal places are dropped (the value is truncated, not rounded):
my_int_array[1] = 47.769
print(my_int_array)
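The sketch below contrasts the two arrays when the same float is assigned to each. The negative value is an extra case added here to show that the fractional part is simply discarded rather than rounded to the nearest integer.

```python
import numpy as np

my_int_array = np.array([1, 2, 3], dtype=int)
my_float_array = np.array([1, 2, 3], dtype=float)

my_int_array[1] = 47.769     # stored as 47
my_int_array[2] = -2.9       # stored as -2 (truncated towards zero)
my_float_array[1] = 47.769   # stored unchanged

print(my_int_array)
print(my_float_array)
```

If you need to keep the decimal places, create the array with a float dtype in the first place.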
It's common to want to create arrays of a certain size with all the values set to something specific.
For example, to create a three-item array filled entirely with 0:
np.zeros(3)
Or a four-item array filled with 1:
np.ones(4)
With most of these array creation functions, you can specify the shape of the array with a tuple. For example, to create a $3\times5$ array full of zeros, you can do:
np.zeros((3, 5))
Note that these functions will default to creating arrays with the float dtype, so if you want them to store integers, you must specify it:
np.zeros((3, 5), dtype=int)
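The sketch below gathers these creation functions together and checks the shape and dtype of each result. It also uses np.full, which is not mentioned in the text above but is a NumPy function for filling an array with an arbitrary value. All of the variable names are illustrative.

```python
import numpy as np

float_zeros = np.zeros((3, 5))           # 3 rows, 5 columns, float by default
int_zeros = np.zeros((3, 5), dtype=int)  # same shape, integer dtype
ones = np.ones(4)                        # one-dimensional, four items
sevens = np.full((2, 3), 7)              # 2 rows, 3 columns, every value 7

print(float_zeros.shape, float_zeros.dtype)
print(int_zeros.shape, int_zeros.dtype)
print(ones)
print(sevens)
```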
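As an exercise, start from an integer array filled with 0, then use indexing and slicing to set some of its cells to 1, 2 and 3. At the end, the array should look like: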
[[2 0 0 0 0 0]
 [2 1 1 1 1 1]
 [2 0 0 0 0 0]
 [2 0 0 0 3 3]
 [2 0 0 0 3 3]]
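One possible way to build the array shown above is sketched here, using np.zeros and slice assignment. The order of the steps and the name result are assumptions made for illustration; other sequences of assignments can reach the same array.

```python
import numpy as np

# Start with 5 rows and 6 columns of integer zeros.
result = np.zeros((5, 6), dtype=int)

result[1, :] = 1     # fill the second row with 1
result[:, 0] = 2     # fill the first column with 2 (overwrites the 1 in that row)
result[3:, 4:] = 3   # fill the bottom-right 2x2 block with 3

print(result)
```

The order matters here: setting the first column after the second row is what leaves a 2 at the start of that row.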