Numpy and Simulations

Numerical Python

The standard package for numerical work in Python is numpy:

In [1]:
import numpy as np

Much like the core concept of pandas was the DataFrame, the core concept of numpy is the array. Arrays are similar to lists, but they are faster, more powerful and designed specifically for working with numbers.
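Here is a small illustration of what "designed for numbers" means in practice. It is a sketch that uses arithmetic, which this section doesn't otherwise cover: multiplying a list and multiplying an array do quite different things.

```python
import numpy as np

numbers = [1, 2, 3]

# Multiplying a Python list by 2 repeats the list
repeated = numbers * 2
print(repeated)  # [1, 2, 3, 1, 2, 3]

# Multiplying a numpy array by 2 multiplies every element
doubled = np.array(numbers) * 2
print(doubled)  # [2 4 6]
```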

Creating arrays

There are lots of ways of creating arrays, but the simplest is to pass in an existing Python list:

In [2]:
my_list = [1, 2, 3]
my_array = np.array(my_list)

or you can pass it in directly:

In [3]:
my_array = np.array([1, 2, 3])

This gives you an object, my_array, which you can display:

In [4]:
my_array
Out[4]:
array([1, 2, 3])

Or, if you're working in a .py script, you can print it (note the slightly different output format):

In [5]:
print(my_array)
[1 2 3]

You can access the items in an array in much the same way as in a list. For example, to select the first element:

In [6]:
my_array[0]
Out[6]:
1

To select the last element:

In [7]:
my_array[-1]
Out[7]:
3

To select everything from index 1 to the end:

In [8]:
my_array[1:]
Out[8]:
array([2, 3])
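The other list slicing patterns work on arrays too. A few more examples, using the same three-element array:

```python
import numpy as np

my_array = np.array([1, 2, 3])

first_two = my_array[:2]   # from the start up to (but not including) index 2
last_two = my_array[-2:]   # the last two elements
backwards = my_array[::-1] # every element, stepping backwards

print(first_two)  # [1 2]
print(last_two)   # [2 3]
print(backwards)  # [3 2 1]
```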

Multiple dimensions

One of the very powerful features of numpy arrays is that they are good at handling multi-dimensional data.

If you pass in a list-of-lists (of equal lengths) it will create a two-dimensional array:

In [9]:
grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(grid)
[[1 2 3]
 [4 5 6]
 [7 8 9]]

You can ask any array for how many dimensions it has:

In [10]:
grid.ndim
Out[10]:
2

Or get more detail on exactly how large it is in each dimension with:

In [11]:
grid.shape
Out[11]:
(3, 3)
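The shape has one entry per dimension, listed as (rows, columns) for a 2-D array. A square grid hides that ordering, so this sketch also uses an illustrative non-square array, wide, to make it visible. The size attribute gives the total number of elements:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
wide = np.array([[1, 2, 3], [4, 5, 6]])  # two rows, three columns

print(len(grid.shape) == grid.ndim)  # True: one shape entry per dimension
print(wide.shape)  # (2, 3): rows first, then columns
print(grid.size)   # 9: total number of elements (3 * 3)
```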

If you access a multi-dimensional array using a single number in the square brackets then it will give you the data by row. For example, to pull out the first row:

In [12]:
grid[0]
Out[12]:
array([1, 2, 3])

or the second row:

In [13]:
grid[1]
Out[13]:
array([4, 5, 6])

Then you can get individual elements by specifying multiple indices inside the square brackets. For example, to get the element in the first row and second column:

In [14]:
grid[0, 1]
Out[14]:
2
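Two related forms are worth knowing. Indexing twice in a row reaches the same element (although the single-bracket form with a comma is the usual numpy style), and negative indices work in each dimension just as they do for lists:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(grid[0, 1])    # 2: row 0, column 1
print(grid[0][1])    # 2: the same element, taking row 0 first and then indexing into it
print(grid[-1, -1])  # 9: negative indices count back from the end in each dimension
```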

This is useful because it also lets you select data by column, using : to mean "all" (here, all rows):

In [15]:
grid[:, 1]
Out[15]:
array([2, 5, 8])
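To see why this is useful, compare it with a plain list-of-lists, where there is no direct way to ask for a column. A sketch, with grid_list as an illustrative name:

```python
import numpy as np

grid_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
grid = np.array(grid_list)

# With a list-of-lists you have to loop over the rows to collect a column
column_from_list = [row[1] for row in grid_list]
print(column_from_list)  # [2, 5, 8]

# A tempting shortcut does not work: grid_list[:] copies the outer list,
# so [1] then picks out the second *row*
print(grid_list[:][1])   # [4, 5, 6]

# With an array, a single index expression does it
column_from_array = grid[:, 1]
print(column_from_array)  # [2 5 8]
```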

Exercise 1

  • Extract the last column (the answer should be array([3, 6, 9]))

Even more dimensions

numpy arrays can handle any number of dimensions. So you can make a three-dimensional cube of numbers with:

In [16]:
cube = np.array([[[1,2], [3, 4]], [[5, 6], [7, 8]]])
print(cube)
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

Notice how the array is represented when it's printed. Each "row" of the cube is now a 2-D sub-array, and the sub-arrays are separated by a blank line.

Exercise 2

  1. Find the number of dimensions and shape of the array cube.
  2. Extract some slices of the array, taking care to understand how the order of the indices relates to the data inside.

Setting values

So far you have only set the data in an array by passing in a list at creation time. However, numpy arrays are mutable, so you can edit the values within them.

So, going back to our grid from before:

In [17]:
print(grid)
[[1 2 3]
 [4 5 6]
 [7 8 9]]

you can set the value of any cell by indexing it with [] on the left-hand side of the = and putting the new value on the right-hand side:

In [18]:
grid[0, 0] = 999
print(grid)
[[999   2   3]
 [  4   5   6]
 [  7   8   9]]

You can see that the top-left corner cell has changed from 1 to 999.

In a similar way, you can assign entire slices of the array to a single value:

In [19]:
grid[:, 1] = 10
print(grid)
[[999  10   3]
 [  4  10   6]
 [  7  10   9]]

or set a list of values:

In [20]:
grid[:, 2] = [-1, -2, -3]
print(grid)
[[999  10  -1]
 [  4  10  -2]
 [  7  10  -3]]
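When you assign a list, it has to contain the right number of values for the slice. A sketch of what happens when it doesn't (the exact wording of the error message may vary between numpy versions):

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

grid[:, 2] = [-1, -2, -3]  # three values for a column with three rows: fine

raised = False
try:
    grid[:, 2] = [-1, -2]  # only two values for three rows
except ValueError as error:
    raised = True
    print("ValueError:", error)

print(grid[:, 2])  # [-1 -2 -3]: the failed assignment left the column unchanged
```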

This works even to set the entire array:

In [21]:
grid[:] = 42
print(grid)
[[42 42 42]
 [42 42 42]
 [42 42 42]]
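The [:] matters here. grid[:] = 42 writes into the existing array, whereas grid = 42 would simply make the name grid refer to the integer 42 and leave the array alone. A sketch, where same_grid is an illustrative second name for the same array:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
same_grid = grid        # a second name for the same array object

grid[:] = 42            # overwrites the values inside the existing array
print(same_grid[0, 0])  # 42: the change is visible through both names

grid = 42               # rebinds the name to a plain integer; the array is untouched
print(type(grid))       # <class 'int'>
print(same_grid.shape)  # (3, 3)
```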

Data types

One important way in which numpy arrays differ from Python lists is that each array can only hold one "type" of data.

The main reason is that this is how numpy is able to perform calculations so quickly. If it knows in advance that all the items in an array are, for example, integers, then it can store them in one compact block of memory and skip checking the type of each item, which makes operations on the array much faster.

By default it will infer the type from the data you pass in. In our case, because you passed in a list of integers, the data type (or dtype) of the array is:

In [22]:
grid.dtype
Out[22]:
dtype('int64')

This is a 64-bit integer. On some computers (notably Windows with versions of numpy before 2.0) you might get int32 instead.
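Inference looks at all the values you pass in. A minimal sketch: a single float among integers is enough to make the whole array floating point (the exact integer dtype printed for the first array depends on your platform):

```python
import numpy as np

int_array = np.array([1, 2, 3])
mixed_array = np.array([1, 2.5, 3])  # one float among the integers

print(int_array.dtype)    # an integer dtype, e.g. int64
print(mixed_array.dtype)  # float64
print(mixed_array[0])     # 1.0: the integer 1 was stored as a float
```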

When you create the array you can specify the dtype:

In [23]:
my_int_array = np.array([1, 2, 3], dtype=int)
my_int_array.dtype
Out[23]:
dtype('int64')
In [24]:
my_float_array = np.array([1, 2, 3], dtype=float)
my_float_array.dtype
Out[24]:
dtype('float64')

Note that even though you passed in integer values when creating my_float_array, when you print the array, it shows them with a decimal point:

In [25]:
print(my_float_array)
[1. 2. 3.]
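If you already have an array and want a copy of it with a different dtype, the array's astype method performs that conversion. A short sketch, where my_converted_array is an illustrative name:

```python
import numpy as np

my_float_array = np.array([1, 2, 3], dtype=float)

# astype returns a new array; the original keeps its float64 dtype
my_converted_array = my_float_array.astype(int)

print(my_converted_array)        # [1 2 3]
print(my_float_array)            # [1. 2. 3.]
print(my_converted_array.dtype)  # an integer dtype, e.g. int64
```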

Also note that if you set a float value in an array with an integer dtype, numpy will drop the decimal places rather than rounding:

In [26]:
my_int_array[1] = 47.769
print(my_int_array)
[ 1 47  3]
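Because the decimal part is discarded rather than rounded, values are truncated toward zero. A sketch with a value just below a whole number and a negative value:

```python
import numpy as np

my_int_array = np.array([1, 2, 3], dtype=int)

my_int_array[0] = 2.9   # stored as 2, not rounded up to 3
my_int_array[2] = -2.9  # stored as -2: truncation goes toward zero

print(my_int_array)  # [ 2  2 -2]
```

If you want the nearest whole number instead, round the value yourself before assigning it, for example with Python's built-in round.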

Creating pre-filled arrays

It's common to want to create arrays of a certain size with all the values set to something specific.

For example, to create a three-item array filled entirely with 0:

In [27]:
np.zeros(3)
Out[27]:
array([0., 0., 0.])

Or a four-item array filled with 1:

In [28]:
np.ones(4)
Out[28]:
array([1., 1., 1., 1.])

With most of these array creation functions, you can specify the shape of the array with a tuple. For example, to create a $3\times5$ array full of zeros, you can do:

In [29]:
np.zeros((3, 5))
Out[29]:
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
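The tuple can have as many entries as you need, one per dimension. For example, a sketch of a three-dimensional array of ones:

```python
import numpy as np

block = np.ones((2, 3, 4))  # 2 blocks, each with 3 rows and 4 columns

print(block.ndim)   # 3
print(block.shape)  # (2, 3, 4)
print(block.size)   # 24
```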

Note that these functions will default to creating arrays with the float dtype, so if you want them to store integers, you must specify it:

In [30]:
np.zeros((3, 5), dtype=int)
Out[30]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
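For values other than 0 or 1, numpy provides np.full, which takes the shape and the fill value. Unlike np.zeros and np.ones, when you don't pass a dtype it takes its dtype from the fill value. A minimal sketch:

```python
import numpy as np

sevens = np.full((2, 3), 7)    # integer fill value, so an integer array
halves = np.full((2, 3), 0.5)  # float fill value, so a float64 array

print(sevens)
# [[7 7 7]
#  [7 7 7]]
print(halves.dtype)  # float64
```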

Exercise 3

  • Create an array with five rows and six columns, with all cells containing the integer 0
  • Set all the values in the second row to 1
  • Then set all the values in the first column to 2
  • Then set the 4 cells in the bottom-right corner to 3

At the end, the array should look like:

[[2 0 0 0 0 0]
 [2 1 1 1 1 1]
 [2 0 0 0 0 0]
 [2 0 0 0 3 3]
 [2 0 0 0 3 3]]