Numpy and Simulations

Simulation

When we want to model real-life systems on a computer, or in fact on paper, there are two main ways to do so. To explain how they compare, let's look at an example.

A swinging pendulum (when we simplify it down) has two variables defining it: the angle at which it is hanging, $\theta$, and the speed at which it is moving, $\omega$:

[Figure: a simple gravity pendulum, an ideal model consisting of a massive bob suspended by a weightless rod from a frictionless pivot, with no air friction. The angle between the rod and the vertical is labelled $\theta$.]

If we want to do a calculation to work out what will happen when it swings, we first need to define some fixed values: the mass of the weight, the length of the rod, the strength of gravity. We can then use some equations from physics to work out, at any one time, what the forces are on the pendulum.

Once we have these equations we can either:

  1. do some maths and solve some equations to calculate a general solution to the problem, e.g. $\theta(t) = A \cos(\omega t - \phi)$ (this process usually involves "integration"), or
  2. use the equations to work out the forces on the weight at the beginning, use those to work out what the state will be a 100th of a second later, then jump forward by a 100th of a second and do it again, repeating until we've found our answer.

The first technique gives us an "analytical solution" (because we "analysed" it) and the second is a "numerical solution" (because we just crunched the numbers).

If you're really good at maths then the first can look appealing because it gives a more precise and general answer. However, there are problems out there that are very complex, or even impossible, to solve analytically. This is where numerical computer simulations come in.

If you're simulating something that happens very quickly (like atoms wobbling around) then you'd want the amount you jump forwards each time (called the "time step") to be very small, perhaps femtoseconds, but if you're simulating the planets orbiting, then hours or days would be good enough.

We will ignore the analytical methods as this is not a maths course, and focus instead on ways to make the computer do the work for us.

Numerical simulation

The general equation for a numerical simulation is:

$$\mathrm{state}_t = rules(\mathrm{state}_{t-1})$$

which says that at each time step we work out the new "state" of the system, based only on the state at the previous time step and the rules or equations for our system. An equation like this, which defines an item in a sequence based on previous items, is called a "recurrence relation" (a famous one is the Fibonacci sequence).
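
For example, the Fibonacci recurrence, $F_t = F_{t-1} + F_{t-2}$, fits this pattern (each term depends on the terms before it) and can be written as a simple loop; a minimal sketch:

fib = [0, 1]  # the first two terms of the sequence
for t in range(2, 10):
    fib.append(fib[t-1] + fib[t-2])

print(fib)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]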

The state of the system is all the numbers that describe it at any one time (for example, the angle and speed of the pendulum). The rules are whatever we know from physics, biology, chemistry, sociology, epidemiology etc. that are appropriate for what we want to discover. In the case of the pendulum these are Newton's laws of motion.

To investigate all this, we're going to go much simpler and look at "exponential decay". This is a nice example as it only has one number in its state, and only one rule to update it.

In the case of an exponential decay the state is a single number, $v$, and each time step the rule is that the value should change by a factor, $\lambda$, also called the "decay rate". This means that our rules function can be defined as (for simplicity we're assuming a time step of 1 second and a $\lambda$ in the appropriate units):

$$rules : v \mapsto \lambda v$$

and we can take our general recurrence equation from above and specify it as:

$$v_t = rules(v_{t-1})$$

i.e. the next value is based on applying the rules to the previous value.

So if the decay rate is $0.9$ and we start the simulation with a value ($v_0$) of $100$ then after one time step (at $t=1$) the value will be:

$$ \begin{align*} v_1 &= rules(v_0) \\ &= \lambda v_0 \\ &= 0.9 \times 100 \\ &= 90 \end{align*} $$

The second time step, $v_2$, is only dependent on $v_1$, so we get:

$$ \begin{align*} v_2 &= rules(v_1) \\ &= \lambda v_1 \\ &= 0.9 \times 90 \\ &= 81 \end{align*} $$

We could carry on doing this forever but I think that's probably enough maths for today, so let's jump into code and do those same operations:

In [1]:
INITIAL_VALUE = 100  # Variable names in capital letters are
DECAY_RATE = 0.9     # used to denote global constants
In [2]:
v = INITIAL_VALUE

We can then apply our rules by multiplying the value by the decay rate repeatedly:

In [3]:
v *= DECAY_RATE
v
Out[3]:
90.0
In [4]:
v *= DECAY_RATE
v
Out[4]:
81.0
In [5]:
v *= DECAY_RATE
v
Out[5]:
72.9

So the value of v is decreasing over time.

Automating the algorithm

Running this simulation by copying and pasting the code over and over isn't going to get us very far, so let's automate it.

In general, when you have a recurrence relation, you can represent it in code with a loop:

decay.py
INITIAL_VALUE = 100
DECAY_RATE = 0.9

NUMBER_OF_STEPS = 5

v = INITIAL_VALUE

for t in range(NUMBER_OF_STEPS):
    v *= DECAY_RATE
    print(v)
$ python decay.py
90.0
81.0
72.9
65.61000000000001
59.049000000000014

This seems to be working. The issue is that, with a simulation, you often want a record of the values that the state had along the way, perhaps because you want to graph it or do some further analysis. Let's throw our values into a list as we calculate them:

decay.py
INITIAL_VALUE = 100
DECAY_RATE = 0.9

NUMBER_OF_STEPS = 5

v = INITIAL_VALUE

value_history = []  # make an empty list

for t in range(NUMBER_OF_STEPS):
    v *= DECAY_RATE
    value_history.append(v)  # append values as we go

print(value_history)
$ python decay.py
[90.0, 81.0, 72.9, 65.61000000000001, 59.049000000000014]

This is good, but we also want to include our starting value in the result for completeness, so let's put that into the list before we start the loop and then start our loop from the second value:

decay.py
INITIAL_VALUE = 100
DECAY_RATE = 0.9

NUMBER_OF_STEPS = 5

v = INITIAL_VALUE

value_history = []
value_history.append(v)  # append the initial value

for t in range(1, NUMBER_OF_STEPS):  # Start from 1 instead of 0
    v *= DECAY_RATE
    value_history.append(v)

print(value_history)
$ python decay.py
[100, 90.0, 81.0, 72.9, 65.61000000000001]

At this point, we are using the variable v to mean a lot of different things. The first time around the loop it means $v_0$ and then we update it with *= at which point it is representing $v_1$. The next time around it is $v_2$, then $v_3$ etc.

This is potentially a little confusing and makes our code harder to relate to our equations above. Let's get rid of this generic v and instead be explicit about which variable is which at each time step.

From our recurrence relation, $v_t = rules(v_{t-1})$, let's define:

  • $v_{t-1}$ as previous_v
  • $v_t$ as new_v
decay.py
INITIAL_VALUE = 100
DECAY_RATE = 0.9

NUMBER_OF_STEPS = 5

# No longer define `v`
value_history = []
value_history.append(INITIAL_VALUE)  # Append the initial value directly into our list

for t in range(1, NUMBER_OF_STEPS):
    # Instead of keeping the variable `v` between loops,
    # explicitly grab the last state from the history.
    previous_v = value_history[t-1]
    new_v = previous_v * DECAY_RATE
    value_history.append(new_v)
    
print(value_history)
$ python decay.py
[100, 90.0, 81.0, 72.9, 65.61000000000001]

Tidying things up

We now have a decent simulation that conceptually maps our recurrence relation and rules into code.

However, while using a Python list to store our data works fine with a simple simulation with only a few time steps, if we want to be able to simulate larger systems with either a more complex state or thousands of time steps, we will do better to use numpy.

A big difference between a numpy array and a Python list is that lists can change size dynamically by appending to them, whereas a numpy array is fixed in size when it is made. In our case, we know in advance how long the array needs to be because we know how many time steps we will run for.
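
For example (a minimal sketch), rather than growing a list with append, we create the array at its full size up front and then fill in the values by index:

import numpy as np

history = np.zeros(5)  # a fixed-size array of five zeros
history[0] = 100       # set values by index rather than appending
print(history)         # [100.   0.   0.   0.   0.]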

If you want to check how much memory a numpy array is using (e.g. when submitting a job to an HPC system), you can use:

In [6]:
import numpy as np

my_large_array = np.ones((1000, 1000))

print(f"Array uses {my_large_array.nbytes/1E6:,.0f} MB of memory")
Array uses 8 MB of memory
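
This number comes from the array's size and data type: np.ones defaults to 64-bit floats, so each element takes 8 bytes and

$$1000 \times 1000 \times 8\ \mathrm{bytes} = 8{,}000{,}000\ \mathrm{bytes} = 8\ \mathrm{MB}$$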

So let's update our code to make a numpy array, initially full of zeros. We can then set values in the array directly by index rather than appending:

decay.py
import numpy as np

INITIAL_VALUE = 100
DECAY_RATE = 0.9

NUMBER_OF_STEPS = 5

# Create a container for all our data
value_history = np.zeros(NUMBER_OF_STEPS)
# Set the initial state
value_history[0] = INITIAL_VALUE  # set directly instead of appending

for t in range(1, NUMBER_OF_STEPS):
    previous_v = value_history[t-1]
    new_v = previous_v * DECAY_RATE
    value_history[t] = new_v  # set directly instead of appending

print(value_history)
$ python decay.py
[100.    90.    81.    72.9   65.61]

This is now looking even more like our equations from above, where value_history[t-1] matches $v_{t-1}$ and value_history[t] matches $v_t$.

In our original maths we defined our rules for updating the state of the system as the mathematical function $rules(s)$. To align more closely, we can do something similar in our Python code by using a function. I like function names to be verbs (or verb phrases) where possible, so I'll call the function update_state.

decay.py
import numpy as np

def update_state(old_state):  # Make this new function
    DECAY_RATE = 0.9
    new_state = old_state * DECAY_RATE
    return new_state

INITIAL_VALUE = 100

NUMBER_OF_STEPS = 5

value_history = np.zeros(NUMBER_OF_STEPS)
value_history[0] = INITIAL_VALUE

for t in range(1, NUMBER_OF_STEPS):
    previous_v = value_history[t-1]
    new_v = update_state(previous_v)  # and call it here
    value_history[t] = new_v

print(value_history)
$ python decay.py
[100.    90.    81.    72.9   65.61]

We have now managed to make our simulation loop independent of the exact problem we're trying to solve. It's now only really talking about the state in the general sense, and passes off to the update_state function for the details.

Since it's now generic, we can move it into its own function and rename things to be even more general.

We can take:

value_history = np.zeros(NUMBER_OF_STEPS)
value_history[0] = INITIAL_VALUE

for t in range(1, NUMBER_OF_STEPS):
    previous_v = value_history[t-1]
    new_v = update_state(previous_v)
    value_history[t] = new_v

and move it into two functions: initialise_state and run_simulation (the latter calls the former):

def initialise_state(NUMBER_OF_STEPS, INITIAL_VALUE):
    value_history = np.zeros(NUMBER_OF_STEPS)
    value_history[0] = INITIAL_VALUE
    return value_history

def run_simulation(NUMBER_OF_STEPS, INITIAL_VALUE):
    value_history = initialise_state(NUMBER_OF_STEPS, INITIAL_VALUE)

    for t in range(1, NUMBER_OF_STEPS):
        previous_v = value_history[t-1]
        new_v = update_state(previous_v)
        value_history[t] = new_v

    return value_history

then rename some of the variables and arguments to make it read better:

def initialise_state(num_steps, initial_value):
    state = np.zeros(num_steps)
    state[0] = initial_value
    return state

def run_simulation(num_steps, initial_value):
    state = initialise_state(num_steps, initial_value)

    for t in range(1, num_steps):
        previous_v = state[t-1]
        new_v = update_state(previous_v)
        state[t] = new_v

    return state

and even simplify the simulation line to match our original recurrence relation ($\mathrm{state}_t = rules(\mathrm{state}_{t-1})$) very closely:

def run_simulation(num_steps, initial_value):
    state = initialise_state(num_steps, initial_value)

    for t in range(1, num_steps):
        state[t] = update_state(state[t-1])

    return state

Applying this change gets us to:

decay.py
import numpy as np

def initialise_state(num_steps, initial_value):
    state = np.zeros(num_steps)
    state[0] = initial_value
    return state

def update_state(old_state):
    DECAY_RATE = 0.9
    new_state = old_state * DECAY_RATE
    return new_state

def run_simulation(num_steps, initial_value):
    state = initialise_state(num_steps, initial_value)
    
    for t in range(1, num_steps):
        state[t] = update_state(state[t-1])
    
    return state

INITIAL_VALUE = 100
NUMBER_OF_STEPS = 6

value_history = run_simulation(NUMBER_OF_STEPS, INITIAL_VALUE)

print(value_history)
$ python decay.py
[100.     90.     81.     72.9    65.61   59.049]

Saving and loading data

It's possible to save a numpy array to a file using the np.savez_compressed function:

np.savez_compressed("decay", state=value_history)

where "decay" is the name of the file to save (it will save as decay.npz in this case), state is the name we're choosing for this variable in the file, and value_history is the data to put into that variable. This means that we can update our script to save the output data:

decay.py
import numpy as np

def initialise_state(num_steps, initial_value):
    state = np.zeros(num_steps)
    state[0] = initial_value
    return state

def update_state(old_state):
    DECAY_RATE = 0.9
    new_state = old_state * DECAY_RATE
    return new_state

def run_simulation(num_steps, initial_value):
    state = initialise_state(num_steps, initial_value)
    
    for t in range(1, num_steps):
        state[t] = update_state(state[t-1])
    
    return state

INITIAL_VALUE = 100
NUMBER_OF_STEPS = 20  # Increase this number to see the decay effect properly

value_history = run_simulation(NUMBER_OF_STEPS, INITIAL_VALUE)

print(value_history)

np.savez_compressed("decay", state=value_history)  # Add this line
$ python decay.py
[100.          90.          81.          72.9         65.61
  59.049       53.1441      47.82969     43.046721    38.7420489
  34.86784401  31.38105961  28.24295365  25.41865828  22.87679245
  20.58911321  18.53020189  16.6771817   15.00946353  13.50851718]

We can then go ahead and use the np.load function to load in our data in another Python script or Jupyter Notebook in order to visualise the simulation, e.g. with seaborn:

In [7]:
import numpy as np
import seaborn as sns

sns.set_theme()

with np.load("decay.npz") as f:
    value_history = f["state"]

g = sns.relplot(data=value_history, kind="line").set(
    xlim=(0,None),
    ylim=(0,None),
    xlabel="Time step",
    ylabel="v"
)

You'll see here that we have created an exponential decay. However, nowhere in our code did we ever call the math.exp() function or explicitly use any exponentials. The shape of the curve simply emerged from the rules.
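
If you want to check this against the analytical approach from the start of this section, the closed-form solution of this recurrence is $v_t = v_0 \lambda^t$, and a short sketch (assuming the decay.npz file saved above) confirms that the simulated values match it:

import numpy as np

with np.load("decay.npz") as f:
    value_history = f["state"]

t = np.arange(len(value_history))
analytical = 100 * 0.9 ** t  # v_t = v_0 * lambda^t

print(np.allclose(value_history, analytical))  # True (to within floating-point error)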

Exercise 1

  1. Edit the file decay.py to make the simulation run for 100 time steps.
  2. In either another Python script file or in a Notebook, draw the graph for this data and check that it reaches very close to zero.
  3. Save a copy of the graph as an image file. You can save a graph by calling the savefig method on the return value from sns.relplot:
g.savefig("decay.png")


More complex states

Our exponential decay system has only one scalar value in its state at each time step. More complex systems may have multiple scalars, vectors or even larger states.

For example, a simulation of a pendulum swinging would have two variables in its state, the angle and the velocity of the pendulum.

With multiple variables, the update_state function needs to accept both of them as arguments (angle is the angle $\theta$ in radians, v is the angular velocity in radians per second), do the calculation, and return both:

In [8]:
def update_state(angle, v):
    gravity = -9.81  # m/s^2
    time_passed = 0.001  # seconds
    new_v = v + (angle * gravity * time_passed)
    new_angle = angle + time_passed*(v + new_v)/2
    return new_angle, new_v

Note here that the exact details of the calculation in this function are not important. What matters is that it's a function which takes the old state (angle and v) and then returns the new state (new_angle and new_v).
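
As a quick sanity check (not part of the script itself), we can call it once by hand with a small starting angle and zero velocity:

angle, v = np.radians(5), 0.0  # 5 degrees, at rest
new_angle, new_v = update_state(angle, v)
print(new_angle, new_v)  # roughly 0.08727 and -0.00086: the pendulum has started accelerating back towards the vertical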

Then we must update the initialise_state and run_simulation functions to define both, pass both to update_state, assign both back and return both:

In [9]:
def initialise_state(num_steps, initial_angle):
    theta = np.zeros(num_steps)
    theta[0] = initial_angle
    velocity = np.zeros(num_steps)
    velocity[0] = 0
    return theta, velocity

def run_simulation(num_steps, initial_angle):
    theta, velocity = initialise_state(num_steps, initial_angle)
    
    for t in range(1, num_steps):
        theta[t], velocity[t] = update_state(theta[t-1], velocity[t-1])

    return theta, velocity

Finally, we can run our simulation and get back the full history of both state variables:

In [10]:
NUMBER_OF_STEPS = 5000
INITIAL_ANGLE = np.radians(5)  # 5 degrees in radians

theta, velocity = run_simulation(NUMBER_OF_STEPS, INITIAL_ANGLE)

g = sns.relplot(data=theta, kind="line").set(
    xlim=(0, None),
    xlabel="Time step",
    ylabel=r"$\theta$"
);

Again here, we get a sinusoidal output without ever having written any explicit $\sin$ or $\cos$ functions. The rules of the system allow the result to emerge naturally.
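
As a rough check on the graph (the code implicitly takes the rod length to be 1 m, since it multiplies the angle by $g$ directly), the small-angle period of such a pendulum is

$$T = 2\pi\sqrt{\frac{L}{g}} = 2\pi\sqrt{\frac{1}{9.81}} \approx 2.0\ \mathrm{s}$$

which, at a time step of 0.001 seconds, is about 2000 time steps per full swing, so the 5000-step run should show roughly two and a half complete oscillations.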