Iris setosa, Iris versicolor and Iris virginica
Firstly we need to load our data into our program. In a real-world case this could come from any source: a CSV file, a SQL database, an Excel file, etc. In our case today, we'll be using an example data set from scikit-learn:
from sklearn.datasets import load_iris
X, y = load_iris(as_frame=True, return_X_y=True)
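If you want a quick peek at what was loaded (an optional step, not required for the rest of the walkthrough), X is a pandas DataFrame with one column per measurement and y is a Series of integer class labels:

print(X.head())  # four measurement columns: sepal/petal length and width
print(y.head())  # integer class labels: 0, 1 or 2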
Once we have the data loaded in, we need to split it, keeping some of it aside for validation:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
TensorFlow can work with NumPy arrays and pandas DataFrames directly, but it also provides its own container for data called a Dataset. This type can hold the data as well as help prepare it to be used by the system. You can turn a DataFrame into a Dataset using the from_tensor_slices function:
import tensorflow as tf
train = tf.data.Dataset.from_tensor_slices((X_train, y_train))
test = tf.data.Dataset.from_tensor_slices((X_test, y_test))
If you get a warning at this point about NUMA or GPU devices, like I did above, it's fine. It means that your model will not train as fast as it would with a GPU set up, but it will still work for this session.
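If you're not sure whether TensorFlow can see a GPU at all, one quick way to check (an optional aside, using the tf imported above) is to list the physical devices it has found; an empty list simply means everything will run on the CPU:

print(tf.config.list_physical_devices("GPU"))  # [] means CPU-only, which is fine here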
Once we have our data as a Dataset, we can prepare it for use by TensorFlow. Depending on the problem you're solving, you might need to do different things to your data.
In our case today, we have quite a small input data set, so the neural network will likely not learn enough by seeing each example in it only once. To solve this, we will show each training example to the model multiple times. We can do this by using the repeat(20) method on the Dataset, which will show each example to the network 20 times.
The downside of doing this is that the model will always see the same examples in the same order each time round. This can stop it learning the general shape of the data as well, so it's a good idea to randomise the order too. This can be done with the shuffle() method. This method also needs to know how many items to grab each time it shuffles them. Setting it to 1000 means that it will grab the first 1000 examples, shuffle them, then grab the next 1000, shuffle them, and so on.
Finally, as an optimisation, we can show the network multiple examples all in one go. On larger data sets this can significantly increase the speed of training, and on any data set it will make the computed gradients smoother. A common choice is a batch size of 32, though you may sometimes be limited by the available memory.
train = train.repeat(20).shuffle(1000).batch(32)
We need to ensure that our validation data set has the same dimensionality, so we need to batch it too. It doesn't really matter what batch size you choose here, so a choice of 1 or 32 would work fine:
test = test.batch(1)
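As an optional sanity check, you can compare the element_spec of the two datasets; after batching, both should describe a batch of four feature values plus a batch of labels:

print(train.element_spec)  # batched features and labels for training
print(test.element_spec)   # should have the same structure for validation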
The model creation step is where most of the ingenuity in working with neural networks comes in. It is the choice of the number of neurons, and how they are interconnected, that decides the capabilities of a network.
In principle you have completely free rein to connect neurons together however you wish, as long as you have a place for the inputs to arrive and a way of extracting an output. However, most problems can be solved with a feed-forward network where each layer only connects to the layer before it. This is managed by using the tf.keras.Sequential model.
We need to tell the model the information about each layer. For our purposes here, we will define a relatively simple network with two hidden layers and one output layer. Each node in each hidden layer will be connected to all the nodes in the previous layer. This is called a densely-connected layer and is represented with tf.keras.layers.Dense.
Each dense layer needs two pieces of information: the number of neurons it contains, and the activation function to use.
We'll make each hidden layer 10 neurons for now, and use the rectifier (or ReLU) as our activation function. This is a good choice in many situations, and will suffice for now.
Our output layer needs a slightly different approach. For an output layer on a multi-class classifier, you will usually want a node per class. We have three species, so we have three nodes. In order to normalise the outputs of the network, we use the softmax function, which normalises the output of each output node so that they sum to 1.0. This allows you to treat each node's output as a probability.
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, activation=tf.nn.relu), # hidden layer
tf.keras.layers.Dense(10, activation=tf.nn.relu), # hidden layer
tf.keras.layers.Dense(3, activation=tf.nn.softmax) # output layer
])
At each step of the training we need to know "how wrong are we" as a single number. We need this so we know how much the weights in the network should be shifted by.
If, for example, for a given run of the network the output looks like [0.4, 0.3, 0.3] (they add up to 1.0 due to the softmax) while the correct label is 0, we need a function to compare them and give us a single "loss" (e.g. 1.13). This can be done using the sparse categorical crossentropy function. This is the loss function to use any time you have more than two classes (or at least in any case where your classes are encoded as integers 0, 1, 2, etc.).
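As a quick illustration (this isn't part of the training code), we can apply Keras' built-in version of this loss to the example above; for an output of [0.4, 0.3, 0.3] and a true label of 0 it works out to -ln(0.4), roughly 0.92:

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
# y_true is the integer label, y_pred is the (batched) softmax output
print(float(loss_fn([0], [[0.4, 0.3, 0.3]])))  # ≈ 0.92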
The metrics are there to show us information during training to monitor progress. The accuracy metric is a simple percentage of how many examples were predicted correctly.
model.compile(
loss="sparse_categorical_crossentropy",
metrics=["accuracy"],
)
We've now prepared everything we need for the model and we can go ahead and show it some data. There are two main things you need to provide to it: the data to learn from, and how long to train for.
The former is easily answered: we've prepared all our data already, so we pass it our train and test data sets.
In order for the network to know how long to keep running, we tell it directly. We do this by telling it how many steps are in each epoch and how many epochs to run for. Traditionally, an epoch meant showing each training example to the network once; however, once we start batching, shuffling and repeating, it becomes a little less well defined. A step is a single batch of data being passed through the network, so you would usually have number_of_examples ÷ batch_size steps in each epoch.
There's no mathematical difference between one epoch and the next; the only meaning it has is that TensorFlow will print a summary of the current loss and accuracy at the end of each epoch. For a network and data set this simple, 10 epochs will be more than enough:
model.fit(
train,
validation_data=test,
epochs=10,
)
The outputs here are the loss and accuracy measured on the training data, plus val_loss and val_accuracy measured on the validation data.
The main one to watch is val_accuracy, as that is usually the metric that best represents real-world performance.
The other thing to be aware of is that you want accuracy and val_accuracy to roughly match each other. If accuracy is much better than val_accuracy, it suggests overfitting.
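One simple way to keep an eye on this (assuming you capture the return value of model.fit() as, say, history — a name used here purely for illustration) is to compare the final values of the two metrics from the History object that Keras returns:

# Assumes the fit call above was written as: history = model.fit(...)
# history.history maps each metric name to a list of per-epoch values.
print("accuracy:    ", history.history["accuracy"][-1])
print("val_accuracy:", history.history["val_accuracy"][-1])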
To make predictions on new data, we can use the model.predict() method. We must pass it data in the same shape as the data we trained with:
predict_X = [
[5.1, 3.3, 1.7, 0.5],
[5.9, 3.0, 4.2, 1.5],
[6.9, 3.1, 5.4, 2.1],
]
predictions = model.predict(predict_X)
To peek at this output, the prediction for the first flower is:
predictions[0]
These numbers represent the probability that this flower is of each of the species given. We can ask which class is the most likely with argmax():
predictions[0].argmax()
We can then automate this over all the examples:
for prediction, expected in zip(predictions, ["setosa", "versicolor", "virginica"]):
    predicted_index = prediction.argmax()
    predicted = load_iris().target_names[predicted_index]
    probability = prediction.max()
    tick_cross = "✓" if predicted == expected else "✗"
    print(f"{tick_cross} Prediction is '{predicted}' ({100 * probability:.1f}%), expected '{expected}'")
Try altering the network in some of these ways:
See if it's possible to simplify the network, while still getting good performance.
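For example, one experiment (just a sketch; smaller_model is an illustrative name and 4 neurons is an arbitrary choice) is to drop down to a single, smaller hidden layer and see how val_accuracy holds up:

smaller_model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation=tf.nn.relu),     # single, smaller hidden layer
    tf.keras.layers.Dense(3, activation=tf.nn.softmax),  # output layer
])
smaller_model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
smaller_model.fit(
    train,
    validation_data=test,
    epochs=10,
)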