Fashion MNIST Dataset Python Example Using CNN
It’s only a matter of time before self-driving cars become widespread. This tremendous feat of engineering wouldn’t be possible without convolutional neural networks (CNNs). CNNs are better suited to visual image processing than traditional fully connected artificial neural networks because they exploit the spatial structure of images: each layer looks at small neighborhoods of pixels and reuses the same weights across the whole image. A CNN is built mainly from convolutional layers and pooling layers, usually followed by a few fully connected layers.
Convolutional Layer
Convolutional layers take advantage of the fact that an image is stored as a grid of numbers (pixel intensities) to create feature maps. A feature detector (also called a kernel or filter) is simply a small matrix whose values respond to a particular feature of the image (e.g. pointy ears, slit eyes). The matrix overlays a section of the image and performs element-wise multiplication with the pixel values at that location. The products are summed, and the result is put in the corresponding location of the feature map. The detector then shifts to another section of the image and repeats the process until it has traversed the entire image. In a CNN, the values inside each feature detector are learned during training rather than set by hand.
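To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch of one feature detector passing over a tiny grayscale image. It assumes a stride of 1 and no padding (the default "valid" padding of Keras's Conv2D) and, like Conv2D, does not flip the kernel. The helper name convolve2d_valid and the toy image and kernel are illustrative only.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and build a feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # Element-wise multiply the patch by the kernel, then sum the products.
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# A 5x5 image with a bright vertical stripe in column 2.
image = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)

# A hand-made 3x3 detector that responds to vertical lines.
kernel = np.array([
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
], dtype=float)

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # every row is [-3, 6, -3]: the peak sits where the stripe is centred
```

Note that a 5x5 input and a 3x3 detector give a 3x3 feature map (5 - 3 + 1 = 3); the same rule shrinks a 28x28 image to 26x26.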
Pooling Layer
Pooling is a lot like convolution, except that it doesn’t use a feature detector. Instead, a window (an n x n region) slides over the feature map and summarizes each region with a single number. The most common choice, and the one used here, is max pooling: take the highest value within the area of the feature map covered by the window and put it in the corresponding location of the pooled feature map. Pooling is useful because it reduces the size of the feature maps, which makes the network cheaper to compute, and because it makes the network less sensitive to small shifts in where a feature appears in the image.
For example, suppose the value 4 in a feature map corresponds to the slit eyes of a cat. Whether the cat was looking directly at the camera or slightly off to the side when the picture was taken, the 4 may land in a different cell of the same pooling window, and max pooling will still output 4.
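Here is a minimal NumPy sketch of 2x2 max pooling with a stride of 2 (Keras's MaxPooling2D uses a stride equal to the pool size by default). The function name max_pool_2x2 and both feature maps are made up for illustration. In the second map, the 4 has moved one pixel within the top-left window, yet the pooled output does not change.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; assumes the height and width are even."""
    h, w = feature_map.shape
    # Split the map into non-overlapping 2x2 blocks, then take each block's max.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.array([
    [4, 1, 0, 2],
    [0, 3, 1, 1],
    [2, 0, 5, 0],
    [1, 1, 0, 3],
])

# Same map, but the 4 in the top-left window has shifted one pixel down.
shifted = np.array([
    [0, 1, 0, 2],
    [4, 3, 1, 1],
    [2, 0, 5, 0],
    [1, 1, 0, 3],
])

print(max_pool_2x2(fmap))     # [[4 2]
                              #  [2 5]]
print(max_pool_2x2(shifted))  # same result as above
```

The 4x4 map is halved to 2x2, which is the size reduction the pooling layer provides.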
That’s enough background information; on to the code. The following example uses Keras, a high-level API for building and training models in TensorFlow.
import keras
from keras.datasets import fashion_mnist
from keras.layers import Dense, Activation, Flatten, Conv2D, MaxPooling2D
from keras.models import Sequential
from keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt
Run the following line of code to load our data set. It returns the training and test splits as NumPy arrays.
(train_X,train_Y), (test_X,test_Y) = fashion_mnist.load_data()
The Fashion MNIST data set contains 70,000 grayscale images in 10 categories: 60,000 for training and 10,000 for testing. The images show individual articles of clothing at low resolution (28 by 28 pixels).
Data Preprocessing
When a convolutional layer is the first layer of our model, we need to reshape our data to (n_images, x_shape, y_shape, channels). All you really need to know is that you should set channels to 1 for grayscale images and to 3 when you have a set of RGB images as input.
train_X = train_X.reshape(-1, 28,28, 1)
test_X = test_X.reshape(-1, 28,28, 1)
The -1 tells NumPy to infer that dimension from the total size of the array, which here works out to the number of images in the set (60,000 for training, 10,000 for testing).
train_X.shape
Out[00]: (60000, 28, 28, 1)
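A small sketch of how the -1 is resolved, using a hypothetical dummy array fake_images in place of the real data: NumPy divides the total number of elements by the product of the other dimensions. For grayscale data, np.expand_dims with axis=-1 adds the same trailing channel axis.

```python
import numpy as np

# Three dummy 28x28 grayscale "images" standing in for the real data set.
fake_images = np.zeros((3, 28, 28), dtype=np.uint8)

reshaped = fake_images.reshape(-1, 28, 28, 1)
print(reshaped.shape)  # (3, 28, 28, 1): -1 became (3 * 28 * 28) / (28 * 28 * 1) = 3

# Adding a trailing channel axis gives the same shape.
expanded = np.expand_dims(fake_images, axis=-1)
print(expanded.shape)  # (3, 28, 28, 1)
```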
Scaling each pixel value from its original range of 0 to 255 down to the range 0 to 1 usually helps the model learn faster and more stably.
train_X = train_X.astype('float32')
test_X = test_X.astype('float32')
train_X = train_X / 255
test_X = test_X / 255
The labels are integers from 0 to 9, each identifying a clothing category. Because we train with categorical cross-entropy, we must convert them with one hot encoding. In one hot encoding, each label is represented as a vector of nine zeros and a single one, and the class is determined by the position of the 1. For example, you’d represent class 3 as [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
train_Y_one_hot = to_categorical(train_Y)
test_Y_one_hot = to_categorical(test_Y)
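For intuition, the same encoding can be reproduced in plain NumPy by picking rows of an identity matrix. This is a sketch for integer labels 0 through 9, not Keras's own implementation of to_categorical, and the label values are made up.

```python
import numpy as np

labels = np.array([3, 0, 9])  # example class labels
num_classes = 10

# Row k of the 10x10 identity matrix is the one-hot vector for class k.
one_hot = np.eye(num_classes, dtype='float32')[labels]
print(one_hot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

# np.argmax reverses the encoding.
print(np.argmax(one_hot, axis=1))  # [3 0 9]
```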
Training
Each of our convolutional layers will use 64 feature detectors of size 3x3, producing 64 feature maps. In turn, our pooling layers will use max pooling with a 2x2 window. The convolutional and pooling layers of a CNN are almost always followed by a fully connected artificial neural network. In Keras, a fully connected layer is called Dense, and it implements the operation output = activation(dot(input, weight) + bias). Our fully connected layers expect a flat vector for each image, so we flatten the output of the last pooling layer beforehand.
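The flattening step and the Dense formula can be sketched in NumPy. The sizes are toy values chosen for illustration (a 2x2x3 stack of feature maps flattened to 12 inputs, 4 output units), the weights are random, and ReLU stands in for the activation; none of it comes from the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

feature_volume = rng.random((2, 2, 3))  # toy height x width x feature maps
x = feature_volume.reshape(-1)          # flatten: shape (12,)

weight = rng.standard_normal((12, 4))   # one column of weights per output unit
bias = np.zeros(4)

def relu(z):
    return np.maximum(z, 0)

# output = activation(dot(input, weight) + bias)
output = relu(np.dot(x, weight) + bias)
print(x.shape, output.shape)  # (12,) (4,)
```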
model = Sequential()
model.add(Conv2D(64, (3,3), input_shape=(28, 28, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
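It can help to trace the shapes through this model by hand. The arithmetic below assumes the Keras defaults for these layers: stride 1 and "valid" padding for Conv2D, and a stride equal to the pool size for MaxPooling2D, which drops any leftover row or column. Calling model.summary() on the model above should report matching shapes and parameter counts.

```python
def conv_out(size, kernel=3):
    # 'valid' padding, stride 1: the kernel must fit entirely inside the input
    return size - kernel + 1

def pool_out(size, pool=2):
    # stride equal to the pool size; a leftover row/column is dropped
    return size // pool

def conv_params(kernel, in_channels, filters):
    # one kernel x kernel x in_channels weight block plus one bias per filter
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    # a weight for every input-unit pair plus one bias per unit
    return (inputs + 1) * units

size = 28
size = pool_out(conv_out(size))  # 28 -> 26 -> 13
size = pool_out(conv_out(size))  # 13 -> 11 -> 5
flat = size * size * 64          # 5 * 5 * 64 = 1600 inputs to the first Dense layer

params = (conv_params(3, 1, 64)     # 640
          + conv_params(3, 64, 64)  # 36,928
          + dense_params(flat, 64)  # 102,464
          + dense_params(64, 10))   # 650
print(flat, params)  # 1600 140682
```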
Since convolutional neural networks rely heavily on matrix operations, you can benefit immensely from running TensorFlow on your GPU. If you’re training the model on a CPU, I wouldn’t recommend more than 10 epochs, as it can take a while. The batch size is the number of images the model processes before each update of its weights; 64 is a common choice, and it doesn’t have to match any layer size.
model.fit(train_X, train_Y_one_hot, batch_size=64, epochs=10)
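Because the batch size only sets how many images go into each weight update, it also fixes how many updates happen per epoch. A quick back-of-the-envelope calculation for the 60,000 training images, assuming the leftover images form one final, smaller batch:

```python
import math

n_train = 60000
batch_size = 64
epochs = 10

steps_per_epoch = math.ceil(n_train / batch_size)  # 937 full batches + 1 partial batch
last_batch = n_train - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch, steps_per_epoch * epochs)  # 938 32 9380
```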
By the tenth epoch, we obtained a training accuracy of 94%, which is really good all things considered. Let’s see how well our model does at categorizing images it hasn’t seen before, using the test set.
test_loss, test_acc = model.evaluate(test_X, test_Y_one_hot)
print('Test loss', test_loss)
print('Test accuracy', test_acc)
Out[]: Test loss 0.2947616615891457
Out[]: Test accuracy 0.9006
Since the test accuracy (about 90%) is lower than the training accuracy (94%), our model has slightly overfit the training data.
Let’s take a look at the first prediction made by our model.
predictions = model.predict(test_X)
print(np.argmax(predictions[0]))
Out[30]: 9
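Taking np.argmax of the raw probabilities is enough; rounding them first can give the wrong answer when the model is unsure. In the made-up probability vector below, no class reaches 0.5, so rounding turns every entry into 0 and np.argmax falls back to index 0.

```python
import numpy as np

# Hypothetical softmax output for one image: class 6 is most likely, but below 0.5.
probs = np.array([0.05, 0.02, 0.10, 0.03, 0.15, 0.01, 0.40, 0.02, 0.12, 0.10])

print(np.argmax(np.round(probs)))  # 0: every value rounds to 0.0
print(np.argmax(probs))            # 6: the actual most likely class
```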
The predicted class, 9, corresponds to Ankle boot.
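Fashion MNIST stores its labels as integers, and the data set's documentation maps each one to a clothing category. A lookup list (the name class_names is our own) turns a predicted index into a readable label; with the model above you would index it with np.argmax(predictions[0]).

```python
# Fashion MNIST label index -> clothing category, in label order 0 to 9.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

predicted_class = 9  # e.g. the index printed for the first test image
print(class_names[predicted_class])  # Ankle boot
```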
Running the following code will display the first image.
plt.imshow(test_X[0].reshape(28, 28), cmap = plt.cm.binary)
plt.show()
Our model correctly classified the first image in our testing data set. Putting it all together, here is the complete code:
import keras
from keras.datasets import fashion_mnist
from keras.layers import Dense, Activation, Flatten, Conv2D, MaxPooling2D
from keras.models import Sequential
from keras.utils import to_categorical
import numpy as np
import matplotlib.pyplot as plt
(train_X,train_Y), (test_X,test_Y) = fashion_mnist.load_data()
train_X = train_X.reshape(-1, 28,28, 1)
test_X = test_X.reshape(-1, 28,28, 1)
train_X = train_X.astype('float32')
test_X = test_X.astype('float32')
train_X = train_X / 255
test_X = test_X / 255
train_Y_one_hot = to_categorical(train_Y)
test_Y_one_hot = to_categorical(test_Y)
model = Sequential()
model.add(Conv2D(64, (3,3), input_shape=(28, 28, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(64, (3,3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(64))
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(),metrics=['accuracy'])
model.fit(train_X, train_Y_one_hot, batch_size=64, epochs=10)
test_loss, test_acc = model.evaluate(test_X, test_Y_one_hot)
print('Test loss', test_loss)
print('Test accuracy', test_acc)
predictions = model.predict(test_X)
print(np.argmax(predictions[0]))
plt.imshow(test_X[0].reshape(28, 28), cmap = plt.cm.binary)
plt.show()