Machine Learning Article of the Day: Building a Simple Auto-Encoder
One of the simpler tasks that a neural network can accomplish is taking in an input, compressing said input, and re-expanding it into a facsimile of its original form.
But why would you want to do that?
- Compression: Well duh, I just said it! Data and image compression is, I think, the most obvious use case. You create a compressed version of the data, and later on you reconstruct the original from it. The result is lossy, though, so you couldn't use it in cases where you need perfect reconstruction.
- Super resolution: Or could you? Well, still no, but autoencoders can be used for a really neat technique called super resolution. They take in a low-quality image and produce a much higher-resolution version of it. Great if you want to build a live-action Super Mario Bros movie? 🤷‍♂️
- Image noise reduction: Because the network has to squeeze its input through a compressed representation, it learns to keep the important structure and drop the "unimportant" details. Train it to map noisy images to their clean versions and, with a sufficiently tuned network architecture, it can strip out much of the noise while keeping the main image.
- Generative AI: And of course, for all the Gen AI heads out there, Variational Autoencoders (VAEs) are advanced autoencoder architectures that can generate completely new samples, usually images, that resemble the training data (think novel face generation, although the most famous face generator, StyleGAN, is a GAN rather than an autoencoder). But I'll leave that well-known but still juicy bit for another discussion.
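The noise-reduction idea above hinges on one training trick: corrupt the inputs but keep the clean images as the targets. Here is a minimal NumPy sketch of building those training pairs, assuming pixel data already scaled to [0, 1]. The helper name `make_denoising_pairs` and the noise level of 0.2 are my own illustrative choices, not part of any library.

```python
import numpy as np

def make_denoising_pairs(x_clean, noise_std=0.2, seed=0):
    """Return (noisy_inputs, clean_targets) for a denoising autoencoder.

    Hypothetical helper: x_clean is assumed to be a float array in [0, 1].
    """
    rng = np.random.default_rng(seed)
    # Corrupt the inputs with additive Gaussian noise...
    noisy = x_clean + rng.normal(0.0, noise_std, size=x_clean.shape)
    # ...and clip back into the valid pixel range.
    noisy = np.clip(noisy, 0.0, 1.0)
    # The target stays the clean, uncorrupted data.
    return noisy, x_clean

x = np.random.default_rng(1).random((4, 784))  # 4 fake flattened "images"
noisy, clean = make_denoising_pairs(x)
print(noisy.shape, clean.shape)  # (4, 784) (4, 784)
```

With a Keras model like the one built later in this article, you would then fit the noisy images to the clean ones, e.g. `model.fit(noisy, clean, ...)`, instead of fitting the data to itself.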
And there are obviously more!
Today I'm going to focus on constructing a super simple autoencoder, which takes as input images from the Fashion-MNIST data set, compresses them into the so-called "latent space" of the autoencoder, and then expands them into a somewhat distorted but still recognizable version of the original image.
The neat thing about the approach I'm going to take is that it can be accomplished with a super simple fully connected MLP with only 1 hidden layer. Cool, eh?
Well, it's not the coolest, because you can use much more sophisticated libraries and built-in structures to build autoencoders, but I feel that if you can demonstrate a functional principle in the simplest way possible, it builds a good foundation for more complex learning later on.
Anyway, onward to the architecture —
AutoEncoder via a Simple MLP
So this mini article presumes you know how an MLP functions and what a neural network is and how it works. If you don’t, here is a great intro video to the concept.
Recall that an autoencoder works by taking input data, compressing it, and then re-expanding it. The great thing about the MLP approach is that you don't need to know a ton about fancy image compression techniques, as you would if you were writing an RLE algorithm or JPEG compression from scratch. The network learns its own compression from the data.
We'll start with the Fashion-MNIST data set, which contains 70,000 28×28 grayscale images of down-sampled, but clearly trendy, apparel spread across 10 clothing categories.
To get this data, we start with the following imports:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.datasets import fashion_mnist
The data set ships with Keras, and the loader conveniently returns it already split into training and test sets:
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
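For reference, `load_data()` gives you `x_train` with shape (60000, 28, 28) and `x_test` with shape (10000, 28, 28), both `uint8`, plus integer labels from 0 to 9 in `y_train` and `y_test`. The autoencoder never uses the labels, but they are handy for checking what you are looking at. A small lookup using the class order from the Fashion-MNIST documentation; `CLASS_NAMES` and `label_name` are illustrative names of my own, not part of Keras:

```python
# Label index -> class name, in the order documented for Fashion-MNIST.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(label):
    """Map an integer label (e.g. an entry of y_test) to its class name."""
    return CLASS_NAMES[int(label)]

print(label_name(9))  # Ankle boot
```

With the real data loaded, `label_name(y_test[index_to_visualize])` tells you which kind of garment you are about to reconstruct.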
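Next, we flatten each 28×28 image into a 784-element vector, since the MLP expects flat input, and scale the pixel values from the 0 to 255 range down to between 0 and 1. Normalizing inputs like this is just a good habit: it generally makes training faster and more stable and helps keep gradients well behaved. It also matches the 0 to 1 range of the sigmoid output layer we'll use in a moment: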
x_train = x_train.reshape((-1, 28 * 28)) / 255.0
x_test = x_test.reshape((-1, 28 * 28)) / 255.0
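Here is the same preprocessing applied to a small batch of random fake images, so you can check shapes and value ranges without downloading anything. One optional tweak of mine: casting to `float32` before dividing, because dividing a `uint8` array by `255.0` otherwise produces `float64`, which takes twice the memory, while Keras computes in `float32` by default anyway.

```python
import numpy as np

# Stand-in for a few Fashion-MNIST images: uint8 pixels in [0, 255], shape (n, 28, 28).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 784-vector, then scale into [0, 1].
flat = fake_images.reshape((-1, 28 * 28)).astype("float32") / 255.0

print(flat.shape, flat.dtype)  # (5, 784) float32
```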
Now here comes the fun part. Or at least the first of the fun parts —
Again, the model we will be using is a simple MLP with an input layer, one hidden dense (fully connected) layer, and an output layer. The hidden layer is the compression layer, or encoder: it squeezes the 784 input pixels into a 256-dimensional latent space representation. The output layer is the decoder: it expands that latent representation back into 784 values, our guesstimated reconstruction of the image.
We use a ReLU activation for the hidden layer because the network needs some non-linearity to learn complex features in the data, and ReLU is a very simple, convenient, and popular choice for that. Incidentally, it also helps with the vanishing gradient problem, since its gradient is exactly 1 for any positive input instead of shrinking toward zero the way sigmoid or tanh gradients do.
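The output layer takes the learned representation from the hidden layer and applies a sigmoid activation, which squashes each of its 784 outputs into the range 0 to 1. That matches the range of our normalized pixels, so each output can be read directly as a reconstructed pixel intensity. Note that this is a reconstruction problem, not image classification: the network predicts pixel values, not class labels.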
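Under the hood, these two layers boil down to two matrix multiplications, each followed by its activation. A NumPy sketch of a single forward pass, with randomly initialized weights standing in for the ones Keras would learn; the weight names and the 0.05 initialization scale are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_inputs, n_latent = 28 * 28, 256  # same sizes as the article's Dense layers

# Hypothetical random weights; Keras would learn these during fit().
W_enc = rng.normal(0.0, 0.05, size=(n_inputs, n_latent))
b_enc = np.zeros(n_latent)
W_dec = rng.normal(0.0, 0.05, size=(n_latent, n_inputs))
b_dec = np.zeros(n_inputs)

x = rng.random((3, n_inputs))            # a batch of 3 fake "images" in [0, 1]
latent = relu(x @ W_enc + b_enc)         # encoder: 784 -> 256, all values >= 0
x_hat = sigmoid(latent @ W_dec + b_dec)  # decoder: 256 -> 784, squashed into (0, 1)

print(latent.shape, x_hat.shape)  # (3, 256) (3, 784)
```

The Keras `Dense` layers in the model definition package exactly these two steps, plus the bookkeeping needed to train the weights.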
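model = Sequential()
# Encoder (hidden layer): compress each 784-pixel image into a 256-value latent vector.
# Newer Keras versions may warn about input_shape here and suggest an Input layer; the model still builds.
model.add(Dense(256, activation='relu', input_shape=(28 * 28,)))
# Decoder (output layer): expand the latent vector back into 784 pixel values in [0, 1].
model.add(Dense(784, activation='sigmoid'))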
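A quick sanity check on model size: a dense layer has one weight per input-output pair plus one bias per output unit. Plugging in the article's layer sizes (`dense_params` is just a throwaway helper for illustration):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

encoder = dense_params(28 * 28, 256)  # hidden (compression) layer
decoder = dense_params(256, 784)      # output (reconstruction) layer
print(encoder, decoder, encoder + decoder)  # 200960 201488 402448
```

These totals should match what `model.summary()` reports for the model above. Note also that the 256-value latent vector is only about 3× smaller than the 784-pixel input (784 / 256 ≈ 3.06), so this is fairly gentle compression; shrinking the hidden layer squeezes harder, typically at the cost of blurrier reconstructions.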
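After defining the model, we compile it, which attaches the optimizer and loss function, and then train it with the good old fit method. The loss function and optimizer are as simple as can be (mean squared error and Adam), and I'm only training for 10 epochs. Notice that x_train is passed as both the input and the target: the network is graded on how well it reproduces its own input.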
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, x_train, epochs=10, batch_size=128)
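To demystify what `compile()` and `fit()` do for us, here is a from-scratch NumPy sketch of the same mechanics: a forward pass, an MSE loss against the input itself, backpropagation, and a weight update. It deliberately swaps Adam for plain full-batch gradient descent and uses a small synthetic data set instead of Fashion-MNIST, so treat it as an illustration of the idea, not a re-implementation of Keras. The function name, sizes, and step size are all my own assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(x, n_latent=16, lr=2.0, epochs=500, seed=0):
    """Full-batch gradient descent on a one-hidden-layer autoencoder.

    Mirrors the article's architecture (ReLU encoder, sigmoid decoder,
    MSE loss), but uses plain gradient descent instead of Adam.
    Returns the learned parameters and the loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, n_latent))
    b1 = np.zeros(n_latent)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_latent), size=(n_latent, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Forward pass: compress to n_latent values, then re-expand to d values.
        z1 = x @ W1 + b1
        h = np.maximum(z1, 0.0)
        y = sigmoid(h @ W2 + b2)
        # The input doubles as the target, like model.fit(x_train, x_train).
        losses.append(np.mean((y - x) ** 2))
        # Backward pass: chain rule through the sigmoid, then the ReLU.
        dy = 2.0 * (y - x) / x.size
        dz2 = dy * y * (1.0 - y)
        dW2 = h.T @ dz2
        db2 = dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * (z1 > 0)
        dW1 = x.T @ dz1
        db1 = dz1.sum(axis=0)
        # Gradient-descent step. The step size looks large because the loss
        # averages over every pixel of every sample, which keeps gradients small.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

# Synthetic stand-in for image data: 256 samples of 64 "pixels" in (0, 1),
# generated from only 4 underlying factors so there is structure to compress.
rng = np.random.default_rng(42)
x_demo = sigmoid(rng.normal(size=(256, 4)) @ rng.normal(size=(4, 64)))
params, losses = train_autoencoder(x_demo)
print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Adam adapts the step size per weight, which is a big part of why the Keras version converges nicely in just 10 epochs without any learning-rate fiddling.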
And finally, we run the trained model on the test set, which it never saw during training. Each output should hopefully be just a slightly distorted version of its input.
x_pred = model.predict(x_test)
We can visualize and compare our output image to the original image with the plotting code below:
# Choose an index of the test data to visualize the image and prediction
index_to_visualize = 0
plt.subplot(1, 2, 1)
plt.imshow(x_test[index_to_visualize].reshape(28, 28), cmap='gray')
plt.title('Original Image')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(x_pred[index_to_visualize].reshape(28, 28), cmap='gray')
plt.title('Predicted Image')
plt.axis('off')
plt.show()
And after 10 epochs of training, we get the below:
Yup! The network produced a brand-new shoe from its compressed representation, and it closely resembles the original, which is exactly what we wanted. Here's another example with index=3, which happens to be some stylish pants (in the grayscale ATARI Graphics world, at least):
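To put a number on "closely resembles", you can compute the reconstruction error for every test image, which is the same per-sample quantity the mean_squared_error loss averages, and then look at the worst offenders. A small sketch with toy arrays standing in for x_test and x_pred (`per_image_mse` is an illustrative helper, not a Keras function):

```python
import numpy as np

def per_image_mse(x, x_hat):
    """Mean squared error for each flattened image (one value per row)."""
    return np.mean((x - x_hat) ** 2, axis=1)

# Toy stand-ins for x_test and x_pred: 3 "images" of 4 pixels each.
x = np.array([[0.0, 0.5, 1.0, 0.5],
              [1.0, 1.0, 0.0, 0.0],
              [0.2, 0.2, 0.2, 0.2]])
x_hat = np.array([[0.0, 0.5, 1.0, 0.5],   # perfect reconstruction
                  [0.5, 1.0, 0.0, 0.0],   # one pixel off by 0.5
                  [0.2, 0.2, 0.2, 0.6]])  # one pixel off by 0.4

errors = per_image_mse(x, x_hat)
print(errors)
print(int(np.argmax(errors)))  # 1
```

With the real arrays, `errors = per_image_mse(x_test, x_pred)` followed by `index_to_visualize = int(np.argmax(errors))` would point the plotting code at the image the network struggles with most.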
So really that was it!
I know, "Big deal!" you might be thinking.
But the good thing is that even though this is a super simple model, much more complex stuff can be built on the same idea!
Here is my full code! You can run it in a Jupyter Notebook or from the command line:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.datasets import fashion_mnist
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.reshape((-1, 28 * 28)) / 255.0
x_test = x_test.reshape((-1, 28 * 28)) / 255.0
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(28 * 28,)))
model.add(Dense(784, activation='sigmoid'))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, x_train, epochs=10, batch_size=128)
x_pred = model.predict(x_test)
index_to_visualize = 0
plt.subplot(1, 2, 1)
plt.imshow(x_test[index_to_visualize].reshape(28, 28), cmap='gray')
plt.title('Original Image')
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(x_pred[index_to_visualize].reshape(28, 28), cmap='gray')
plt.title('Predicted Image')
plt.axis('off')
plt.show()
So that’s it! Until next time — happy coding! Please also like and subscribe and all that jazz in order to support my work. Thank you again for reading!