The CIFAR-10 dataset consists of 60,000 labeled images in 10 mutually exclusive classes, each of fixed size 32x32 pixels with 3 color channels [Krizhevsky and Hinton 2009]. The dataset is challenging because the images are low-resolution and contain variations in pose, viewpoint, lighting, and background.
Our datasets are TensorFlow datasets, and we configure them so that training the models is computationally efficient. We employ buffered prefetching, which loads data from disk efficiently by overlapping data preprocessing and model execution during training. We also cache the images in memory after they are loaded off disk during the first epoch.
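As an illustration, a minimal input pipeline of this kind might look as follows (the use of tensorflow_datasets, the shuffle buffer, and the batch size of 128 are assumptions for the sketch, not our exact configuration):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Load CIFAR-10 as (image, label) pairs.
train_ds, test_ds = tfds.load("cifar10", split=["train", "test"],
                              as_supervised=True)

# Cache in memory after the first epoch, shuffle, batch, and overlap
# preprocessing with model execution via buffered prefetching.
train_ds = (train_ds
            .cache()
            .shuffle(10_000)   # assumed shuffle buffer size
            .batch(128)        # assumed batch size
            .prefetch(tf.data.AUTOTUNE))
test_ds = test_ds.cache().batch(128).prefetch(tf.data.AUTOTUNE)
```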
The RGB channel values are in the [0, 255] range, which is not ideal for a neural network. We therefore make the input values smaller by normalizing them to the [0, 1] range.
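One common way to do this, continuing the pipeline sketch above, is to map a rescaling step over the dataset (where exactly the normalization happens in our pipeline is an implementation detail):

```python
import tensorflow as tf

def normalize(image, label):
    # Cast uint8 pixels in [0, 255] to float32 in [0, 1].
    return tf.cast(image, tf.float32) / 255.0, label

train_ds = train_ds.map(normalize, num_parallel_calls=tf.data.AUTOTUNE)
test_ds = test_ds.map(normalize, num_parallel_calls=tf.data.AUTOTUNE)
```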
This algorithm differs from various adaptive algorithms in that it tracks only momentum and uses the sign operation to compute updates, leading to lower memory overhead and uniform update magnitudes across all dimensions. It has been shown to achieve strong results on tasks such as image classification.
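This description corresponds to a Lion-style sign-momentum update; the sketch below shows one such step (the interpolation coefficients beta1 and beta2 and the decoupled weight decay term are assumptions for illustration, not values from this report):

```python
import numpy as np

def sign_momentum_step(w, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One sign-based momentum update (Lion-style sketch).

    Only the momentum `m` is tracked, so the optimizer state is a single
    tensor per parameter. The update direction is the sign of an
    interpolation of gradient and momentum, so every dimension moves by
    the same magnitude `lr`.
    """
    update = np.sign(beta1 * m + (1 - beta1) * grad)  # uniform magnitude
    w = w - lr * (update + wd * w)                    # assumed decoupled weight decay
    m = beta2 * m + (1 - beta2) * grad                # momentum is the only state
    return w, m
```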
Dropout introduces noise and randomness into the network during training, which acts as a form of regularization. It prevents the co-adaptation of neurons, encourages the learning of more robust features, and reduces the network’s sensitivity to specific patterns in the training data. This can lead to better generalization performance on unseen data.
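In Keras, dropout is applied as a layer that randomly zeroes a fraction of activations during training only; the placement and rate of 0.5 below are illustrative assumptions:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(256, activation="relu")(x)
# Randomly zero 50% of the activations during training; at inference the
# layer is an identity (remaining activations are rescaled accordingly).
x = tf.keras.layers.Dropout(0.5)(x)
outputs = tf.keras.layers.Dense(10)(x)
model = tf.keras.Model(inputs, outputs)
```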
Weight decay, also known as L2 regularization or ridge regularization, is a regularization technique commonly used to prevent overfitting and improve the generalization performance of a model. Although it is most often presented in the context of linear models, we apply it to the trainable parameters of our networks to investigate its effect.
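One way to apply this in Keras is a per-layer L2 kernel regularizer, which adds a penalty proportional to the squared weights to the training loss (the coefficient 1e-4 is an assumed value for the sketch):

```python
import tensorflow as tf

# Adds 1e-4 * sum(W**2) for this layer's kernel to the loss.
l2 = tf.keras.regularizers.l2(1e-4)
layer = tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=l2)
```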
Traditional neural networks (TNNs) are universal approximators, which means that they can theoretically learn any function. However, they are not very efficient at learning functions that are spatially local, such as the features of an image. For our project, we will be looking at the task of image classification using the CIFAR-10 dataset.
Convolutional neural networks (CNNs) have become more popular than traditional fully connected neural networks for image classification tasks because they perform better on data with a known grid-like structure, such as an image.
In general, the more layers a network has, the greater its capacity and its ability to model the task at hand. More specifically, in CNNs, additional convolutional layers extract progressively higher-level features that allow the network to classify the image better.
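To make this concrete, the sketch below shows a small stacked CNN for CIFAR-10 (the layer counts and filter sizes are illustrative, not the exact architectures we trained):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    # Early conv layers capture low-level features (edges, color blobs)...
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(),
    # ...while deeper conv layers compose them into higher-level features.
    tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),  # one logit per CIFAR-10 class
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```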
As expected, we found that CNNs outperform TNNs on this image classification task. We also found that adding more layers and filters to our networks improved their performance, and this was especially true for the CNNs.
Additionally, we found that dropout and L2 regularization can be leveraged jointly to minimize the discrepancy between training and validation loss and accuracy, effectively reducing overfitting and improving the accuracy of our models. We also found that the choice of optimizer plays a role in stabilizing model training and improving model accuracy.