Convolutional Neural Networks
By Roger Ballard and Tanqiuhao Chen

A Brief Introduction to Neural Networks
Neural networks are a supervised learning classification algorithm
Given sufficient training data, a sufficiently large neural network can approximate any function from R^n to R^m
Neural networks require a large amount of training data
Image credit: https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg

Neural Network Topology
A traditional neural network consists of multiple layers of fully-connected perceptrons
Data flows through the network from the input layer, through the hidden layers, to the output layer
Image credit: https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg
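The layered topology above can be sketched numerically. This is a minimal illustration; the layer sizes, the `logistic` helper, and the `forward` function are my own choices, not from the slides:

```python
import numpy as np

def logistic(z):
    # logistic activation, maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # propagate x through a list of fully-connected (W, b) layers
    a = x
    for W, b in layers:
        a = logistic(W @ a + b)
    return a

rng = np.random.default_rng(0)
# toy network: 3 inputs -> 4 hidden units -> 2 outputs
layers = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((2, 4)), rng.standard_normal(2))]
y = forward(np.array([1.0, -0.5, 0.25]), layers)
print(y.shape)  # (2,)
```

Each element of `y` is one output-layer perceptron's activation; with random weights the values are meaningless until the network is trained.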
Anatomy of a Perceptron
A perceptron has n inputs and 1 output
It calculates a weighted sum of the inputs: z = w · x + b
w is the vector of weights associated with the inputs
b is the bias value (which allows the function to have nonzero output when the input is zero)
It then applies the chosen activation function to the weighted sum
Multiple activation functions are possible, but the logistic function and the leaky rectified linear function are two common choices
Image credit: https://www.hiit.fi/u/ahonkela/dippa/node41.html
https://commons.wikimedia.org/wiki/File:Logistic-curve.svg

Training a Neural Network
Uses a method called backpropagation, based on gradient descent
Calculate the result of running an input through the network
Compare it to the known/desired output
Calculate the amount of error using a cost function
Quadratic cost / sum squared error: chosen because it is a simple cost function and its derivative is easy to compute
Kullback-Leibler divergence: chosen because it measures the error in terms of information entropy
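The perceptron computation described above (weighted sum plus bias, then an activation) can be sketched directly; the function names and the example numbers below are illustrative, not from the slides:

```python
import numpy as np

def perceptron(x, w, b, activation):
    # weighted sum of the inputs plus the bias, then the chosen activation
    return activation(np.dot(w, x) + b)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaky_relu(z, alpha=0.01):
    # passes positive inputs through, scales negative inputs by alpha
    return z if z > 0 else alpha * z

x = np.array([0.5, -1.0, 2.0])   # inputs (illustrative values)
w = np.array([0.2, 0.4, -0.1])   # weight vector
b = 0.1                          # bias

print(perceptron(x, w, b, logistic))    # a value in (0, 1)
print(perceptron(x, w, b, leaky_relu))  # small negative value (z < 0 here)
```

The weighted sum here is negative, so the logistic output falls below 0.5 and the leaky ReLU output is the small negative slope times z.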
Training a Neural Network (continued)
Move backwards through the network, calculating the partial derivative of the cost with respect to each of the parameters
These partial derivatives are also known as the cost function gradient
Update each parameter proportionally to its partial derivative, scaled by the training constant
Apply repeatedly with multiple inputs until the cost function converges to a local minimum
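The update rule above can be sketched in the simplest possible setting, a single linear perceptron with quadratic cost, where the gradient has a closed form; the data, learning rate, and iteration count below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))      # 100 training inputs, 3 features each
true_w = np.array([1.0, -2.0, 0.5])    # weights we hope to recover
y = X @ true_w                         # known/desired outputs

w = np.zeros(3)   # start from zero weights
eta = 0.1         # training constant (learning rate)

def cost(w):
    # quadratic cost / sum squared error, averaged over the training set
    return 0.5 * np.mean((X @ w - y) ** 2)

before = cost(w)
for _ in range(200):
    # gradient: partial derivative of the cost w.r.t. each weight
    grad = X.T @ (X @ w - y) / len(X)
    # update each parameter proportionally to its partial derivative
    w -= eta * grad
after = cost(w)
print(before > after)  # the cost decreases toward a minimum
```

For a full multi-layer network the gradient is computed layer by layer via the chain rule (backpropagation) rather than in closed form, but the update step is the same.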
Image credit: http://charlesfranzen.com/posts/multiple-regression-in-python-gradient-descent/

What Makes a Convolutional Network Different
Images have certain properties that lead to the solution provided by CNNs
Images are big; a fully-connected network for an image would have an enormous number of parameters and be extremely difficult to train
If the ability to recognize a feature is useful in one part of an image, it is likely to be useful everywhere in the image
Convolutional neural networks introduce two new types of layers, on top
of the traditional fully-connected layers
Convolutional layers
Pooling layers

Convolutional Layers
The same perceptrons are applied to multiple different regions of the input
Mathematically, this operation is a convolution, which gives CNNs their name
Hyperparameters:
Receptive field
Depth
Stride
Zero-padding
Image credit: https://commons.wikimedia.org/wiki/File:Conv_layers.png
https://commons.wikimedia.org/wiki/File:Conv_layer.png
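A minimal single-channel sketch of a convolutional layer under these hyperparameters; the `conv2d` helper is my own illustration (a real layer would also have a depth of multiple filters, a bias, and an activation):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # slide the same kernel (receptive field of weights) across the image;
    # this weight sharing is what makes the layer convolutional
    if padding:
        image = np.pad(image, padding)  # zero-padding around the border
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

img = np.arange(16.0).reshape(4, 4)  # toy 4x4 "image"
edge = np.array([[1.0, -1.0]])       # tiny horizontal edge detector
print(conv2d(img, edge).shape)       # (4, 3)
```

Increasing the stride or shrinking the receptive field changes the output size; zero-padding lets the kernel cover border pixels without shrinking the output.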
Pooling Layers
Reduces the size of a layer's output
Useful to start recognizing higher-level features
Traditionally max-pooling, but other methods are possible
Parameters:
Filter size
Stride
Zero-padding (if necessary)
Image credit: https://commons.wikimedia.org/wiki/File:Max_pooling.png
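Max-pooling with a 2x2 filter and stride 2, the traditional choice, can be sketched like this (the `max_pool` helper and the example values are illustrative):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    # keep only the largest activation in each size-by-size window
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i*stride:i*stride+size,
                          j*stride:j*stride+size].max()
    return out

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 8., 6.],
              [2., 3., 7., 9.]])
print(max_pool(x))  # [[4. 5.] [3. 9.]]
```

Each output value reports only that a feature was strongly present somewhere in its window, which is what lets later layers recognize higher-level features at a coarser scale.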
Traditional Convolutional Network Topology
Convolutional layers to extract features
Pooling layers between convolutional layers, to increase scale and extract higher-level features at each layer
End with several fully-connected layers to perform classification based on the extracted features
Image credit: http://www.computervisionblog.com/2015/04/deep-learning-vs-probabilistic.html
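Tracking layer sizes through such a stack shows how pooling shrinks the spatial extent before the fully-connected layers take over; the 28x28 input and 5x5 filters below are assumptions for illustration, not from the slides:

```python
def conv_out(n, field, stride=1, pad=0):
    # spatial output size of a conv or pooling layer along one dimension
    return (n + 2 * pad - field) // stride + 1

n = 28                              # e.g. a 28x28 grayscale input (assumed)
n = conv_out(n, field=5)            # 5x5 convolution -> 24x24
n = conv_out(n, field=2, stride=2)  # 2x2 max-pool    -> 12x12
n = conv_out(n, field=5)            # 5x5 convolution -> 8x8
n = conv_out(n, field=2, stride=2)  # 2x2 max-pool    -> 4x4
print(n)  # 4
```

The final 4x4 maps (times the layer depth) are flattened into a vector and fed to the fully-connected layers that perform the classification.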