When it comes to artificial neural networks, training plays an integral part. One of the most common training methods is Backpropagation, or Backward Propagation. This is a method for training artificial neural networks in conjunction with an optimization algorithm such as gradient descent. In this article, I am going to provide a brief overview of Backpropagation with the help of some examples.

We came across the term 'Gradient Descent' in the previous paragraph. It is a first-order iterative optimization algorithm. Suppose we want to find a local minimum of a given function f(x). If one repeatedly takes steps proportional to the negative of the gradient of f(x) at the current point, one moves toward a local minimum of the function; this procedure is gradient descent. I started with gradient descent because the term is used extensively while explaining the concept of Backpropagation.
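To make the idea concrete, here is a minimal sketch of gradient descent in Python. The function f(x) = (x - 3)^2, the starting point, the step size and the names `gradient_descent` and `f_prime` are my own choices for illustration, not part of any particular library.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # Move in the direction opposite to the gradient at the current point.
        x = x - learning_rate * grad(x)
    return x


# Illustrative function f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
def f_prime(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(f_prime, x0=0.0)
print(x_min)  # very close to 3, the minimum of f
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a fixed number of steps is enough for this simple function.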

Loss Function

Now, let's talk about a term called the “loss function”. It is defined as a function which maps a particular event to a real number representing some ‘cost’ associated with that event. Here, cost refers to the extent to which the resulting values differ from the expected ones. The smaller the value of the loss function, the better the network is optimized.

How the loss function is minimized

In Backpropagation, the weights of the neural network are repeatedly updated so as to minimize the loss function. The gradient of the loss function is calculated with respect to all the weights in the neural network. This gradient is then fed to the optimization method, which uses it to update the weights. Here, the entire effort goes into reducing the loss function.

Backpropagation in depth

Backpropagation is a supervised learning method, as it requires a known, desired output for each input. For example, you have the picture of a flower; you upload it and get the name of the flower in return. The name "rose" is provided for the uploaded image of a rose. This is a simple ‘classification’ task, where the output of the network should match the label expected for the input.

Let us take the model of the algorithm for explanation:

Let N be a neural network with e connections.

Let x1, x2, … be the inputs, y the output produced by the network, y’ the desired output, and w0, w1, …, wn the weights of the network. We have to select an error function for calculating the difference between the two outputs. Generally the difference is given by E(y, y’) = (y - y’)^2, i.e. the square of the Euclidean distance between the two output vectors.
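As a small illustration of this error function, the sketch below computes the squared Euclidean distance between two output vectors. I am taking y as the output the network produces and y’ as the desired output; the function name `squared_error` and the parameter name `y_target` are my own.

```python
def squared_error(y, y_target):
    """Squared Euclidean distance between two equal-length output vectors."""
    if len(y) != len(y_target):
        raise ValueError("output vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y, y_target))


# The network answered [0.8, 0.1] where [1.0, 0.0] was expected.
print(squared_error([0.8, 0.1], [1.0, 0.0]))  # about 0.05
```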

There are two phases of the Backpropagation Algorithm

1. Propagation

Further, this phase has two steps:

a. Forward propagation of the input through the network to generate resultant output patterns.

b. Backward propagation of the output patterns through the neural network to generate the difference between the expected and the resultant output patterns. This difference is denoted by "delta".

2. Weight Update

This is the next phase. Here, the gradient of each weight is found by multiplying the delta by the input, say x1, for each synapse.

A ratio of this gradient is then subtracted from the weight. This ratio, commonly called the learning rate, influences the speed and quality of learning. A larger ratio lets the neural network train faster, while a smaller ratio generally makes the training more accurate.
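The two phases can be sketched for the simplest possible case, a single neuron with a linear output. This is only an illustration under my own assumptions: the function name `train_step`, the example numbers and the parameter name `learning_rate` (the ratio described above) are hypothetical, and the constant factor 2 from differentiating the squared error is folded into the ratio.

```python
def train_step(weights, inputs, target, learning_rate):
    """One propagation and weight-update step for a single linear neuron."""
    # Phase 1a: forward propagation of the input to get the resultant output.
    output = sum(w * x for w, x in zip(weights, inputs))
    # Phase 1b: delta, the difference between resultant and expected output.
    delta = output - target
    # Phase 2: the gradient of each weight is delta times its input;
    # a ratio (learning_rate) of that gradient is subtracted from the weight.
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, delta


weights, delta = train_step([0.5, -0.5], inputs=[1.0, 2.0], target=1.0,
                            learning_rate=0.1)
print(weights, delta)  # roughly [0.65, -0.2] and -1.5
```

The output was too low (delta is negative), so both weights are increased, the second one by twice as much because its input is twice as large.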

Both phases are repeated again and again until the required performance is met.
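To show the repetition, and how the delta travels backward through a hidden layer, here is a rough sketch of a tiny network trained on the logical OR function. Everything specific in it is my own assumption for illustration: two sigmoid hidden units, a linear output, a bias weight on every unit, the starting weights, the ratio of 0.1 and the stopping tolerance. As before, the constant factor from the squared error is folded into the ratio.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward propagation: returns hidden activations and the network output."""
    # The last weight in each row is a bias, paired with a constant input of 1.
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
              for row in w_hidden]
    output = sum(w * v for w, v in zip(w_out, hidden + [1.0]))
    return hidden, output


def total_error(data, w_hidden, w_out):
    """Sum of squared errors over all training examples."""
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data)


def train(data, w_hidden, w_out, learning_rate=0.1, max_epochs=5000,
          tolerance=1e-3):
    """Repeat both phases until the error is small enough. Updates in place."""
    for _ in range(max_epochs):
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Delta at the output, then propagated backward to each hidden
            # unit through its outgoing weight and the sigmoid derivative.
            delta_out = output - target
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            # Weight update: subtract ratio * delta * input for each synapse.
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] -= learning_rate * delta_out * v
            for j, row in enumerate(w_hidden):
                for i, v in enumerate(x + [1.0]):
                    row[i] -= learning_rate * delta_hidden[j] * v
        if total_error(data, w_hidden, w_out) < tolerance:
            break
    return w_hidden, w_out


# Logical OR as a toy training set: (inputs, desired output).
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w_hidden = [[0.5, -0.4, 0.0], [0.3, 0.8, 0.0]]  # arbitrary starting weights
w_out = [0.2, -0.3, 0.0]

error_before = total_error(data, w_hidden, w_out)
train(data, w_hidden, w_out)
error_after = total_error(data, w_hidden, w_out)
print(error_before, error_after)  # the error should be lower after training
```

The loop stops either when the total error falls below the tolerance or when the maximum number of passes is reached, which is one simple way of expressing "until the required performance is met".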

Well, this was about the usage of the Backpropagation Algorithm in supervised neural networks. In my next article, I will discuss the importance of this algorithm and its implications for Machine Learning.