I was going through the basics of neural networks and I am having difficulty understanding how the back-propagation algorithm works. Can someone help me understand the intuition behind using partial derivatives in gradient descent optimization?
The back-propagation algorithm is used to adjust the weights so that the error between the actual output (from the training data) and the output produced by the neural network is minimized. It is straightforward to compute the error of the last (output) layer, since we know from the training data what the target value should have been. But for the neurons in the hidden layers, we do not know what the "correct" values should have been. So, to find the error for these neurons, we need some mechanism of credit assignment: a way to apportion the error at the output backward to the penultimate layer, and so on toward the input. This is exactly what the chain rule of partial derivatives provides: each hidden neuron's share of the error is the output error weighted by the partial derivatives along the connections between them. That is the basic intuition. For a more rigorous treatment, you could watch the lecture videos in the Coursera course by Geoff Hinton. Alternatively, you could see this video, which also explains the details well.
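To make the credit-assignment idea concrete, here is a minimal sketch (my own illustrative example, not from any particular course) of a tiny network with one hidden layer trained by back-propagation. Notice that the hidden-layer error `delta_h` is just the output-layer error `delta_out` carried backward through the weights `W2` (the chain rule), then scaled by each hidden unit's own activation derivative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples with 2 features; target is XOR of the inputs.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(size=(2, 3))   # input -> hidden weights
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))   # hidden -> output weights
b2 = np.zeros(1)

lr = 1.0
losses = []
for _ in range(2000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)        # hidden activations
    out = sigmoid(h @ W2 + b2)      # network output
    losses.append(np.mean((out - y) ** 2))

    # Output-layer error: known directly, since we have the targets.
    delta_out = (out - y) * out * (1 - out)
    # Hidden-layer error: apportion delta_out backward through W2
    # (credit assignment via the chain rule), scaled by each hidden
    # unit's activation derivative.
    delta_h = (delta_out @ W2.T) * h * (1 - h)

    # Gradient-descent weight updates
    W2 -= lr * h.T @ delta_out
    b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_h
    b1 -= lr * delta_h.sum(axis=0)
```

Running this, the mean squared error in `losses` shrinks over the iterations: the partial-derivative products in `delta_h` are what let the output error "reach" the hidden weights at all.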