Hello everyone, I have joined this community and I am learning data science. I have a question about batch normalization. An interviewer asked me about it in a recent interview, and I had no answer at the time. From what I have read, it is a technique for improving the performance and stability of a neural network, but I don’t think that is its only purpose. Does anyone know more about batch normalization in data science? Please explain it to me.
To increase the stability of a neural network, batch normalization normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation.
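To make that concrete, here is a minimal sketch in plain Python of the normalization step for a single feature across one mini-batch. The function name `batch_normalize`, the sample values, and the small `eps` constant (added to avoid dividing by zero) are my own assumptions for illustration, not taken from any particular library.

```python
import math

def batch_normalize(values, eps=1e-5):
    """Normalize one feature's activations across a mini-batch (illustrative helper)."""
    n = len(values)
    mean = sum(values) / n
    # Biased (population) variance computed over the batch
    var = sum((v - mean) ** 2 for v in values) / n
    # Subtract the batch mean and divide by the batch standard deviation
    return [(v - mean) / math.sqrt(var + eps) for v in values]

batch = [2.0, 4.0, 6.0, 8.0]  # made-up activations for one feature
normalized = batch_normalize(batch)
print(normalized)
```

After this step the values in the batch have a mean of roughly 0 and a variance of roughly 1.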
However, after the activation outputs have been shifted and scaled in this way, the weights in the next layer are no longer optimal. SGD (stochastic gradient descent) will undo the normalization if that is a way for it to minimize the loss function.
Consequently, batch normalization adds two trainable parameters to each layer: the normalized output is multiplied by a “standard deviation” parameter (gamma) and then shifted by adding a “mean” parameter (beta). In other words, batch normalization lets SGD do the denormalization by changing only these two weights for each activation, instead of losing the stability of the network by changing all the weights.
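Here is a rough sketch of that scale-and-shift step in plain Python, again for one feature. The name `batch_norm_forward` and the sample numbers are hypothetical; in a real network gamma and beta would be learned by SGD rather than set by hand as they are here.

```python
import math

def batch_norm_forward(values, gamma, beta, eps=1e-5):
    """Normalize a batch for one feature, then scale by gamma and shift by beta."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in values]
    # gamma and beta are the two trainable parameters per activation
    return [gamma * x + beta for x in x_hat]

batch = [2.0, 4.0, 6.0, 8.0]  # made-up activations; mean 5.0, variance 5.0

# gamma=1, beta=0 leaves the output normalized
plain = batch_norm_forward(batch, gamma=1.0, beta=0.0)

# gamma = batch std and beta = batch mean approximately undoes the normalization
restored = batch_norm_forward(batch, gamma=math.sqrt(5.0), beta=5.0)
print(plain)
print(restored)
```

Setting gamma to the batch standard deviation and beta to the batch mean approximately recovers the original activations, which is the sense in which these two parameters let the network denormalize when that lowers the loss.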
Batch normalization is explained in depth in this article: https://medium.com/@greatphilosopher98/batch-normalization-8fb27a96b9c7
Thanks @akshar89, that is a great reference. I also checked this page, which provides good information as well: https://hackr.io/blog/data-science-interview-questions