Gradient descent confusion
Hi friends,

In my gradient descent implementation, the cost appears to converge (as far as I can tell) for a particular value of alpha and a particular number of iterations.

However, keeping alpha the same and increasing the number of iterations, I see a small increase in the cost function at high iteration counts. If I then decrease alpha, it again looks like convergence, but when I increase the number of iterations, the cost function again seems to increase a bit.

The following plot is for alpha=0.07 and iterations=10000:

The following plot is for alpha=0.07 and iterations=30000:

Is this normal, or is something wrong in my implementation? I am keeping the regularization penalty the same in both cases.


I believe it's normal behavior. Gradient descent tries to reduce the cost by stepping downhill on the error surface, but once it gets close to a minimum, a fixed alpha can cause the iterates to bounce around the minimum, so the cost may fluctuate slightly up or down with more iterations. Alpha and the iteration count only 'attempt' to move in the direction of reducing error; they don't guarantee a monotonic decrease. A common fix is to stop once the change in cost between iterations falls below a small threshold, rather than running for a fixed number of iterations.
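To make the stopping-criterion idea concrete, here is a minimal sketch of batch gradient descent for linear regression that halts when the change in cost drops below a tolerance, instead of running a fixed number of iterations. The data, function names, and the `tol` value are illustrative assumptions, not from the original posts (alpha=0.07 is borrowed from the question).

```python
import numpy as np

def gradient_descent(X, y, alpha=0.07, max_iters=30000, tol=1e-9):
    """Batch gradient descent with a delta-cost stopping threshold.

    Stops as soon as the cost changes by less than `tol` between
    iterations, so extra iterations cannot push the cost back up.
    (Hypothetical example; names and values are assumptions.)
    """
    m, n = X.shape
    theta = np.zeros(n)
    prev_cost = np.inf
    for i in range(max_iters):
        error = X @ theta - y
        cost = (error @ error) / (2 * m)   # half mean squared error
        if abs(prev_cost - cost) < tol:    # delta cost below threshold: stop
            return theta, cost, i
        prev_cost = cost
        theta -= alpha * (X.T @ error) / m  # gradient step
    return theta, prev_cost, max_iters

# Toy data: y = 1 + 2*x, with a bias column in X
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
X = np.column_stack([np.ones(50), x])
y = 1 + 2 * x

theta, cost, iters = gradient_descent(X, y)
print(theta, cost, iters)  # theta approaches [1, 2], well before max_iters
```

With this kind of criterion, increasing `max_iters` can't make the reported cost worse: the loop exits as soon as further iterations stop helping.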