In my gradient descent implementation, the cost appears to converge (or so I think!) for a particular value of alpha and number of iterations.
However, keeping alpha the same, if I increase the number of iterations, I see a small increase in the cost function at high iteration counts. If I then decrease alpha, it again looks like convergence, but when I increase the number of iterations, the cost function again seems to increase slightly.
The following is for alpha = 0.07 and iterations = 10000:
The following is for alpha = 0.07 and iterations = 30000:
Is this normal, or is something wrong in my implementation? I am keeping the regularization penalty the same in both cases.
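
For context, here is a minimal sketch of the kind of loop I am describing; this is not my exact code, just the structure, using synthetic stand-in data, an assumed regularized linear-regression cost, and an illustrative penalty `lam = 1.0`:

```python
import numpy as np

# Sketch of the setup (assumed, not my actual code): batch gradient descent
# on a regularized linear-regression cost, logging the cost every iteration
# so the tail behavior at high iteration counts is visible.

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # bias column + one feature
y = 3.0 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=50)  # synthetic targets

def cost(theta, X, y, lam):
    m = len(y)
    residual = X @ theta - y
    # Regularized squared-error cost; the bias term is not penalized.
    return (residual @ residual + lam * theta[1:] @ theta[1:]) / (2 * m)

def gradient_descent(X, y, alpha, iters, lam):
    m, n = X.shape
    theta = np.zeros(n)
    history = np.empty(iters)
    for i in range(iters):
        grad = X.T @ (X @ theta - y) / m
        grad[1:] += lam / m * theta[1:]   # gradient of the penalty term
        theta -= alpha * grad
        history[i] = cost(theta, X, y, lam)
    return theta, history

theta, history = gradient_descent(X, y, alpha=0.07, iters=10000, lam=1.0)
# If alpha is too large for the problem's curvature, the tail of `history`
# can oscillate or creep upward instead of decreasing monotonically.
print(history[-5:])
```

In my case it is the tail of this cost history that starts to rise once the iteration count gets large enough.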