My understanding of the situation: the regularisation parameter that gives the best training-set results is, by definition, the one that optimises performance on the training set, and so takes no account of test (or, ideally, dev) set performance. Regularisation should be used to control the bias-variance tradeoff, and in this case to ensure that the variance is not too large.
Consider the following:

- gamma = 0: training accuracy 0.9, test accuracy 0.7
- gamma = 10: training accuracy 0.8, test accuracy 0.8
Hence you have not overfit with gamma = 10! In the above tutorial, I believe that evaluating gamma on the training set will preferentially tell you to choose gamma = 0, giving the best training accuracy with no regard for test-set accuracy, and hence no regard for the model's ability to generalise.
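For what it's worth, here is a minimal sketch of what I would consider the standard approach: score each candidate regularisation setting on a held-out validation set and pick the best validation score, not the best training score. The synthetic dataset, LogisticRegression, and its C parameter (inverse regularisation strength, standing in for the tutorial's gamma) are my own assumptions, not anything taken from the tutorial.

```python
# Sketch: choose the regularisation strength by held-out validation
# accuracy, not training accuracy. All names here are my stand-ins
# for whatever the tutorial actually uses.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=40, n_informative=5,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

best_c, best_val_acc = None, -np.inf
for c in [0.001, 0.01, 0.1, 1.0, 10.0]:  # candidate regularisation settings
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # tempting, but misleading
    val_acc = model.score(X_val, y_val)        # this is what should decide
    print(f"C={c}: train={train_acc:.3f}, val={val_acc:.3f}")
    if val_acc > best_val_acc:
        best_c, best_val_acc = c, val_acc

print(f"chosen C={best_c}, validation accuracy={best_val_acc:.3f}")
```

Picking the maximum of the train column here would reliably select the least-regularised model, which is exactly the failure mode I am describing.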
I hope there is a good reason for it and this is my misunderstanding; however, I am used to finding tutorials that are rife with bad practice, producing models that do not generalise well at all. This happens more often than not, hence my skepticism. But perhaps there is a good reason that I am just not seeing right now?