What is the difference between gradient boosting and stochastic gradient boosting?




I am trying to understand the various ways in which boosting can be performed in R. I have tried stochastic gradient boosting and gradient boosting:

#Stochastic Gradient Boosting (via caret):
library(caret)
modFit <- train(wage ~ ., method="gbm", data=training, verbose=FALSE)
predictions <- predict(modFit, testing)
#Calculate RMSE:
rmse.stochastic.gbm <- RMSE(predictions, testing$wage)

This gives the following output:

And when I use only gradient boosting:

#Gradient Boosting (calling gbm directly):
library(gbm)
boost.train.class <- gbm(wage ~ ., data=training, n.trees=500, interaction.depth=4)
boost.pred <- predict(boost.train.class, newdata=testing, n.trees=500)
#Calculate RMSE:
rmse.boost <- RMSE(boost.pred, testing$wage)

These are the outputs:

For the RMSE of the two models:

I understand that in stochastic gradient boosting bootstrapped samples are taken, whereas in gradient boosting the whole sample space is used. Is that right?
Is the RMSE lower in the second case because taking bootstrapped samples helps the model generalize better?

So when should we use which kind of boosting, and why?
Can someone please help me understand this?


Hi Pagal,

Well, in both cases you are using stochastic gradient boosting! Once you call it via caret, and once directly; you are calling the same method in two different ways, that is all.
gbm implements stochastic gradient boosting (see "Generalized Boosted Models: A guide to the gbm package"). So the difference is that in the direct call you set interaction.depth=4 with 500 trees, while caret selected interaction.depth=3 with 150 trees, as shown in your snapshot. caret runs a tuning search to find the best settings.
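To make the "stochastic" part concrete: in gbm it is controlled by the bag.fraction argument, which subsamples the training rows (without replacement) before fitting each tree. A minimal sketch, using a small made-up data frame in place of your Wage data (the x1/x2/wage columns are hypothetical):

```r
library(gbm)

# Toy stand-in for the Wage training data (hypothetical columns).
set.seed(1)
training <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
training$wage <- 3 * training$x1 - training$x2 + rnorm(200)

# bag.fraction = 0.5 (the default) fits each tree on a random half of
# the rows: this is stochastic gradient boosting.
fit.stochastic <- gbm(wage ~ ., data = training,
                      distribution = "gaussian",
                      n.trees = 200, interaction.depth = 4,
                      bag.fraction = 0.5)

# bag.fraction = 1 uses every row for every tree, i.e. plain
# (deterministic) gradient boosting.
fit.deterministic <- gbm(wage ~ ., data = training,
                         distribution = "gaussian",
                         n.trees = 200, interaction.depth = 4,
                         bag.fraction = 1)
```

So you can get either flavour from the same function; your two snippets just differed in the other tuning parameters, not in the algorithm.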

Check the caret documentation; there are a few more details there.
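For example, if you want caret to use the same settings as your direct gbm call instead of searching its default grid, you can pass a tuneGrid. A sketch, again with a made-up data frame standing in for your training split:

```r
library(caret)
library(gbm)

# Toy stand-in for the Wage training data (hypothetical columns).
set.seed(1)
training <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
training$wage <- 3 * training$x1 - training$x2 + rnorm(200)

# Pin caret to one parameter combination; the column names are the
# four gbm tuning parameters caret expects.
grid <- expand.grid(n.trees = 500,
                    interaction.depth = 4,
                    shrinkage = 0.1,
                    n.minobsinnode = 10)

modFit <- train(wage ~ ., method = "gbm", data = training,
                trControl = trainControl(method = "cv", number = 3),
                tuneGrid = grid, verbose = FALSE)
modFit$bestTune  # the single combination supplied above
```

With a grid of one row, caret skips the search and fits exactly the model you asked for, which makes the two RMSE numbers directly comparable.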