Optimize `n_estimators` using `xgb.cv`



In this code fragment:

cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=1000, nfold=cv_folds,
        metrics='mlogloss', early_stopping_rounds=50)


How does cvresult.shape[0] return the optimal number of estimators (n_estimators)?
I think num_boost_round denotes the value of n_estimators used (increasing from 0 to 1000, stopped early by early_stopping_rounds), but I am not sure. What does num_boost_round represent here, and more importantly, how do I get the optimal number of estimators using xgb.cv?


Hi @rahul485

num_boost_round corresponds to the number of boosting rounds or trees to build. Since trees are built sequentially, instead of fixing the number of rounds at the beginning, we can test our model at each step and see if adding a new tree/round improves performance.

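If performance hasn't improved for N rounds (N is set by early_stopping_rounds), we stop training and keep the best number of boosting rounds. When early stopping triggers, xgb.cv returns one row per round up to and including the best one, so cvresult.shape[0] (the number of rows) is that best number of rounds.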
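The stopping rule described above can be sketched in plain Python. This is a simplified illustration, not XGBoost's actual source: the function name `early_stopped_history` and the score values are hypothetical, and a lower score is assumed to be better (as with mlogloss).

```python
def early_stopped_history(scores, early_stopping_rounds):
    """Return the per-round scores truncated at the best round (lower is better)."""
    best_score = float("inf")
    best_round = 0
    for i, score in enumerate(scores):
        if score < best_score:
            # This round improved on the best score seen so far.
            best_score, best_round = score, i
        elif i - best_round >= early_stopping_rounds:
            # No improvement for `early_stopping_rounds` rounds:
            # stop and keep only the rounds up to the best one.
            return scores[: best_round + 1]
    # Early stopping never triggered, so every round is kept.
    return scores


# Hypothetical mean CV log-loss after each boosting round.
scores = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.59]
history = early_stopped_history(scores, early_stopping_rounds=3)
best_n_rounds = len(history)  # plays the role of cvresult.shape[0]
```

Here the best score is reached on the fourth round, so the truncated history has four entries, and its length is the number of rounds to keep.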

You can refer to the article below:

It explains the parameter tuning in XGBoost with examples.


So what is the difference if I tune n_estimators with GridSearchCV?


Hi @rahul485,

xgb.cv estimates the performance of one set of parameters on unseen data, reporting the cross-validation score at every boosting round, whereas GridSearchCV evaluates a model with varying parameters to find the best combination of them.

In xgb.cv, we fix all parameters except the number of boosting rounds (n_estimators) and find its optimum value, while in GridSearchCV we supply different combinations of parameters and find the optimum combination.


Yeah, I get that. But I am asking if I tune n_estimators 2 times:

  1. by using xgb.cv
  2. by using GridSearchCV only on n_estimators

Which one is preferred, and why?


Hi @rahul485,

To find the optimum value of n_estimators, xgb.cv is preferred of the two, because it takes far less computation time than GridSearchCV: xgb.cv scores every boosting round within a single training run per fold, while GridSearchCV retrains the model from scratch for each candidate value. The results from xgb.cv are also more interpretable, as it gives the cross-validation score at each iteration.
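A rough back-of-the-envelope count makes the cost difference concrete. This sketch assumes one tree per boosting round (multiclass objectives build one tree per class per round), ignores GridSearchCV's final refit, and uses a hypothetical candidate grid of 100, 200, ..., 1000; the helper names are made up for illustration.

```python
def max_trees_xgb_cv(num_boost_round, nfold):
    # One training run per fold; every round is scored along the way.
    return num_boost_round * nfold


def trees_grid_search(candidates, nfold):
    # Each candidate value is trained from scratch on every fold.
    return sum(candidates) * nfold


nfold = 5
candidates = list(range(100, 1001, 100))  # hypothetical grid

cv_cost = max_trees_xgb_cv(1000, nfold)          # upper bound without early stopping
grid_cost = trees_grid_search(candidates, nfold)
ratio = grid_cost / cv_cost
```

Under these assumptions the grid search builds 27,500 trees against at most 5,000 for xgb.cv, and it still only compares values 100 apart, while xgb.cv scores every single round.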


@PulkitS Thanks for the explanation. Really appreciate it.


Hi everyone,

Just to add to this, XGBoost ships its own cross-validation helper, xgb.cv, which reports the cross-validation score at each boosting round. Most other algorithms do not have such a helper, hence we use GridSearchCV for them.

One question, if we use GridSearchCV on xgb, will it perform CV twice?


Could someone explain how alg.set_params(n_estimators=cvresult.shape[0]) gets us the best value for n_estimators? I read through the whole discussion but couldn’t quite get it. Any help is highly appreciated.