What should be the value of nround in an xgboost model?



I am currently working on a classification problem using the xgboost algorithm. There are four necessary attributes for model specification:

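data - the input data

label - the target variable

nround - the number of boosting rounds, i.e. the number of trees added to the model

objective - for regression use ‘reg:linear’ (called ‘reg:squarederror’ in newer xgboost versions) and for binary classification use ‘binary:logistic’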
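To make concrete what nround controls, here is a minimal pure-Python sketch. It is not xgboost itself but a simplified gradient-boosting loop with squared loss, where each round fits one depth-1 tree (a "stump") to the current residuals and adds it, scaled by a learning rate. The names fit_stump, predict_stump and boost, and the toy data, are hypothetical and only for illustration; xgboost adds regularization, second-order gradients and deeper trees, but the round count plays the same role: nround trees are added one after another.

```python
from typing import List, Tuple

# A depth-1 tree ("stump"): (threshold, value if x <= threshold, value otherwise)
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Least-squares stump on a single feature, fitted to the current residuals."""
    best: Stump = (xs[0], 0.0, 0.0)
    best_sse = float("inf")
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else 0.0
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if sse < best_sse:
            best_sse, best = sse, (t, lv, rv)
    return best


def predict_stump(stump: Stump, x: float) -> float:
    t, lv, rv = stump
    return lv if x <= t else rv


def boost(xs: List[float], ys: List[float], nround: int, eta: float = 0.3) -> List[float]:
    """Run `nround` boosting rounds; return the training MSE after each round."""
    preds = [0.0] * len(xs)
    history = []
    for _ in range(nround):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        # One round = one new tree, shrunk by the learning rate eta
        preds = [p + eta * predict_stump(stump, x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return history


# Hypothetical toy data, for illustration only
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.2]
history = boost(xs, ys, nround=20)
print(f"train MSE after 1 round: {history[0]:.4f}, after 20 rounds: {history[-1]:.4f}")
```

In this sketch the training error never goes up as nround grows, so it cannot tell you when to stop adding trees; a held-out or cross-validated score is needed for that.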

I want to know how to decide the value of nround so that the model does not overfit.


Hi @harry

Try running k-fold cross-validation (e.g. 4-fold or 7-fold) and evaluate the error metric for each number of rounds. Example code is given below:
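First, to show what a 4-fold or 7-fold split means, here is a minimal plain-Python sketch of k-fold index splitting. It is not part of xgboost, and the helper kfold_indices is hypothetical: the rows are shuffled once, cut into k folds, and each fold is held out exactly once while the other k-1 folds are used for training. In the xgboost example that follows, xgb.cv performs this split internally and its nfold argument sets k.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once, reproducibly
    # Spread rows as evenly as possible: the first n % k folds get one extra row
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, test_idx
        start += size


# 10 rows, 4 folds -> held-out fold sizes 3, 3, 2, 2
folds = list(kfold_indices(10, 4))
for train_idx, test_idx in folds:
    print(len(train_idx), "train rows,", len(test_idx), "held-out rows")
```

Every row is held out exactly once, so averaging the per-fold test scores uses all of the training data for evaluation.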

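import xgboost as xgb

# x_train, y_train (and x_test) are assumed to be your own data
params = {}
params["objective"] = "binary:logistic"
params["eta"] = 0.01
params["min_child_weight"] = 7
params["subsample"] = 0.7
params["colsample_bytree"] = 0.7
params["scale_pos_weight"] = 0.8
params["verbosity"] = 1  # newer replacement for the deprecated "silent" parameter
params["max_depth"] = 4
params["seed"] = 0
params["eval_metric"] = "auc"

# -999 marks missing values in the input data
xgtrain = xgb.DMatrix(x_train, label=y_train, missing=-999)
xgtest = xgb.DMatrix(x_test, missing=-999)  # kept for predictions later

# Upper bound on the number of boosting rounds to evaluate
num_rounds = 3000
# 4-fold CV: returns the mean/std train and test AUC for every round
cv_results = xgb.cv(params, xgtrain, num_boost_round=num_rounds, nfold=4, metrics=["auc"], seed=0)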

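This gives you the train and test CV score for each round number (averaged over the folds), so you can pick the optimal number of rounds as the one where the mean test CV score is best: the highest for AUC, or the lowest for an error metric such as logloss.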
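Picking that round can be sketched as below, assuming the per-round mean test AUCs have already been pulled into a Python list (in the DataFrame returned by xgb.cv this is the "test-auc-mean" column). The function best_round is hypothetical; its optional patience rule illustrates early stopping, the same idea behind the early_stopping_rounds argument of xgb.cv: stop scanning once the score has not improved for `patience` consecutive rounds.

```python
from typing import List, Optional, Tuple


def best_round(test_scores: List[float], patience: Optional[int] = None) -> Tuple[int, float]:
    """Return (number_of_rounds, best_score) for a higher-is-better metric such as AUC.

    test_scores[i] is the mean held-out score after i + 1 boosting rounds.
    If patience is given, scanning stops once the score has not improved for
    `patience` consecutive rounds (early stopping).
    """
    best_i, best = 0, test_scores[0]
    for i, score in enumerate(test_scores):
        if score > best:
            best_i, best = i, score
        elif patience is not None and i - best_i >= patience:
            break
    return best_i + 1, best


# Hypothetical mean test AUCs for rounds 1..10 (illustrative, not real results)
scores = [0.70, 0.74, 0.77, 0.78, 0.779, 0.781, 0.776, 0.775, 0.774, 0.773]
print(best_round(scores))              # full scan: (6, 0.781)
print(best_round(scores, patience=1))  # stops early: (4, 0.78)
```

With patience=1 the scan stops at round 4 and misses the slightly better round 6. With a small learning rate such as eta = 0.01 the score changes slowly from round to round, so a larger patience is less likely to stop too early.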

Hope this helps.