What should the value of nround be in an xgboost model?

xgboost

#1

I am currently working on a classification problem using the xgboost algorithm. There are four attributes required to specify the model:

data - the input data

label - the target variable

nround - the number of trees (boosting rounds) in the model

objective - for regression use 'reg:linear' (renamed 'reg:squarederror' in newer xgboost releases) and for binary classification use 'binary:logistic'

I want to know how to decide the value of nround so that the model does not overfit.


#2

Hi @harry

Try running cross-validation (for example 4-fold or 7-fold) and evaluate the error metric for each round. Example code is given below:

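import xgboost as xgb

# x_train, y_train and x_test are assumed to be loaded already
params = {}
params["objective"] = "binary:logistic"
params["eta"] = 0.01
params["min_child_weight"] = 7
params["subsample"] = 0.7
params["colsample_bytree"] = 0.7
params["scale_pos_weight"] = 0.8
params["silent"] = 0  # deprecated in newer releases in favour of "verbosity"
params["max_depth"] = 4
params["seed"] = 0
params["eval_metric"] = "auc"

# missing=-999 tells xgboost to treat -999 as a missing value
xgtrain = xgb.DMatrix(x_train, label=y_train, missing=-999)
xgtest = xgb.DMatrix(x_test, missing=-999)
num_rounds = 3000  # upper bound on the number of boosting rounds
# xgb.cv returns the evaluation history, one entry per boosting round
cv_results = xgb.cv(params, xgtrain, num_rounds, nfold=4, metrics="auc", seed=0)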

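This gives you the train and test score (AUC here) for each round number. Pick the number of rounds at which the cross-validated test score is highest; beyond that point the train score usually keeps rising while the test score stalls or drops, which is the sign of overfitting.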
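To make the selection rule concrete, here is a minimal sketch in plain Python of picking the best round from a list of per-round test scores, plus a simple patience-based stopping rule. The helper names `best_round` and `early_stop_round` are hypothetical (they are not part of xgboost), and the scores in the example are made-up illustrative values, not real results.

```python
from typing import Sequence, Tuple


def best_round(test_scores: Sequence[float], maximize: bool = True) -> Tuple[int, float]:
    """Return (1-based round number, score) of the best cross-validated score."""
    if not test_scores:
        raise ValueError("test_scores must not be empty")
    pick = max if maximize else min
    # On ties, max/min return the first match, i.e. the smallest model.
    idx = pick(range(len(test_scores)), key=lambda i: test_scores[i])
    return idx + 1, test_scores[idx]


def early_stop_round(test_scores: Sequence[float], patience: int, maximize: bool = True) -> int:
    """Return the best 1-based round seen before `patience` rounds pass without improvement."""
    if not test_scores:
        raise ValueError("test_scores must not be empty")
    best_idx = 0
    for i in range(1, len(test_scores)):
        if maximize:
            improved = test_scores[i] > test_scores[best_idx]
        else:
            improved = test_scores[i] < test_scores[best_idx]
        if improved:
            best_idx = i
        elif i - best_idx >= patience:
            break  # no improvement for `patience` rounds: stop scanning
    return best_idx + 1


# Made-up per-round test AUC values, for illustration only
scores = [0.70, 0.75, 0.80, 0.79, 0.81, 0.80, 0.78]
print(best_round(scores))                    # (5, 0.81)
print(early_stop_round(scores, patience=2))  # 5
```

With the xgboost Python package, the per-round scores would typically come from the table returned by `xgb.cv`, for example its `test-auc-mean` column when the metric is AUC. `xgb.cv` also accepts an `early_stopping_rounds` argument that applies a similar patience rule for you, so you do not have to run all 3000 rounds.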

Hope this helps.

Regards,
Aayush