How to resolve error while predicting using xgboost in R

r
xgboost

#1

Hello,

I am trying to implement xgboost in R for a classification problem:

# xgboost parameters
param <- list("objective" = "binary:logistic",    # binary classification 
              "eval_metric" = "error",    # evaluation metric 
              "nthread" = 8,   # number of threads to be used 
              "max_depth" = 16,    # maximum depth of tree 
              "eta" = 0.2,    # step size shrinkage 
              "gamma" = 0,    # minimum loss reduction 
              "subsample" = 1,    # part of data instances to grow tree 
              "colsample_bytree" = 1,  # subsample ratio of columns when constructing each tree 
              "min_child_weight" = 12)  # minimum sum of instance weight needed in a child 
# Split back into test and train sets
train_xg <- combi_fg[1:891,]
test_xg <- combi_fg[892:1309,]

#Convert the data to matrix form:
#Convert the train_xg:
train.matrix = as.matrix(train_xg)
mode(train.matrix) = "numeric"
#Convert the test_xg:
test.matrix = as.matrix(test_xg)
mode(test.matrix) = "numeric"

# k-fold cross validation, with timing
nround.cv = 200
xgboost.cv <- xgb.cv(param=param, data=train.matrix, label=train$Survived, 
                      nfold=10, nrounds=nround.cv, prediction=TRUE, verbose=T)

However, when I try to predict on the test data using:

pred <- predict(xgboost.cv,test.matrix)

I am getting an error:

Why does this error occur, and how can I resolve it?


#2

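Hello,

The object returned by xgb.cv only contains the cross-validation results (the error rates per iteration), not a fitted model, so it cannot be passed to predict(). Use the CV results to choose the number of rounds, then fit the final model with xgboost() and predict with that: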

# k-fold cross validation, with timing
nround.cv = 5000
dtrain <- xgb.DMatrix(train.matrix, label=train$Survived)
xgboost.cv <- xgb.cv(param=param, data=dtrain, nfold=10, nrounds=nround.cv, prediction=TRUE, verbose=0)

# index of maximum auc (this needs "eval_metric" = "auc" in param;
# with "error", take which.min of the test error column instead):
max.auc.idx = which.max(xgboost.cv$dt[, test.auc.mean]) 
max.auc.idx 
## [1] 493
# CV metrics at that iteration
xgboost.cv$dt[max.auc.idx,]

# real model fit training, with full data
xgb.bst <- xgboost(param=param, data=train.matrix, label=train$Survived, 
                            nrounds=max.auc.idx, verbose=1)
pred <- predict(xgb.bst,test.matrix)
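
If the parameter list keeps "eval_metric" = "error" as in the question, the best round is the one with the lowest test error rather than the highest AUC. A minimal sketch of that selection in base R, assuming a per-iteration CV log with a test.error.mean column (the data frame and its values below are made up for illustration, not real xgb.cv output):

```r
# Hypothetical per-iteration CV log. The column name mimics what
# older xgb.cv versions reported; the values are invented.
cv_log <- data.frame(
  iter            = 1:5,
  test.error.mean = c(0.21, 0.18, 0.17, 0.19, 0.20)
)

# For an error metric, lower is better, so use which.min (not which.max).
best_iter <- cv_log$iter[which.min(cv_log$test.error.mean)]
print(best_iter)
```

The resulting best_iter would then take the place of max.auc.idx as the nrounds value for the final fit.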

Hope this helps!!


#3

I am still getting the error below:

pred <- predict(xgb.bst,test.matrix)

Error in xgb.DMatrix(newdata) :
There are NAN in the matrix, however, you did not set missing=NAN

What should the format of test.matrix be: only the independent variables, or the independent variables together with a null dependent variable?


#4

Hi @sandeepak

Did you check for and impute missing values in the test data? It looks like xgboost is failing on them.
Since you have not provided any context about the problem you are working on, it’s hard to say more.
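
A minimal sketch of such a check in base R, assuming the test rows were split off a combined data frame whose label column is NA for the test portion. The toy data, the column names, and the choice of median imputation are all assumptions for illustration:

```r
# Toy stand-in for a combined train/test data frame (hypothetical values).
# Rows 1-3 play the role of train; rows 4-5 are test rows without a label.
combi <- data.frame(
  Survived = c(0, 1, 1, NA, NA),
  Age      = c(22, 38, NA, 35, NA),
  Fare     = c(7.25, 71.28, 7.92, 53.10, 8.05)
)
train_df <- combi[1:3, ]
test_df  <- combi[4:5, ]

# Count missing values per column to see where the NAs come from.
na_counts <- colSums(is.na(test_df))
print(na_counts)

# Keep only the independent variables; the label is unknown for test rows.
features <- setdiff(names(combi), "Survived")
test.matrix <- as.matrix(test_df[, features])

# Impute the remaining NAs with the training-set median of each column.
for (col in features) {
  idx <- is.na(test.matrix[, col])
  if (any(idx)) {
    test.matrix[idx, col] <- median(train_df[[col]], na.rm = TRUE)
  }
}
print(anyNA(test.matrix))
```

The error message itself points at a `missing` setting, so telling xgboost which value marks a missing entry may be an alternative to imputing; check the documentation for your installed version.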

Note: Questions asked in comments generally don’t get answered. You should instead ask your question in a new thread with proper context, so that it is easier for community users to answer it quickly.

Regards
Manish