Error when xgboost result applied on test set

xgboost

#1

After running xgboost on the train set, I apply it on the test set and get this error:
xgbpred1 <- predict (xgb1,dtest)
Error in predict.xgb.Booster(xgb1, dtest) :
Feature names stored in object and newdata are different!

Below is the xgboost code:
xgbcv <- xgb.cv( params = params, data = dtrain, nrounds = 100, nfold = 5, showsd = T,
stratified = T, print_every_n = 10, early_stopping_rounds = 20, maximize = F)

xgb1 <- xgb.train (params = params, data = dtrain, nrounds = xgbcv$best_iteration,
watchlist = list(val=dtest,train=dtrain), print_every_n = 10, early_stopping_rounds = 10,
maximize = F , eval_metric = "error", eval_metric="logloss")

When I run xgbpred1 <- predict (xgb1,dtrain) it works, and I can check that
the first probabilities obtained look fine (xgbpred1[1:18]).
So why do I get the error quoted above when dtest is used?

dtrain
xgb.DMatrix dim: 21173 x 317 info: label colnames: yes
dtest
xgb.DMatrix dim: 21230 x 304 info: label colnames: yes

Below is how dtrain and dtest are created:

dat_train <- dat[dat$raceid<=1500,] # Training dataset
dat_test <- dat[dat$raceid > 1500 & dat$raceid < 3001,]# Testing Dataset

train <- dat_train[,3:53] # removing the race_id and nochev
test <- dat_test[,3:53] # removing the race_id and nochev
setDT(train) # Changing the data.frame to data.table
setDT(test)

train[is.na(train)] <- "Missing"
test[is.na(test)] <- "Missing"

labels <- train$win
ts_label <- test$win

new_tr <- model.matrix(~.+0,data = train[,-c("win"),with=FALSE])
new_ts <- model.matrix(~.+0,data = test[,-c("win"),with=FALSE])

labels <- as.numeric(labels)
ts_label <- as.numeric(ts_label)
#f_label <- as.numeric(f_label)

# Making XGboost Dense Matrix

dtrain <- xgb.DMatrix(data = new_tr,label = labels)
dtest <- xgb.DMatrix(data = new_ts,label=ts_label)


#2

@sandoz Could you also print and show the variables in the dataframes new_tr and new_ts?
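
The dimensions printed above (317 columns in dtrain, 304 in dtest) already suggest that the two matrices do not carry the same feature names. A minimal sketch of how the column names could be compared is shown here. It uses base R only, and the toy data frames and the column names `track` and `dist` are hypothetical stand-ins for the real data; with the real data, the same `setdiff()` calls would be run on the existing new_tr and new_ts.

```r
# Toy data (hypothetical): the test set never contains the level "c" of `track`.
train <- data.frame(track = c("a", "b", "c"), dist = c(1000, 1200, 1400))
test  <- data.frame(track = c("a", "b", "a"), dist = c(1100, 1300, 1500))

# Same encoding call as in the question, applied separately to each set.
new_tr <- model.matrix(~ . + 0, data = train)
new_ts <- model.matrix(~ . + 0, data = test)

# Columns that exist in one matrix but not in the other.
only_in_train <- setdiff(colnames(new_tr), colnames(new_ts))
only_in_test  <- setdiff(colnames(new_ts), colnames(new_tr))

print(dim(new_tr))
print(dim(new_ts))
print(only_in_train)
print(only_in_test)
```

Any name listed in `only_in_train` or `only_in_test` is a feature that one matrix has and the other lacks, which is the situation the error message describes.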


#3

The culprit is this line of code :)

rownames(dat_test) <- NULL # resetting the row names

dat_test is later converted into dtest.
(This line is part of a piece of code written by a freelancer; I had overlooked it, and I don't understand it…)

Now
xgbpred1 <- predict (xgb1,dtest) works.


#4

Hi, can you please share the entire code? I am still getting the error, and the line rownames(dat_test) <- NULL # resetting the row names is not present in my code :(
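
The error message itself only says that the feature names of the model and of the new data differ, so the cause can vary from one script to another. One situation that can produce it is a categorical column whose values differ between the training and test sets, because model.matrix() then creates a different set of dummy columns for each. The sketch below is not the fix reported earlier in this thread; it is an assumption about one possible cause, shown on hypothetical toy data (`track`, `dist`) with base R only. It gives every character column the same factor levels in both sets before encoding.

```r
# Toy data (hypothetical): "c" appears in train but not in test.
train <- data.frame(track = c("a", "b", "c"), dist = c(1000, 1200, 1400))
test  <- data.frame(track = c("a", "b", "a"), dist = c(1100, 1300, 1500))

# Give each character column identical factor levels in both sets,
# so that model.matrix() produces the same dummy columns for each.
for (col in names(train)) {
  if (is.character(train[[col]])) {
    lv <- union(unique(train[[col]]), unique(test[[col]]))
    train[[col]] <- factor(train[[col]], levels = lv)
    test[[col]]  <- factor(test[[col]],  levels = lv)
  }
}

new_tr <- model.matrix(~ . + 0, data = train)
new_ts <- model.matrix(~ . + 0, data = test)

# The two matrices now share the same column names in the same order.
print(identical(colnames(new_tr), colnames(new_ts)))
```

A level that is absent from one set still gets its own column there, filled with zeros, so the matrices built from new_tr and new_ts line up column by column.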