How to resolve multi-class prediction error in xgboost in R

r
xgboost

#1

Hello,

I have a multiclass prediction problem for which the classes are:

The output of table(labels) shows the distribution after converting the labels to numeric.

I used the following code for xgboost:

# xgboost parameters
library(xgboost)
param <- list("objective" = "multi:softmax",
              "num_class" = 13,
              "eval_metric" = "merror",    # evaluation metric 
              "nthread" = 8,   # number of threads to be used 
              "max_depth" = 16,    # maximum depth of tree 
              "eta" = 0.2,    # step size shrinkage 
              "gamma" = 0.01,    # minimum loss reduction 
              "subsample" = 1,    # part of data instances to grow tree 
              "colsample_bytree" = 1,  # subsample ratio of columns when constructing each tree 
              "min_child_weight" = 12)  # minimum sum of instance weight needed in a child 

#Convert labels to numeric:
num.class = length(levels(labels$country_destination))
levels(labels$country_destination) = 1:num.class

#Convert the data to matrix form:
#Convert the train_xg:
train.matrix = as.matrix(df_train)
mode(train.matrix) = "numeric"

#Convert the test_xg:
test.matrix = as.matrix(df_test)
mode(test.matrix) = "numeric"

#Convert the labels data:
labels.matrix <- as.matrix(labels$country_destination)
mode(labels.matrix) = "numeric"

# k-fold cross validation, with timing
nround.cv = 50
dtrain <- xgb.DMatrix(train.matrix, label=labels.matrix)
xgboost.cv <- xgb.cv(param=param, data=dtrain,nfold=10, nrounds=nround.cv, prediction=TRUE, verbose=0)

# index of the round with the minimum test merror:
min.merr.idx = which.min(xgboost.cv$dt[, test.merror.mean]) 
min.merr.idx
## [1] 13
# minimum merror
xgboost.cv$dt[min.merr.idx,]

# real model fit training, with full data
xgb.bst <- xgboost(param=param, data=train.matrix, label=labels.matrix, 
                               nrounds=min.merr.idx, verbose=1)
pred <- predict(xgb.bst,test.matrix)

However:

> table(pred)
pred
    8    12 
47647 14449

I can't understand why I am getting only two classes in pred.

Can someone please help me with this?


#2

Did you solve this? I have the same problem. :frowning:


#3

I have never used XGBoost for multiclass classification, but as far as I know the output can be a set of probabilities, one per class for each case, when the objective is multi:softprob. With multi:softmax, as in the code above, predict() should already return a single class index per case.

If you do work with probabilities and want a single table of results, pick the most probable class for each case and condense the output to a single column.
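
A minimal sketch of condensing per-class probabilities into one predicted class per case. It assumes the probabilities arrive as a flat vector with each case's values stored contiguously, which is how I understand predict() to behave for multi:softprob in older xgboost releases; newer versions may already return a matrix. The `prob` values and `num_class` below are made up for illustration and do not come from the original data.

```r
# Simulated stand-in for predict() output under objective = "multi:softprob"
# (assumption: flat vector, num_class consecutive values per case).
num_class <- 3                        # hypothetical number of classes
prob <- c(0.7, 0.2, 0.1,              # case 1: class 0 most likely
          0.1, 0.3, 0.6,              # case 2: class 2 most likely
          0.2, 0.5, 0.3)              # case 3: class 1 most likely

# One row per case, one column per class.
prob_matrix <- matrix(prob, ncol = num_class, byrow = TRUE)

# max.col() gives the 1-based column of each row's maximum;
# subtract 1 to get xgboost's 0-based class index.
pred_class <- max.col(prob_matrix, ties.method = "first") - 1

# Single-column summary of the predicted classes.
table(pred_class)
```

If the objective is multi:softmax, this step should not be needed, since the prediction is already a class index.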


#4

as.numeric(target) should fix this. :slight_smile:
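
One thing to watch with this suggestion: as.numeric() on a factor returns level codes starting at 1, while xgboost's multiclass objectives expect labels from 0 to num_class - 1. A small sketch of the conversion, using a hypothetical `target` factor (the values are invented, not taken from the original data):

```r
# Hypothetical target column stored as a factor.
target <- factor(c("US", "NDF", "FR", "US", "NDF"))

# as.numeric() on a factor returns the internal level codes, which start at 1.
codes <- as.numeric(target)

# Shift to 0-based labels and derive num_class from the factor levels,
# so that labels fall in 0 .. num_class - 1.
label <- codes - 1
num_class <- nlevels(target)

# Keep the level names so predicted indices can be mapped back to classes.
class_names <- levels(target)
class_names[label + 1]
```

With labels built this way, `num_class` matches the number of distinct classes exactly, and no unused extra class is needed to make the labels fit.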