How to predict class labels using xgboost in python when objective function is binary:logistic




I am using the below parameters:

parm = {'objective': 'binary:logistic',
        'seed': 88888}

and then I am using the below code:

import numpy as np
from sklearn.cross_validation import KFold
import xgboost as xgb
from sklearn import metrics

a = np.array([])
kfolds = KFold(train_X.shape[0], n_folds=6)
for dev_index, val_index in kfolds:
    dev_X, val_X = train_X[dev_index,:], train_X[val_index,:]
    dev_y, val_y = train_y[dev_index], train_y[val_index]
    dtrain = xgb.DMatrix(dev_X, label=dev_y)
    dtest = xgb.DMatrix(val_X)
    bst = xgb.train(parm, dtrain, num_rounds)
    ypred_bst = bst.predict(dtest, ntree_limit=bst.best_iteration)
    score = metrics.confusion_matrix(val_y, ypred_bst)
    res = (score[0][0]+score[1][1])*1.0/sum(sum(score))
    a = np.append(a, [res])
    print "Accuracy = %.7f" % (res)
print "Overall Mean Accuracy = %.7f" % (np.mean(a))

However, this gives an error on the `score = metrics.confusion_matrix(val_y, ypred_bst)` line.

I think this is because binary:logistic returns probabilities rather than class labels. How can I specify that class labels should be returned instead?
Can someone please help me with this?


Hi @pagal_guy,

binary:logistic will only return probabilities in XGBoost. You can convert these probabilities to 1/0 by treating anything above 0.5 as 1 and the rest as 0. Here is the code for it:

import numpy as np
ypred_bst = np.array(bst.predict(dtest, ntree_limit=bst.best_iteration))
ypred_bst = ypred_bst > 0.5
ypred_bst = ypred_bst.astype(int)
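With the labels converted, the accuracy computation from your loop works as intended. A minimal self-contained sketch, using made-up probabilities in place of the real `val_y` and `bst.predict(dtest)` output:

```python
import numpy as np

# Hypothetical labels and predicted probabilities, standing in for
# val_y and bst.predict(dtest) from the question.
val_y = np.array([0, 1, 1, 0, 1])
ypred_prob = np.array([0.2, 0.8, 0.4, 0.6, 0.9])

# Threshold at 0.5 to turn probabilities into hard 0/1 labels.
ypred_bst = (ypred_prob > 0.5).astype(int)

# Build the 2x2 confusion matrix by hand, the same layout
# metrics.confusion_matrix(val_y, ypred_bst) would give.
score = np.zeros((2, 2), dtype=int)
for t, p in zip(val_y, ypred_bst):
    score[t][p] += 1

res = (score[0][0] + score[1][1]) * 1.0 / score.sum()
print("Accuracy = %.7f" % res)  # 0.6000000
```

The accuracy is just the diagonal of the confusion matrix over the total count, exactly as in your original `res` line.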

You can also move the cutoff above or below 0.5 to check whether it gives a boost to your accuracy.
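One way to pick that cutoff is to sweep a grid of thresholds on the validation fold and keep the best one. A sketch with made-up numbers standing in for the real validation labels and predicted probabilities:

```python
import numpy as np

# Hypothetical validation labels and predicted probabilities;
# in practice these would be val_y and bst.predict(dtest).
val_y = np.array([0, 0, 1, 1, 1, 0, 1, 0])
ypred_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.7, 0.55])

# Try a grid of cutoffs and keep the one with the highest accuracy.
best_cutoff, best_acc = 0.5, 0.0
for cutoff in np.arange(0.1, 0.9, 0.05):
    acc = ((ypred_prob > cutoff).astype(int) == val_y).mean()
    if acc > best_acc:
        best_cutoff, best_acc = cutoff, acc

print("Best cutoff %.2f -> accuracy %.4f" % (best_cutoff, best_acc))
```

Note that a cutoff tuned this way should really be validated on a held-out fold, otherwise you are fitting the threshold to the same data you score on.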

Hope this helps.