How to Interpret roc_curve(Test,Predictions) in scikit-learn

python2
data_science
randomforest
scikit-learn

#1

I am working on a classification problem in scikit-learn, predicting whether an opportunity is a Win or a Loss.
I used this piece of code:

fpr, tpr, thresholds = roc_curve(yTest, predictions)

And the result is:

(array([ 0.       ,  0.2628946,  1.       ]),
 array([ 0.        ,  0.73692477,  1.        ]),
 array([2, 1, 0]))

I know that the AUC is calculated from the fpr and tpr obtained at various thresholds between 0 and 1. As far as I know, a threshold should lie between 0 and 1.

But here the threshold values are 2, 1, 0. What should I understand from this, and how do I interpret it?

The sample code looks fine:

import numpy as np
from sklearn import metrics
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)

fpr
array([ 0. , 0.5, 0.5, 1. ])

tpr
array([ 0.5, 0.5, 1. , 1. ])

thresholds
array([ 0.8 , 0.4 , 0.35, 0.1 ])

My predict_proba(yTest) output is below. These are the raw probabilities from the Random Forest:

[ 0.09573287 0.90426713]
[ 0.14987409 0.85012591]
[ 0.16348188 0.83651812]
…,
[ 0.13957409 0.86042591]
[ 0.04478675 0.95521325]
[ 0.03492729 0.96507271]


#2

@Ashwanth_Daggula,
In the code:
fpr, tpr, thresholds = roc_curve(yTest, predictions)
can you please share the code that you used to create the object “predictions”?

P.S.:
Make sure that you are using the predicted probabilities, not the predicted labels:
predictions = clf.predict_proba(X_test)[:, 1]
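
To illustrate why this matters, here is a minimal sketch comparing the two inputs. It assumes a synthetic data set from make_classification and a freshly fitted RandomForestClassifier, since the original data and model are not shown; the names X_train, X_test, y_train, y_test, hard and soft are made up for the example. When hard 0/1 labels are passed, roc_curve only has two distinct scores to work with, so it returns three thresholds: an artificial first value above the largest score (max(y_score) + 1 in older scikit-learn releases, infinity in recent ones), then 1, then 0. That would be consistent with the 2, 1, 0 seen in the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: the poster's real data is not available).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# Hard class labels: the scores are only 0 or 1.
hard = clf.predict(X_test)
fpr_h, tpr_h, thr_h = roc_curve(y_test, hard)

# Probabilities of the class in column 1 of predict_proba.
soft = clf.predict_proba(X_test)[:, 1]
fpr_s, tpr_s, thr_s = roc_curve(y_test, soft)

print("thresholds from labels: %s" % thr_h)
print("number of thresholds from probabilities: %d" % len(thr_s))
```

With probabilities, the curve gets many more points, and every threshold after the artificial first one lies between 0 and 1.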


#3

What does [:, 1] mean? Is it the probability of 1 or of 0?


#4

Use clf.classes_ to know which probability column belongs to which class.
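
As a rough sketch of how that lookup could work: column i of predict_proba corresponds to clf.classes_[i], so [:, 1] is the probability of the second entry in clf.classes_. The tiny data set and the "Win"/"Loss" string labels below are invented for illustration; the actual label encoding in the question is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve

# Made-up toy data with string labels (assumption, for illustration only).
X = np.array([[0.0], [0.1], [0.2], [0.9], [1.0], [1.1]])
y = np.array(["Loss", "Loss", "Loss", "Win", "Win", "Win"])

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)

# classes_ is sorted; its order matches the columns of predict_proba.
print(clf.classes_)

# Look up the column for the class of interest instead of hard-coding 1.
win_col = list(clf.classes_).index("Win")
p_win = clf.predict_proba(X)[:, win_col]

# Tell roc_curve which label counts as the positive class.
fpr, tpr, thresholds = roc_curve(y, p_win, pos_label="Win")
```

Looking up the column by class name avoids guessing whether index 1 is the class you care about.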


#5

Thanks syed.danish. I got my answer.