Understanding reliability diagram for classification

machine_learning
probability
predictive_model
isotonicregression
plattscaling

#1

Hi everybody,
I was going through a research paper (attached below if needed) on predicting good probabilities, and it suggests two methods for calibrating them:

  1. Platt scaling
  2. Isotonic regression

It was mentioned there that, on real problems where the true conditional probabilities are not known, model calibration can be visualized with reliability diagrams. First, the prediction space is discretized into ten bins: cases with a predicted value between 0 and 0.1 fall in the first bin, between 0.1 and 0.2 in the second bin, and so on. For each bin, the mean predicted value is plotted against the true fraction of positive cases. If the model is well calibrated, the points will fall near the diagonal line.

Ques : Please help me understand the last sentence of the above paragraph (why the points of a well-calibrated model fall near the diagonal).
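The binning procedure described in that paragraph can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code; the toy labels below are constructed so that each bin's fraction of positives matches its mean prediction, i.e. a perfectly calibrated model:

```python
import numpy as np

def reliability_diagram(y_true, y_prob, n_bins=10):
    """Return (mean predicted value, fraction of positives) per bin.

    Bins predictions into n_bins equal-width intervals on [0, 1],
    as described in the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Bin index 0..n_bins-1; clip so a prediction of exactly 1.0
    # lands in the last bin instead of overflowing.
    bin_idx = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():                       # skip empty bins
            mean_pred.append(y_prob[mask].mean())
            frac_pos.append(y_true[mask].mean())
    return np.array(mean_pred), np.array(frac_pos)

# Toy predictions: 1 positive out of 5 at score 0.2, 4 out of 5 at score 0.8.
y_prob = np.array([0.2, 0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8, 0.8])
y_true = np.array([0,   0,   0,   0,   1,   1,   1,   1,   1,   0  ])
mp, fp = reliability_diagram(y_true, y_prob)
# Both (mean_pred, frac_pos) pairs lie on the diagonal: (0.2, 0.2) and (0.8, 0.8).
```

Plotting `mp` against `fp` gives the reliability diagram; points above or below the diagonal indicate under- or over-confident predictions in that bin.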

Another thing that was mentioned there was that Platt scaling is most effective when the distortion in the predicted probabilities is sigmoid-shaped, while isotonic regression is a more powerful calibration method that can correct any monotonic distortion.

Ques : Please help me understand these two statements as well.
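The contrast between the two methods can be seen in a small sketch. Assuming scikit-learn is available: Platt scaling fits a single sigmoid to the scores (here via plain logistic regression, a simplification of Platt's original fit), while `IsotonicRegression` fits an arbitrary monotone non-decreasing mapping from score to probability:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# True probabilities, then a monotonic, sigmoid-shaped distortion of them
# (scores are squashed toward 0.5 but their ordering is preserved).
p_true = rng.uniform(0.05, 0.95, size=5000)
y = (rng.uniform(size=p_true.size) < p_true).astype(int)
p_distorted = 1.0 / (1.0 + np.exp(-4.0 * (p_true - 0.5)))

scores = p_distorted.reshape(-1, 1)

# Platt-style scaling: fit sigmoid(a*s + b) to the labels.
platt = LogisticRegression(C=1e6).fit(scores, y)
p_platt = platt.predict_proba(scores)[:, 1]

# Isotonic regression: fit any monotone step function score -> probability.
iso = IsotonicRegression(out_of_bounds="clip").fit(p_distorted, y)
p_iso = iso.predict(p_distorted)
```

Because the isotonic fit is only constrained to be monotone, it can undo any order-preserving distortion, whereas the sigmoid can only undo distortions of roughly sigmoid shape; the price is that isotonic regression needs more data to avoid overfitting.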

Thanks in advance,
Syed Danish

Predicting good probabilities with supervised learning.pdf (399.5 KB)


#2

@syed.danish

Hi there,

1- “If the model…” - This is to say that, if the predicted posterior probabilities are the true probabilities of the respective classes, then the mean predicted probability in a particular bin, say 0.1 to 0.2, will be equal to the observed fraction of 1s in that bin. It can be easily explained with an example: suppose there are 4 observations whose true outputs are 1, 1, 1, 1. In one case the predicted probabilities are 0.5, 0.5, 0.5, 0.5, whereas in the other they are 0.9, 0.8, 0.9, 0.8. Which one do you expect to be closer to the true posterior probabilities of the classes? When the predicted probabilities match the observed fractions in every bin, the points form a 1:1 line (the diagonal).
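The four-observation example above works out numerically like this (a toy check, not part of the paper):

```python
import numpy as np

y_true = np.array([1, 1, 1, 1])          # all four cases are positive
pred_a = np.array([0.5, 0.5, 0.5, 0.5])  # candidate model A
pred_b = np.array([0.9, 0.8, 0.9, 0.8])  # candidate model B

frac_pos = y_true.mean()                 # observed fraction of 1s = 1.0
gap_a = abs(pred_a.mean() - frac_pos)    # |0.50 - 1.0| = 0.50
gap_b = abs(pred_b.mean() - frac_pos)    # |0.85 - 1.0| = 0.15
# Model B's mean prediction is much closer to the observed fraction of
# positives, so on this (tiny) sample its probabilities look better calibrated
# and its point would sit nearer the diagonal of a reliability diagram.
```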

2- What is assumed is that the model's raw score is a representative form of the log odds of the positive example, and fitting a sigmoid corrects exactly this kind of distortion. For the mathematical treatment of the topic, I suggest you read the original paper by Platt.

Regards,

Neeraj


#3

Thanks @NSS, that helped a lot!