Threshold for logistic regression



How do we select an optimal threshold value for a logistic regression model in Python?



There are multiple ways to select the threshold value, and the right one depends on the objective of the model. I will enumerate the different methods along with the scenarios in which they apply.

Scenario: You don’t want your predictions to be biased toward either class, i.e. you want both the true positive rate and the true negative rate to be high.

Solution: In this case, look for a threshold value that balances sensitivity and specificity, since raising one generally lowers the other. Plot both of them against the threshold value and choose the point where the two curves intersect.

This image should help make the concept clear. The drawback of this method is that the chosen threshold will not necessarily yield the highest prediction accuracy.
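As a rough sketch of the crossover approach, the snippet below sweeps a grid of thresholds and picks the one where sensitivity and specificity are closest. The function name `sens_spec_crossover` and the synthetic scores are illustrative assumptions; in practice the scores would come from something like `model.predict_proba(X)[:, 1]` on a held-out set.

```python
import numpy as np

def sens_spec_crossover(y_true, y_prob, thresholds=None):
    """Return (threshold, sensitivity, specificity) where the two rates are closest."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    thresholds = np.asarray(thresholds, dtype=float)

    # Predictions for every candidate threshold: shape (n_thresholds, n_samples).
    pred = y_prob[None, :] >= thresholds[:, None]

    sensitivity = (pred & y_true).sum(axis=1) / y_true.sum()        # TP / P
    specificity = (~pred & ~y_true).sum(axis=1) / (~y_true).sum()   # TN / N

    # The crossover is where the gap between the two curves is smallest.
    best = int(np.argmin(np.abs(sensitivity - specificity)))
    return thresholds[best], sensitivity[best], specificity[best]

# Synthetic scores standing in for real predicted probabilities (assumption).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
p = np.clip(0.35 + 0.3 * y + rng.normal(0, 0.15, size=1000), 0, 1)

threshold, sens, spec = sens_spec_crossover(y, p)
print(f"threshold={threshold:.2f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Plotting `sensitivity` and `specificity` against `thresholds` would give the two crossing curves described above.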

Scenario: You want the accuracy of your model to be high.

Solution: In this case, use cross-validation to determine the threshold value that gives the highest accuracy.
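One way this could look with scikit-learn is sketched below: collect out-of-fold probabilities with `cross_val_predict`, then pick the threshold with the best accuracy on them. The synthetic dataset, the 5-fold split, and the threshold grid are all assumptions made for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic, imbalanced data purely for illustration (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=0
)

model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold probabilities: each row is scored by a model that never saw it.
oof_prob = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

# Accuracy of the thresholded out-of-fold predictions for each candidate.
thresholds = np.linspace(0.05, 0.95, 91)
accuracy = np.array([np.mean((oof_prob >= t) == y) for t in thresholds])

best = int(np.argmax(accuracy))
best_threshold = thresholds[best]
print(f"best threshold={best_threshold:.2f} accuracy={accuracy[best]:.3f}")
```

After choosing the threshold, the model would typically be refit on all the training data and applied as `model.predict_proba(X_new)[:, 1] >= best_threshold`, where `X_new` is a hypothetical set of new rows.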

Hope this helps.


Adding to that wonderful answer: the solution is basically a cost function over which errors you can accept. There is no single right threshold value, as the business may be OK with type 1 errors (false positives) while wanting minimal type 2 errors (false negatives), or vice versa.
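To make the cost idea concrete, here is a minimal sketch that assigns a cost to each false positive and false negative and picks the threshold with the lowest total. The function name `min_cost_threshold`, the cost values, and the synthetic scores are hypothetical; real costs would have to come from the business.

```python
import numpy as np

def min_cost_threshold(y_true, y_prob, cost_fp, cost_fn, thresholds=None):
    """Return (threshold, total_cost) minimising cost_fp * FP + cost_fn * FN."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    thresholds = np.asarray(thresholds, dtype=float)

    # Predictions for every candidate threshold: shape (n_thresholds, n_samples).
    pred = y_prob[None, :] >= thresholds[:, None]

    fp = (pred & ~y_true).sum(axis=1)   # type 1 errors at each threshold
    fn = (~pred & y_true).sum(axis=1)   # type 2 errors at each threshold
    total = cost_fp * fp + cost_fn * fn

    best = int(np.argmin(total))
    return thresholds[best], total[best]

# Synthetic scores standing in for real predicted probabilities (assumption).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
p = np.clip(0.35 + 0.3 * y + rng.normal(0, 0.15, size=1000), 0, 1)

# Missing a positive is 10x worse than a false alarm, and the reverse case.
t_fn_costly, _ = min_cost_threshold(y, p, cost_fp=1, cost_fn=10)
t_fp_costly, _ = min_cost_threshold(y, p, cost_fp=10, cost_fn=1)
print(f"FN costly: {t_fn_costly:.2f}  FP costly: {t_fp_costly:.2f}")
```

When false negatives are the expensive error the chosen threshold moves down, and when false positives are expensive it moves up, which is the trade-off described above.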