What is the default threshold when applying logistic regression in sklearn?



I am trying logistic regression on the Titanic data set. The code I used is:

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(penalty='l2', C=1)
clf.fit(train_x, label_x)
pred = clf.predict(test_x)

I want to know the significance of the parameter ‘C’ in this code. Also, the output in pred is either 1 or 0, so how does the logistic regression model choose the threshold for classifying as 1 or 0? Is it 0.5, or does the model itself choose the best threshold based on the AUC value?
Thanks in advance !
Syed Danish


Hi @syed.danish,

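As per my understanding, the code sets C = 1.0 and penalty = "l2", which are also the default values. C is the inverse of the regularization strength: a smaller C means stronger regularization (pushing the model toward underfitting), while a larger C means weaker regularization (allowing the model to fit the training data more closely, with a higher risk of overfitting). To read about the concept of regularization, the best resource is Andrew Ng’s class. For a binary problem, the predict function effectively uses 0.5 as the probability threshold to assign 1/0; the model does not tune the threshold using AUC or any other metric. To be certain, compare the output of clf.predict_proba(test_x) with the classes returned by the predict function.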
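Here is a minimal sketch of that check. It uses a synthetic data set from make_classification as a stand-in for the Titanic features (an assumption, since the original train_x and test_x are not shown), and names such as proba_1, manual_pred and custom_pred are only illustrative. It compares predict against a manual 0.5 cut on predict_proba, applies a custom threshold by hand, and compares coefficient sizes for a small and a large C.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the Titanic features (assumption).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

# L2 regularization is the default penalty, so only C is passed here.
clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(train_x, train_y)

pred = clf.predict(test_x)
proba_1 = clf.predict_proba(test_x)[:, 1]  # estimated P(class == 1)

# Reproduce predict() by cutting the class-1 probability at 0.5.
manual_pred = (proba_1 > 0.5).astype(int)
matches_default = bool(np.array_equal(pred, manual_pred))
print("predict() matches a 0.5 cut:", matches_default)

# A different threshold has to be applied by hand, for example 0.3.
custom_pred = (proba_1 >= 0.3).astype(int)
print("positives at 0.5:", manual_pred.sum(), "at 0.3:", custom_pred.sum())

# Smaller C means stronger regularization, which shrinks the coefficients.
strong = LogisticRegression(C=0.01, max_iter=1000).fit(train_x, train_y)
weak = LogisticRegression(C=100.0, max_iter=1000).fit(train_x, train_y)
norm_strong = np.linalg.norm(strong.coef_)
norm_weak = np.linalg.norm(weak.coef_)
print("coef norm, C=0.01:", norm_strong, "C=100:", norm_weak)
```

If you want a threshold chosen for a particular metric, you would need to pick it yourself on validation data and apply it to predict_proba as in the custom_pred line; predict will not do that for you.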

Aayush Agrawal


@aayushmnit, Thanks for the explanation.