How can I test if my logistic regression model is the best one?


  1. Generally, when we build a model, we split the data into a train set and a test set.
  2. I build a logistic regression model on the train set using the significant variables, and check the AIC or the confusion matrix.
  3. Then, with the same model, I check the accuracy on the test set.

Suppose I build 3 different models by following the above 3 steps on my dataset. How do I know which one is the best?



There are multiple methods to deal with this challenge. One is to look at statistical measures of how well you can predict the dependent variable from the independent variables; these are known as measures of predictive power. Typically, they vary between 0 and 1, with 0 meaning no predictive power and 1 meaning perfect predictions (for the area under the ROC curve, 0.5 corresponds to random guessing). To analyse predictive power, you can look at a pseudo R-square (such as McFadden's), the area under the ROC curve, and other statistical metrics.
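
As a rough illustration of these two measures, here is a minimal sketch in plain Python. The labels and predicted probabilities are made up for the example, and the function names `auc` and `mcfadden_r2` are my own, not from any library.

```python
import math

def auc(y_true, p_hat):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

def log_likelihood(y_true, p_hat):
    """Bernoulli log-likelihood of the observed 0/1 labels."""
    return sum(math.log(p) if y == 1 else math.log(1 - p)
               for y, p in zip(y_true, p_hat))

def mcfadden_r2(y_true, p_hat):
    """McFadden pseudo R-square: 1 - LL(model) / LL(intercept-only model)."""
    base_rate = sum(y_true) / len(y_true)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1 - log_likelihood(y_true, p_hat) / ll_null

# Made-up labels and predicted probabilities, for illustration only.
y_demo = [0, 0, 1, 1, 0, 1]
p_demo = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

auc_demo = auc(y_demo, p_demo)
r2_demo = mcfadden_r2(y_demo, p_demo)
print("AUC:", round(auc_demo, 3), "McFadden R2:", round(r2_demo, 3))
```

Computed on the same held-out data for each candidate model, the larger value points to the model with more predictive power.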

Another approach to evaluating model fit is to compute a goodness-of-fit statistic. You can look at the deviance, the Pearson chi-square, or other tests to measure the goodness of fit. These test the null hypothesis that the fitted model is adequate, so a small p-value signals lack of fit, while a large p-value means the test found no evidence against the model.
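
To make the two statistics concrete, here is a small sketch in plain Python, again on made-up numbers and with function names of my own choosing. It only computes the statistics. A p-value would come from comparing them with a chi-square distribution on the residual degrees of freedom, and that approximation is generally considered unreliable for ungrouped 0/1 data, so treat the raw values mainly as a way to compare models fitted to the same data.

```python
import math

def deviance(y_true, p_hat):
    """Residual deviance for 0/1 outcomes: -2 times the log-likelihood."""
    ll = sum(math.log(p) if y == 1 else math.log(1 - p)
             for y, p in zip(y_true, p_hat))
    return -2.0 * ll

def pearson_chi2(y_true, p_hat):
    """Sum of squared Pearson residuals: (y - p)^2 / (p * (1 - p))."""
    return sum((y - p) ** 2 / (p * (1 - p)) for y, p in zip(y_true, p_hat))

# Made-up labels and fitted probabilities, for illustration only.
y_obs = [0, 0, 1, 1, 0, 1]
p_fit = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

dev = deviance(y_obs, p_fit)
pearson = pearson_chi2(y_obs, p_fit)
print("deviance:", round(dev, 3), "Pearson chi-square:", round(pearson, 3))
```

Lower values of either statistic mean the fitted probabilities sit closer to the observed outcomes.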

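A further approach, which helps guard against overfitting or underfitting, is to divide the train data set into two parts (Train and Validation). Build each model on the Train part, evaluate it first on the Validation part (where you know the true output), and check all the statistical metrics there. Pick the model that does best on Validation, and keep the test set for one final check of the chosen model.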
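
The workflow above can be sketched end to end as follows. Everything here is an assumption for illustration: the data is synthetic, the three candidate models are just different feature subsets, the 60/20/20 split is arbitrary, and the tiny gradient-descent fitter stands in for whatever logistic regression routine you actually use.

```python
import math
import random

def sigmoid(z):
    # Clamp the score so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Plain batch gradient descent; returns [intercept, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, X):
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def log_loss(y, p, eps=1e-12):
    """Average negative log-likelihood; lower is better."""
    return -sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
                for yi, pi in zip(y, p)) / len(y)

def select(X, cols):
    return [[row[c] for c in cols] for row in X]

# Synthetic data: the outcome depends on x0 and x1; x2 is pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
y = [1 if rng.random() < sigmoid(1.5 * r[0] - 1.0 * r[1]) else 0 for r in X]

# 60/20/20 split. Rows are i.i.d. here, so slicing is fine; shuffle real data first.
X_train, y_train = X[:180], y[:180]
X_val, y_val = X[180:240], y[180:240]
X_test, y_test = X[240:], y[240:]

# Three hypothetical candidate models, differing only in the features used.
candidates = {"x0": [0], "x0+x1": [0, 1], "x0+x1+x2": [0, 1, 2]}
fitted, val_loss = {}, {}
for name, cols in candidates.items():
    fitted[name] = fit_logistic(select(X_train, cols), y_train)
    val_loss[name] = log_loss(y_val, predict(fitted[name], select(X_val, cols)))

# Choose on Validation, then touch the test set once for the chosen model only.
best = min(val_loss, key=val_loss.get)
test_loss = log_loss(y_test, predict(fitted[best], select(X_test, candidates[best])))
print("validation log-loss:", val_loss)
print("selected:", best, "test log-loss:", round(test_loss, 3))
```

The point of the design is that the test set plays no part in choosing among the three models, so the final test score is not flattered by the selection itself. Any of the metrics mentioned earlier could replace log-loss as the selection criterion.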

Hope this helps!