Comparing Model Performance



Initially I built a model for Loan Prediction like below:

Area Under the Curve: 0.8081062

Significant at the 0.05 level: Married, Property_Area and Credit_History

So I modified the model, keeping only the significant predictors:


Area Under the Curve: 0.7969799

So is it right to say that Model 1 is better because its AUC is higher, or that Model 2 is better because it predicts with fewer variables while giving an AUC only slightly lower than Model 1?
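For reference, AUC itself is just the probability that a randomly chosen positive case is ranked above a randomly chosen negative one, so the two models can be compared on any held-out set. A minimal pure-Python sketch (the labels and scores below are toy numbers, not the actual competition data):

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic:
    the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: true labels and predicted approval probabilities
y      = [1, 0, 1, 1, 0, 0, 1, 0]
model1 = [0.9, 0.2, 0.8, 0.6, 0.3, 0.65, 0.7, 0.1]  # full model
model2 = [0.8, 0.3, 0.6, 0.4, 0.2, 0.65, 0.7, 0.1]  # reduced model

print(auc(y, model1))  # 0.9375
print(auc(y, model2))  # 0.875
```

A small AUC gap like this says the full model ranks cases slightly better, but it says nothing about which model is cheaper to run or easier to explain — which is the point of the reply below.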


Whether one model is better than another depends on how you define “better” and on the cost of making Type I and Type II errors.

Let’s say “time” is the most critical factor in your case. Then a model that executes faster than another, even with a slight drop in performance, might still be better.
Maybe “computing power” is your criterion, or something else entirely.

Just as in competitions, the criterion is often simply the score.

Another important factor is the cost of making mistakes. Say you are building a model to predict whether a patient will die of a disease. In such cases, whenever your model goes wrong a person may die, which is a very high cost.

So the context of the problem defines what is “better”.

Hope this helps!


Usually the competition specifies the criterion for judging the model. This particular competition hasn’t, but I am assuming accuracy.


@Surya,

I tried both of your models and did not get any improvement on my earlier score of 0.791 with a cforest-based model. Can you please clarify your methods here?


Use logistic regression and you will see the difference between the above two models.


@Surya1987: Which method did you use to impute the missing values?
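(For context — not necessarily what @Surya1987 did — a common baseline is mode imputation for categorical columns and median for numeric ones. A pure-Python sketch with made-up values:)

```python
from statistics import median

def impute(column, kind):
    """Fill None values with the column mode (categorical) or median (numeric)."""
    observed = [v for v in column if v is not None]
    if kind == "categorical":
        fill = max(set(observed), key=observed.count)  # most frequent value
    else:
        fill = median(observed)
    return [fill if v is None else v for v in column]

# Hypothetical columns with missing entries
married  = ["Yes", "No", None, "Yes", "Yes"]
loan_amt = [120.0, None, 150.0, 100.0, None]

print(impute(married, "categorical"))  # ['Yes', 'No', 'Yes', 'Yes', 'Yes']
print(impute(loan_amt, "numeric"))     # [120.0, 120.0, 150.0, 100.0, 120.0]
```

Fancier options (kNN or model-based imputation) sometimes help, but the imputation choice should be reported alongside the AUC, since it can shift the comparison between the two models.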


Has anyone tried something different to improve model performance (>= 79%), especially feature engineering?
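One direction people often try on loan data is combining the income and loan-amount columns into ratio features. A sketch under the assumption that the dataset has columns named ApplicantIncome, CoapplicantIncome and LoanAmount (adjust the names to the actual data):

```python
import math

def engineer(row):
    """Derive candidate features from one applicant record (a dict)."""
    total_income = row["ApplicantIncome"] + row["CoapplicantIncome"]
    return {
        **row,
        "TotalIncome": total_income,
        "LogTotalIncome": math.log(total_income),          # tame income skew
        "LoanToIncome": row["LoanAmount"] / total_income,  # repayment burden
    }

row = engineer({"ApplicantIncome": 5000,
                "CoapplicantIncome": 1500,
                "LoanAmount": 130})
print(row["TotalIncome"])  # 6500
```

No claim that these features push past 79% here — they are just the usual first candidates to test.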


I am new to the data science domain.
Please suggest which algorithm would be best to solve this problem.


I think we can’t apply regression to this data set.