Loan Prediction Practice Problem: How to improve the model?

r
hackathon

#1

I have been trying to improve my model for the past 2 days, but can’t find a way. I have tried logistic regression, classification trees and random forest, with the optimal cp and threshold values. I have also imputed the missing data, tried adding more independent variables alongside Credit_History, and created new variables. After all of this, I still cannot get accuracy above 0.77777778. I would like suggestions from people with a better rank, and from anyone else, on what else to try. Also, all three approaches I’ve mentioned give me the same accuracy, which is also the accuracy of my baseline model. Why is that so?
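One way to investigate identical scores is to compare the models' predictions row by row. If every model ends up predicting the same label for each applicant, the accuracies must match exactly, and even models that disagree on some rows can tie. The sketch below is plain Python rather than R so that it runs without any packages. The labels and predictions are made-up toy values, not the hackathon data, and `accuracy` and `agreement` are hypothetical helper names.

```python
def accuracy(predicted, actual):
    """Fraction of rows where the prediction matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def agreement(pred_a, pred_b):
    """Fraction of rows where two models output the same label."""
    return accuracy(pred_a, pred_b)


# Toy values, assumed purely for illustration (not the real data set).
actual = ["Y", "Y", "N", "Y", "N", "Y", "N", "Y"]
baseline_pred = ["Y", "Y", "N", "Y", "Y", "Y", "N", "N"]
tree_pred = list(baseline_pred)  # a tree that learned the same rule as the baseline
forest_pred = ["Y", "Y", "N", "Y", "Y", "Y", "Y", "Y"]  # differs on the last two rows

for name, pred in [("baseline", baseline_pred),
                   ("tree", tree_pred),
                   ("forest", forest_pred)]:
    print(name,
          "accuracy:", accuracy(pred, actual),
          "agreement with baseline:", agreement(pred, baseline_pred))
```

If the agreement with the baseline is 1.0, the extra variables are not changing any prediction, so the score cannot move. If it is below 1.0 but the accuracy is unchanged, the model is trading one mistake for another.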


#2

@Siddhant - I would suggest focusing more on the feature engineering part of the problem. If you need any help, you can refer to this article:
http://www.analyticsvidhya.com/blog/2015/03/feature-engineering-variable-transformation-creation/

My approach was to take two variables (Credit_History and Property_Area), fit a CART model, and tune the minbucket parameter. My model accuracy was 0.79291.

Hope this helps!

Regards,
Hinduja
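To illustrate what a minimum bucket size does in a tree like the one described above, here is a minimal sketch in plain Python. It is not the poster's code and not a real CART implementation: it makes a single split on one categorical feature and refuses the split if any resulting leaf would hold fewer than `minbucket` rows. The data values and the helper names (`majority`, `fit_stump`, `predict`) are assumptions made for illustration.

```python
from collections import Counter


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature, minbucket):
    """Split once on `feature`; keep the split only if every leaf has >= minbucket rows."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    default = majority(labels)
    too_small = min(len(g) for g in groups.values()) < minbucket
    if len(groups) < 2 or too_small:
        # No valid split: the stump predicts the overall majority class.
        return {"split": None, "default": default}
    leaves = {value: majority(g) for value, g in groups.items()}
    return {"split": feature, "leaves": leaves, "default": default}


def predict(stump, row):
    """Predict the label for one row with a fitted stump."""
    if stump["split"] is None:
        return stump["default"]
    return stump["leaves"].get(row[stump["split"]], stump["default"])


# Toy data, assumed for illustration: six applicants with credit history, two without.
rows = [{"Credit_History": 1} for _ in range(6)] + [{"Credit_History": 0} for _ in range(2)]
labels = ["Y", "Y", "Y", "N", "Y", "Y", "N", "N"]

stump_small = fit_stump(rows, labels, "Credit_History", minbucket=2)  # split is allowed
stump_large = fit_stump(rows, labels, "Credit_History", minbucket=3)  # split is rejected

print(predict(stump_small, {"Credit_History": 0}))
print(predict(stump_large, {"Credit_History": 0}))
```

A larger minimum bucket size gives a simpler tree, which is why tuning it can change the score on held-out data.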


#3

I’ve tried the same, but without much success. I took these two variables and used a CART model with an optimal cp too. I suspect the test set has been changed a bit, because in the AV LearnUp event that I attended the other day, I got an accuracy of 0.803 with a CART model that used applicant income and credit history as independent variables.


#4

@Siddhant I also got the same accuracy, but it was on the training set only. When the model is finally evaluated on the test set, the accuracy drops to 0.777777777778, so I guess we are overfitting a bit… I am also stuck at the same score…


#5

@rahulone The accuracy on the training set was around 0.81; that was different. I am talking about the accuracy on the test set in the AV LearnUp event, where I got 0.803. I don’t know what problem I am having now. I think the test set has changed a bit. Overfitting is surely not the problem, because I used the optimal cp value, which does not let that happen. @kunal can you please tell us whether the test set in the online hackathon is different from the test set of the event on 20/12/2015?


#6

Thanks for the link, it was quite helpful. One more thing: can you test the same model in the online hackathon as well and tell me if your accuracy is the same as before?


#7

@hinduja1234 It is quite interesting that you got an accuracy of around 0.79291 using only two features :slight_smile:. Can you please tell me exactly how you chose these features, or whether you used an algorithm for feature selection? :innocent:

Thanks in advance.


#8

You can use Boruta or random forests to find the most significant variables.
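As a rough illustration of ranking variables by how useful they are, the sketch below scores each feature by the accuracy of a one-feature majority-vote rule. This is a much simpler stand-in, not Boruta and not random forest importance, and it is written in plain Python so it runs without packages. The rows are made-up toy values, and `one_rule_accuracy` and `rank_features` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def one_rule_accuracy(rows, labels, feature):
    """In-sample accuracy of predicting the majority label for each value of `feature`."""
    counts_by_value = defaultdict(Counter)
    for row, label in zip(rows, labels):
        counts_by_value[row[feature]][label] += 1
    correct = sum(max(counts.values()) for counts in counts_by_value.values())
    return correct / len(labels)


def rank_features(rows, labels, features):
    """Sort features from most to least useful under the one-rule score."""
    scores = {f: one_rule_accuracy(rows, labels, f) for f in features}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, assumed for illustration only (not the hackathon data set).
rows = [
    {"Credit_History": 1, "Property_Area": "Urban"},
    {"Credit_History": 1, "Property_Area": "Rural"},
    {"Credit_History": 1, "Property_Area": "Urban"},
    {"Credit_History": 1, "Property_Area": "Rural"},
    {"Credit_History": 0, "Property_Area": "Urban"},
    {"Credit_History": 0, "Property_Area": "Rural"},
    {"Credit_History": 1, "Property_Area": "Urban"},
    {"Credit_History": 0, "Property_Area": "Rural"},
]
labels = ["Y", "Y", "Y", "N", "N", "N", "Y", "Y"]

ranking = rank_features(rows, labels, ["Credit_History", "Property_Area"])
for feature, score in ranking:
    print(feature, score)
```

Because the score is computed on the same rows used to build the rule, it tends to favour features with many distinct values; a held-out split gives a fairer ranking.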


#9

It could be that the number of variables is not enough; feature selection is definitely needed here.


#10

Hey, can you tell me what this CART model and minbucket are? I can’t understand them.


#11

What is a CART model, and how do I implement it?


#12

Hey, what is a CART model? Where do I find it? Thanks.


#13

Hi @kariss @B11

By CART model, we mean a decision tree. Refer to the articles below for further reading.


#14

Hi…
I am a beginner working on this problem. I have tried feature engineering, feature selection and ensembling, but my test accuracy stays at the same value, 0.784722. I am not sure what the problem is. I would appreciate others’ opinions on how to improve the model to achieve better accuracy.
Thank you