I did not understand the credit_history variable. Can anyone help me with an explanation, please?



Explanation needed for the credit_history variable, please.


A credit history is a record of a borrower’s responsible repayment of debts. A credit report is a record of the borrower’s credit history from a number of sources, including banks, credit card companies, collection agencies, and governments.
In short, it is the credibility of the applicant, extracted from various sources.
Syed Danish


Sir, thank you, it was helpful. But how do I handle the missing values in this variable?


There are various methods to do that:

1. Drop these entire rows
2. Impute with a constant
3. Impute with the mode
4. Run a model to predict the missing values
5. Create a new variable as a flag to indicate missing or non-missing
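
As a rough sketch of options 1, 2, 3 and 5 in plain Python: the sample values and variable names below are made up for illustration and are not taken from the competition data. With pandas, the same steps would typically go through `dropna` and `fillna`.

```python
from statistics import mode

# Hypothetical credit_history column; None marks a missing value.
credit_history = [1.0, 0.0, None, 1.0, 1.0, None, 0.0, 1.0]

# 1. Drop the rows with a missing value.
dropped = [v for v in credit_history if v is not None]

# 2. Impute with a constant (0.0 here is an arbitrary, assumed choice).
constant_filled = [0.0 if v is None else v for v in credit_history]

# 3. Impute with the mode of the observed (non-missing) values.
most_common = mode(dropped)
mode_filled = [most_common if v is None else v for v in credit_history]

# 5. Add a flag that records whether the original value was missing.
missing_flag = [1 if v is None else 0 for v in credit_history]
```

Option 4 (predicting the missing values with a model) depends on which other columns are available, so it is left out of this sketch.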


Sir, I have built the model on the training data set and now have the probabilities. To predict the loan status in the test data set, I need to set a certain limit on the probabilities, right? What limit should I use? I mean, should I say that people who got more than 50% are eligible for a loan and the rest are not? Is this the right way, or am I thinking about it the wrong way?


Hi @pridhvi,

For setting a threshold value, you must keep the business side of the problem in mind. For a bank, it is highly important not to give a loan to a person who does not deserve it. It may not be a big problem for them if they deny a loan to a deserving candidate. Thus, from a bank's perspective, false positives (approving an applicant who should have been rejected) should preferably be avoided.

Hence, they must keep the threshold above 0.5 for sure. This will ensure that a person even with a probability of 0.5 does not receive the loan and this makes it safer for the bank.

To determine the exact threshold value, you can either make a calculated guess or use the ROC curve.
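
One way to read a threshold off the ROC curve is sketched below in plain Python. The labels, probabilities and the `max_fpr` tolerance are all hypothetical, and "highest true positive rate within a false positive budget" is just one possible selection rule, not the only one.

```python
# Hypothetical validation labels (1 = deserving applicant) and model probabilities.
y_true = [1, 1, 0, 1, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.7, 0.65, 0.55, 0.5, 0.3, 0.2]


def rates(threshold):
    """Return (false positive rate, true positive rate) at a threshold."""
    pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives


# Assumed business tolerance: at most about one in three bad applicants approved.
max_fpr = 0.34

# Each candidate threshold gives one (FPR, TPR) point on the ROC curve.
candidates = sorted(set(y_prob))
roc_points = {t: rates(t) for t in candidates}

# Keep thresholds within the false positive budget, then take the best TPR.
allowed = [t for t in candidates if roc_points[t][0] <= max_fpr]
best_threshold = max(allowed, key=lambda t: roc_points[t][1])
```

The threshold should be chosen on validation data rather than on the test set, and a stricter `max_fpr` pushes the chosen threshold higher.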



Can someone explain whether more credit history means a better credit score, or vice versa?


I understand the credit_history variable to be whether a person's credit history meets 'guidelines'. I imagine Dream Housing Finance has its own criteria for what a 'good' credit history is, and the variable is binary: 0 = false (not a good credit history) and 1 = true (good credit history).

Hope that helps.


Hi Danish,

Credit history is an integer. How can we impute the mode for this variable? Mode can be used for categorical variables. Please suggest whether we should use the mean or median for this credit history variable instead.



In my view, a person can have either a good credit history (1 in this case) or a bad one (0 in this case). However, it is also possible that the person doesn't have a history at all, because she never took a loan previously. Therefore, I am treating the variable as categorical with values 1, 0 and 2, where 2 replaces the nulls and implies that the person has no history.


I want to treat the NA values of the credit history variable as a new factor level. Can somebody help me with the code?
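
A minimal sketch of that idea in plain Python (not R), using made-up values: the "Missing" label and the variable names are assumptions chosen for illustration. In pandas, the usual route would be `fillna` followed by `astype("category")`.

```python
# Hypothetical credit_history column; None marks a missing value.
credit_history = [1.0, 0.0, None, 1.0, None]

# Turn every value into a string label so that missing becomes its own level.
labels = ["Missing" if v is None else str(int(v)) for v in credit_history]

# The distinct levels of the new categorical variable.
levels = sorted(set(labels))

# One-hot encode the levels for models that need numeric input.
one_hot = [[1 if label == level else 0 for level in levels] for label in labels]
```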


@BhaskarBiswas, I don't think that is the case here. People who don't have a credit history can't get a loan. So if they got a loan, I think we should assume they have a credit history. We should try to predict whether it was good or bad.


Agreed, I also feel that there should be three categories, since there are many who have taken their first loan (that is, they have no credit history).