# Creating confusion matrix on Loan Prediction dataset

#1

I want to make a confusion matrix out of the Loan Prediction dataset. Will somebody out there help me? I am referring to this topic.

#2

Import `confusion_matrix` from sklearn, pass the actual values and the predictions as arguments to the
`confusion_matrix` function, then print the result.

```
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(actual_values, predicted_values)
cm
```

#3
```
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(data[outcome], predictions)
print(cm)
```

#4

#5

Hi @aaron11,

The code provided by @A.Malathi is exactly what you need! In your code, you are using `Credit History` instead of the actual values. The confusion matrix is a table that describes the performance of a classification model. You would have to use the predicted values and the true values to print the confusion matrix.
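A minimal, self-contained sketch of that idea, using toy labels rather than the actual loan data:

```python
from sklearn.metrics import confusion_matrix

# Toy labels standing in for the loan data: 1 = loan approved, 0 = not approved
actual_values = [1, 0, 1, 1, 0, 1, 0, 0]
predicted_values = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(actual_values, predicted_values)
print(cm)
```

Swap in your own true labels and model predictions to get the matrix for the loan dataset.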

#6

Aishwarya, will you please help me out with printing the confusion matrix? I'm not getting the result and I have tried every possible way.

#7

#8

Hi @aaron11,

1. Read the dataset

```
import pandas as pd
df = pd.read_csv('train.csv')
```
2. Impute the missing values

3. Create dummies
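Steps 2 and 3 can be sketched on a toy frame (column names here are assumed from the loan dataset; adapt them to your own data):

```python
import pandas as pd

# Stand-in for train.csv with a couple of assumed loan-data columns
df = pd.DataFrame({
    'Self_Employed': ['No', None, 'Yes', 'No'],
    'LoanAmount': [120.0, None, 66.0, 150.0],
    'Loan_Status': ['Y', 'N', 'Y', 'N'],
})

# Step 2: impute -- mode for a categorical column, mean for a numeric one
df['Self_Employed'] = df['Self_Employed'].fillna(df['Self_Employed'].mode()[0])
df['LoanAmount'] = df['LoanAmount'].fillna(df['LoanAmount'].mean())

# Step 3: one-hot encode the categorical predictors
df = pd.get_dummies(df, columns=['Self_Employed'])
print(df.columns.tolist())
```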

4. Split the dataset into train and test using the below code

```
from sklearn.model_selection import train_test_split
train, test = train_test_split(df, test_size=0.3, random_state=0)

x_train = train.drop('Loan_Status', axis=1)
y_train = train['Loan_Status']

x_test = test.drop('Loan_Status', axis=1)
y_test = test['Loan_Status']
```
5. Fit a model

```
from sklearn import tree
model = tree.DecisionTreeClassifier(random_state=1)
model.fit(x_train, y_train)
```
6. Predict the values on test set

```
pred = model.predict(x_test)
```
7. Plot the confusion matrix

```
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, pred)
print(cm)
```

My output looks like:

```
[[ 26  25]
 [ 26 108]]
```
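For a binary problem, scikit-learn lays this matrix out as `[[TN, FP], [FN, TP]]` (rows = actual, columns = predicted). Rebuilding the printed numbers as an array makes the cells easy to unpack:

```python
import numpy as np

# The matrix printed above, rebuilt for illustration
cm = np.array([[26, 25],
               [26, 108]])

# Unpack true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = cm.ravel()
accuracy = (tn + tp) / cm.sum()
print(tn, fp, fn, tp, round(accuracy, 3))
```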

8. For better visualisation, use the code below

```
cm = pd.crosstab(y_test, pred, rownames=['Actual'], colnames=['Predicted'], margins=True)
cm
```

Result:

```
Predicted    0    1  All
Actual
0           27   24   51
1           21  113  134
All         48  137  185
```

Hope this helps!

#9

#10

Iâm sorry but Iâm new to all this and learning for the very first time thatâs why just need spoon feed assistance. Youâve helped me alot please @AishwaryaSingh Please look into this

#11

It looks like you have not performed one-hot encoding on your dataset. The decision tree model cannot handle string categorical variables.

Please complete this step before you fit the model. Use `pd.get_dummies`.
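A small illustration of the fix, on a made-up column rather than your data: `pd.get_dummies` turns each category into a 0/1 column that the model can consume.

```python
import pandas as pd

# Hypothetical categorical column like the one in the loan data
raw = pd.DataFrame({'Property_Area': ['Urban', 'Rural', 'Semiurban', 'Urban']})

# One 0/1 indicator column per category value
encoded = pd.get_dummies(raw, columns=['Property_Area'])
print(encoded.columns.tolist())
```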

#12

Hi @aaron11,
You can follow the steps from this article:
https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/

Check for missing values:

```
data.isnull().sum()
```

1. Fill in the missing values

data[âGenderâ].fillna(âMaleâ,inplace=True)
data[âMarriedâ].fillna(âYesâ,inplace=True)
data[âDependentsâ].fillna(â0â,inplace=True)
data[âSelf_Employedâ].fillna(âNoâ,inplace=True)
data[âProperty_Areaâ].fillna(âSemiurbanâ,inplace=True)
data[âLoan_Amount_Termâ].fillna(360,inplace=True)

data[âLoanAmountâ].fillna(data[âLoanAmountâ].mean(), inplace=True)

conditions = [data[âLoan_Statusâ] == âYâ, data[âLoan_Statusâ] == âNâ]
values = [1.0, 0.0]
data[âCredit_Historyâ] = np.where(data[âCredit_Historyâ].isnull(),
np.select(conditions, values),
data[âCredit_Historyâ])

There should not be any missing values now:

```
data.isnull().sum()
```

2. Create new variables to nullify the effect of outliers

```
data['LoanAmount_log'] = np.log(data['LoanAmount'])
data['TotalIncome'] = data['ApplicantIncome'] + data['CoapplicantIncome']
data['TotalIncome_log'] = np.log(data['TotalIncome'])
```

3. sklearn requires all inputs to be numeric, so convert all categorical variables into numeric by encoding the categories

```
var_mod = ['Gender', 'Married', 'Education', 'Self_Employed', 'Property_Area', 'Loan_Status']
for i in var_mod:
    data[i] = data[i].astype('category')
for i in var_mod:
    data[i] = data[i].cat.codes
```

Remember to indent the line after each `for` statement.
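A quick illustration of what `.astype('category').cat.codes` does on a toy series (codes are assigned in sorted category order, so here 'N' becomes 0 and 'Y' becomes 1):

```python
import pandas as pd

# Each category label is replaced by its integer code
s = pd.Series(['Y', 'N', 'Y', 'Y']).astype('category')
print(s.cat.codes.tolist())
```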

4. Import models from the scikit-learn module (code as given in the tutorial, plus the confusion matrix; note the old `sklearn.cross_validation` module has been replaced by `sklearn.model_selection`)

```
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold  # For K-fold cross-validation
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import metrics
from sklearn.metrics import confusion_matrix

# Generic function for making a classification model and assessing performance:
def classification_model(model, data, predictors, outcome):
    # Fit the model:
    model.fit(data[predictors], data[outcome])

    # Make predictions on the training set:
    predictions = model.predict(data[predictors])

    # Print accuracy
    accuracy = metrics.accuracy_score(predictions, data[outcome])
    print('Accuracy : %s' % '{0:.3%}'.format(accuracy))

    # Perform k-fold cross-validation with 5 folds
    kf = KFold(n_splits=5)
    error = []
    for train, test in kf.split(data):
        # Filter training data
        train_predictors = data[predictors].iloc[train, :]

        # The target we're using to train the algorithm.
        train_target = data[outcome].iloc[train]

        # Training the algorithm using the predictors and target.
        model.fit(train_predictors, train_target)

        # Record error from each cross-validation run
        error.append(model.score(data[predictors].iloc[test, :], data[outcome].iloc[test]))

    print('Cross-Validation Score : %s' % '{0:.3%}'.format(np.mean(error)))

    # Fit the model again so that it can be referred to outside the function:
    model.fit(data[predictors], data[outcome])
    # Note: arguments are (predictions, actual) here, so rows of the printed
    # matrix are predicted classes; sklearn's convention is (actual, predictions).
    cm = confusion_matrix(predictions, data[outcome])
    print(cm)
```

5. Call the models one by one and get the metrics

```
outcome_var = 'Loan_Status'
model = LogisticRegression()
predictor_var = ['Credit_History', 'LoanAmount_log', 'TotalIncome_log', 'Gender', 'Married', 'Education', 'Self_Employed', 'Property_Area']
classification_model(model, data, predictor_var, outcome_var)
```

The output is:

```
Accuracy : 83.062%
Cross-Validation Score : 83.065%
[[ 95   7]
 [ 97 415]]
```
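As a sanity check, the reported accuracy can be read straight off that confusion matrix: the diagonal holds the correctly classified cases.

```python
# The matrix printed above, rebuilt as plain lists for illustration
cm = [[95, 7], [97, 415]]

correct = cm[0][0] + cm[1][1]       # diagonal cells = correct predictions
total = sum(sum(row) for row in cm)
print('{0:.3%}'.format(correct / total))
```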

7-ii

```
model = DecisionTreeClassifier()
classification_model(model, data, predictor_var, outcome_var)
```

7-iii

```
model = RandomForestClassifier(n_estimators=100)
classification_model(model, data, predictor_var, outcome_var)
```

Experiment with different features and with different models!!!

https://github.com/ml-ds-data/DataScience/blob/master/Loan-Prediction.ipynb

#13

#14

What must I do next? I have tried a lot to remove this error, but I don't know exactly how to solve it.

#15

#16

Finally, it worked. Thank you!