Two different confusion matrices from random forest on loan prediction data

Tags: r, confusion_matrix

#1

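```r
library(randomForest)
# a.out holds multiply imputed datasets (it appears to be an Amelia::amelia() result)
for (i in seq_along(a.out$imputations)) {
  n <- as.data.frame(a.out$imputations[[i]])
  s <- randomForest(Loan_Status ~ Gender + Married + Dependents + Education + Self_Employed +
                      Property_Area + ApplicantIncome + CoapplicantIncome + LoanAmount +
                      Loan_Amount_Term + Credit_History, data = n)
  print(s)                               # model summary, including its confusion matrix
  plot(s)                                # error rate vs. number of trees
  testPred <- predict(s, newdata = n)    # predict the same rows the model was trained on
  print(table(testPred, n$Loan_Status))  # explicit print(): tables do not auto-print inside a loop
}
```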

`print(s)` gives the following result:

```
Confusion matrix:
   N   Y class.error
N 71  77  0.52027027
Y 18 314  0.05421687
```

However, `table(testPred, n$Loan_Status)` gives the following result:

```
testPred   N   Y
       N 187   0
       Y   5 422
```
Why are the results different?


#2

Hi @Surya1987,

Wow, this is a surprise. I don't know exactly what the problem is; ideally this should not happen. Two questions:

  1. `testPred` contains only class labels, right? Please check.
  2. Can you sort `testPred` and `Loan_Status` by ID and run this again?

I hope this solves the problem.

Regards,
Aayush


#3

testPred is a factor.
I will sort the data and check whether the results are equal. Thanks for your reply.


#4

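This is a common mistake when evaluating a random forest. Each tree is trained on a bootstrap sample that contains roughly 2/3 of the training rows; the remaining rows (about 1/3, the out-of-bag or OOB rows) are used for OOB estimation. The confusion matrix printed by `print(s)` is based on these OOB predictions, so each row is classified only by the trees that did not see it during training. If you send the training data back into `predict()`, each row is classified by all trees, including those that were trained on it, so you get an almost perfect (and overly optimistic) prediction. Please see the link below for more details: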
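A minimal sketch of the difference, using R's built-in `iris` data as a stand-in for the loan data and assuming the `randomForest` package is installed. It compares the OOB confusion matrix stored in the fitted model, `predict()` called without `newdata` (which, per the package documentation, returns the OOB predictions), and `predict()` on the training rows themselves. The seed and `ntree = 200` are arbitrary choices for illustration.

```r
library(randomForest)  # assumed to be installed
set.seed(42)           # arbitrary seed, for reproducibility

# Share of distinct rows that land in one bootstrap sample (about 1 - 1/e = 0.632);
# the rest are that tree's out-of-bag (OOB) rows.
in_bag <- mean(seq_len(nrow(iris)) %in% sample(nrow(iris), replace = TRUE))
cat("In-bag fraction of one bootstrap sample:", in_bag, "\n")

rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# (a) The confusion matrix that print(rf) reports, built from OOB predictions:
#     each row is voted on only by the trees that did not train on it.
print(rf$confusion)

# (b) predict() without newdata returns those same OOB predictions,
#     so this table has the same counts as (a).
oob_tab <- table(observed = iris$Species, predicted = predict(rf))
print(oob_tab)

# (c) Predicting the training rows themselves ("resubstitution"): every tree
#     votes, including those that trained on the row, so it looks near perfect.
train_tab <- table(observed = iris$Species, predicted = predict(rf, newdata = iris))
print(train_tab)

cat("OOB accuracy:          ", sum(diag(oob_tab)) / sum(oob_tab), "\n")
cat("Training-data accuracy:", sum(diag(train_tab)) / sum(train_tab), "\n")
```

In the loop from the question, `table(n$Loan_Status, predict(s))` (note: no `newdata`) should reproduce the counts shown by `print(s)`. A fair comparison with unseen data needs a held-out validation set rather than the training rows.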


#5

Loan Prediction 3
I am trying to use random forest. I am just removing the Loan_ID field (column 1) because it has more than 50 levels:

```r
library(randomForest)
rf1 <- randomForest(Loan_Status ~ ., data = trainloanav[, -1], nodesize = 25, ntree = 25)  # drop Loan_ID
pred <- predict(rf1, newdata = testloanav, type = "response")
```

However, I am getting the following error:

```
Error in predict.randomForest(rf1, newdata = testloanav, type = "response") :
  Type of predictors in new data do not match that of the training data.
```

Can anyone help me understand and resolve this error?
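This error is typically raised when a predictor differs in type between the training data and `newdata`, most often a factor whose set of levels differs (for example, a level that occurs in `trainloanav` but not in `testloanav`), or a column that is a factor in one data set and character or numeric in the other. One possible fix, sketched below under that assumption, is to re-encode each test column with the class and levels of the matching training column before calling `predict()`. `align_to_train()` is a hypothetical helper written for this example, not a `randomForest` function, and the small data frames are toy data.

```r
# Hypothetical helper: give every column of `test` the same class and factor
# levels as the column with the same name in `train`.
align_to_train <- function(train, test) {
  for (col in intersect(names(train), names(test))) {
    if (is.factor(train[[col]])) {
      # Re-encode with the training levels; values unseen in training become NA
      test[[col]] <- factor(as.character(test[[col]]),
                            levels = levels(train[[col]]),
                            ordered = is.ordered(train[[col]]))
    } else if (is.numeric(train[[col]]) && !is.numeric(test[[col]])) {
      # Numeric in training but read as factor/character in test
      test[[col]] <- as.numeric(as.character(test[[col]]))
    }
  }
  test
}

# Toy example: "Dependents" has levels in train that never occur in test,
# and "LoanAmount" was read as character in test
train <- data.frame(Dependents = factor(c("0", "1", "2", "3+")),
                    LoanAmount = c(120, 90, 150, 200))
test  <- data.frame(Dependents = factor(c("0", "2")),
                    LoanAmount = c("110", "95"), stringsAsFactors = FALSE)

fixed <- align_to_train(train, test)
str(fixed)
```

With the objects from the question, this would be `pred <- predict(rf1, newdata = align_to_train(trainloanav, testloanav), type = "response")`. Rows that end up with `NA` (a level never seen in training) still need handling, for example by imputation, before they get a prediction.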