Validation in Random forest and Interpretation of OOB and error estimate

r
machine_learning
random_forest
randomforest

#1

Hi,
This is the summary of a model I built. I am unable to tell how well my model performs, and I would like to know what OOB means, what its significance is, and how I can say that my model is good at predicting the outcome.

Call:
randomForest(formula = Loan_Status ~ Dependents + ApplicantIncome + CoapplicantIncome + LoanAmount + Credit_History + Property_Area + NC, data = train_data)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2

    OOB estimate of  error rate: 18.89%

Confusion matrix:
    N   Y class.error
N  93  99  0.51562500
Y  17 405  0.04028436


#2

Hi @Sunil0108,

Random forest is built on bootstrap resampling, which in simple words means drawing a random sample 'with replacement' from the training data. For each bootstrap sample S, one tree T is grown. In your case, since 500 trees were created, there are 500 bootstrap samples, one per tree. Within each tree, at every split only m (≈ sqrt(number of features)) randomly chosen features are considered as candidates, which is the "No. of variables tried at each split: 2" in your output.
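A minimal base-R sketch of the resampling step described above; the sample size here is made up just to illustrate that, on average, about a third of the rows never enter a given bootstrap sample:

```r
set.seed(42)
n <- 1000                        # pretend we have 1000 training rows

# One bootstrap sample: n draws *with replacement* from the row indices
boot_idx <- sample(seq_len(n), size = n, replace = TRUE)

# Rows never drawn are "out of bag" for this tree
oob_idx  <- setdiff(seq_len(n), boot_idx)
oob_frac <- length(oob_idx) / n

# On average roughly 1/3 of rows are out of bag:
# P(row never drawn) = (1 - 1/n)^n -> exp(-1) ≈ 0.368
oob_frac
```

Every tree gets its own `boot_idx`, so every tree also gets its own out-of-bag set.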

Out-of-bag error:

For each bootstrap sample, about one third of the data is not used in growing that tree, i.e., it stays out of the sample. This data is referred to as out-of-bag (OOB) data. To get an unbiased estimate of the model's accuracy on unseen data, the out-of-bag error is used: each observation is passed through only the trees that did not see it during training, those predictions are aggregated, and the resulting misclassification rate is the OOB error. This percentage is an effective estimate of the test-set error and does not require further cross-validation.
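To see what your 18.89% OOB error actually summarises, here is a short base-R sketch that rebuilds the confusion matrix from the numbers in your output and reads off the metrics:

```r
# Confusion matrix copied from the randomForest summary in post #1:
# rows = actual class, columns = OOB-predicted class
cm <- matrix(c(93, 99,
               17, 405),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("N", "Y"),
                             predicted = c("N", "Y")))

total        <- sum(cm)                     # 614 OOB predictions
oob_accuracy <- sum(diag(cm)) / total       # (93 + 405) / 614 ≈ 0.811
oob_error    <- 1 - oob_accuracy            # ≈ 0.1889, the 18.89% reported

# Per-class error: fraction of each actual class that was misclassified
class_error  <- 1 - diag(cm) / rowSums(cm)  # N: 0.5156, Y: 0.0403
```

Note what this reveals: overall OOB accuracy is about 81%, but over half of the actual "N" cases are misclassified, so the model is much weaker at identifying the "N" class than the headline error rate suggests.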

Regards,
Shashwat


#3

Hi @shashwat.2014,

Thanks for your valuable inputs. I am still not able to figure out how I can say my model is good, and how I can compare it with other models.