Difference in Random Forest via CARET and randomForest package


#1

Hi,

I am getting different results when I run random forest through caret (method = "rf") and through the randomForest package directly. The basic randomForest model outperforms the caret one on the test dataset. Also, do we need to perform cross-validation when using the basic randomForest package?

Please help.

Regards
Balaji SR


#2

@BALAJI_SR

Although I am not an expert in R, caret is essentially a wrapper around existing algorithms, so I would not expect the results to differ under normal circumstances.

However, the output of a random forest depends on the random seed, which could differ between the two runs and so explain the difference in results.

It is usually a good idea to perform cross-validation, but how essential it is depends on the cost of making a wrong decision. If that cost is high, cross-validation is a must. If it is not, you can skip it in some cases, but you should still tune the RF parameters to reduce overfitting.
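To make the idea of cross-validation concrete, here is a minimal sketch in plain Python rather than R. It only shows the mechanics of splitting rows into k folds and averaging a score. The names `k_fold_indices`, `cross_validate`, and the `fit_and_score` callback are hypothetical helpers made up for this illustration, not part of caret or randomForest.

```python
import random


def k_fold_indices(n_rows, k, seed):
    """Split row indices 0..n_rows-1 into k shuffled, near-equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_rows, k, seed, fit_and_score):
    """Average the score over k train/validation splits.

    fit_and_score(train_idx, valid_idx) is a hypothetical callback that
    trains a model on train_idx and returns its score on valid_idx.
    """
    folds = k_fold_indices(n_rows, k, seed)
    scores = []
    for i, valid_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, valid_idx))
    return sum(scores) / k


# Placeholder scorer that just reports the training-set size, to show the
# call shape. A real scorer would fit a model and return its accuracy.
mean_train_size = cross_validate(
    10, 5, seed=1, fit_and_score=lambda train, valid: len(train)
)
print(mean_train_size)  # 8.0
```

The randomForest package also reports an out-of-bag error estimate, which serves a similar purpose without a separate validation loop.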

Hope this helps. @ajay_ohri might be able to provide more details on this.

Kunal


#3

Yes, @kunal is absolutely right: you need to set a random seed to get a reproducible model. As the name suggests, a random forest draws random samples of observations and random subsets of predictor variables to build each tree. If you don’t specify a seed value, the random draws will differ each time you run the model.
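The effect of the seed can be sketched in plain Python rather than R. This is a simplified illustration, not how randomForest is implemented: `draw_tree_inputs` is a hypothetical helper, and it draws the feature subset once per tree for brevity, whereas a real forest redraws it at every split.

```python
import random


def draw_tree_inputs(rng, n_rows, n_features, mtry):
    """Draw the random inputs for one tree: a bootstrap sample of row
    indices and a random subset of mtry feature indices."""
    # Bootstrap sample: n_rows draws with replacement.
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Feature subset: mtry distinct feature indices.
    features = sorted(rng.sample(range(n_features), mtry))
    return rows, features


# Two generators started from the same seed produce identical draws.
first = draw_tree_inputs(random.Random(42), 100, 10, 3)
second = draw_tree_inputs(random.Random(42), 100, 10, 3)
print(first == second)  # True

# Same seed, but one extra random number is consumed before the draws,
# so the generator reaches the tree in a different state.
shifted_rng = random.Random(42)
shifted_rng.random()
shifted = draw_tree_inputs(shifted_rng, 100, 10, 3)
print(first == shifted)
```

The last comparison is the relevant one when comparing two code paths: the same seed only lines up two runs if both consume random numbers in the same order. A wrapper that draws resampling indices before fitting would hand the forest a generator in a different state, and tuning parameters such as mtry may also differ between the two calls.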


#4

Hi,

Thanks. I tried setting the seed, but the basic random forest still outperforms the caret random forest. I will report back if I find anything.

Regards
Balaji SR