Can several (3-5) random forests be combined for better classification?

r
random_forest

#1

Hello,

While using random forests, I came across a function in the randomForest package in R called combine. It combines the results from two or more random forest models.

library(randomForest)

randomForest_1 <- randomForest(label ~ ., data = combi, nodesize = 5, ntree = 500, do.trace = TRUE)
randomForest_2 <- randomForest(label ~ ., data = combi, nodesize = 5, ntree = 500, do.trace = TRUE)
randomForest_3 <- randomForest(label ~ ., data = combi, nodesize = 5, ntree = 500, do.trace = TRUE)
randomForest_4 <- randomForest(label ~ ., data = combi, nodesize = 5, ntree = 500, do.trace = TRUE)
randomForest_5 <- randomForest(label ~ ., data = combi, nodesize = 5, ntree = 500, do.trace = TRUE)
rf.all <- combine(randomForest_1, randomForest_2, randomForest_3, randomForest_4, randomForest_5)

However, having so many random forests might cause memory issues and might also be time-consuming. Is this something that is generally done to improve random forest performance? Also, how might we decide how many random forests to generate?


#2

@pagal_guy,

I am not sure how combine works, but building ensemble models using random forests is a common practice. To improve accuracy, people run random forest with different seed values and then average the predictions or take a majority vote across the models. This typically reduces the prediction error.
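As a rough sketch of the different-seeds idea, here is one way it could look. The built-in iris data stands in for combi, and the seeds, the train/test split, and the tree count are arbitrary choices made only for illustration:

```r
library(randomForest)

# Illustrative data only: iris stands in for the thread's `combi`.
set.seed(1)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

seeds <- c(11, 22, 33)  # arbitrary seeds, one forest per seed

# Fit one forest per seed and collect class probabilities on the test set.
prob_list <- lapply(seeds, function(s) {
  set.seed(s)
  fit <- randomForest(Species ~ ., data = train, ntree = 200, nodesize = 5)
  predict(fit, newdata = test, type = "prob")
})

# Average the probability matrices element-wise across the forests.
avg_prob <- Reduce(`+`, prob_list) / length(prob_list)

# The predicted class is the column with the highest averaged probability.
avg_pred <- factor(
  colnames(avg_prob)[max.col(avg_prob, ties.method = "first")],
  levels = levels(train$Species)
)

# Test-set accuracy of the averaged ensemble.
mean(avg_pred == test$Species)
```

Averaging probabilities is only one option; a majority vote over the per-forest class predictions would be the voting variant.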

You can look at this ensembling guide for more details:

I hope combine does something similar, but I haven’t checked.
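For anyone who wants to check, a small experiment along these lines should show what combine returns. This is a sketch on the built-in iris data (an assumption, not the original combi), with small tree counts picked for speed:

```r
library(randomForest)

set.seed(42)
rf_a <- randomForest(Species ~ ., data = iris, ntree = 100, nodesize = 5)
rf_b <- randomForest(Species ~ ., data = iris, ntree = 100, nodesize = 5)

# combine() pools the trees of the input forests into one forest object.
rf_all <- combine(rf_a, rf_b)

rf_all$ntree              # total tree count: 100 + 100
is.null(rf_all$err.rate)  # the OOB error trace is not carried over
is.null(rf_all$confusion) # neither is the OOB confusion matrix

# The pooled forest predicts like any other randomForest object.
head(predict(rf_all, newdata = iris))
```

If that holds, combining five 500-tree forests grown on the same data with the same settings should behave much like a single forest with ntree = 2500, so the function looks most useful for stitching together forests that were grown separately, for example in parallel.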

Regards,
Kunal


#3

@kunal

Should better model power be the only criterion when solving a data science problem? I have mostly seen analysts use GBM, ensemble modeling (combining multiple random forests and other algorithms), and multi-layer approaches to improve the power of a model. But think about a business scenario where you need to implement your model's output. Doesn't this type of model come with implementation challenges?

Regards,
Imran


#4

@Imran - it does. So it depends on what you are building the model for. If you want a classifier for segmenting customers that will be implemented through a CRM, it is not a problem.

But if the model has to be implemented through people, I would never use a random forest.

However, if you are already using one random forest (as in the original question), building five of them is no different from an implementation standpoint!

Regards,
Kunal