How to decide the number of trees (ntree) in randomForest?

randomforest

#1

I was trying to run some randomForest code in R, and there is a parameter called ntree. On the Kaggle forums I saw many people picking seemingly arbitrary values for ntree, like 50, 100, or 500. On what basis are they selecting these values?

@Lesaffrea @shuvayan @hinduja1234

Thanks,
Rohit


#2

Hi Rohit,

Well, the number of trees is used to reduce variance: individual trees tend to be unstable, i.e. to have high variance. There is a catch, though, the low variance/high bias trade-off, so the art is to find the balance. I think you mentioned caret in another post. With the grid parameter in caret you can train multiple models and use caret's plots to find the model with the optimum number of trees.
With random forest you have the number of trees as well as one other parameter, and in my experience it matters more than the number of trees.
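A minimal sketch of that caret workflow, under some assumptions: `iris` is only a stand-in dataset, the `ntree` values are the ones mentioned in the question, and the fold count is arbitrary. caret's built-in `"rf"` method tunes only `mtry` through its grid, so here `ntree` is varied in an outer loop and passed through to `randomForest()`. Whether `mtry` is the "other parameter" meant above is my assumption.

```r
library(caret)         # assumes caret and randomForest are installed
library(randomForest)

set.seed(42)
ctrl <- trainControl(method = "cv", number = 5)  # 5-fold CV, arbitrary choice
grid <- expand.grid(mtry = 1:4)                  # iris has 4 predictors

# caret's "rf" grid covers mtry only, so ntree is looped over by hand
# and forwarded to randomForest() through train()'s "..." argument.
ntree_values <- c(50, 100, 500)
fits <- lapply(ntree_values, function(nt) {
  train(Species ~ ., data = iris, method = "rf",
        trControl = ctrl, tuneGrid = grid, ntree = nt)
})
names(fits) <- paste0("ntree_", ntree_values)

# Best cross-validated accuracy (and the mtry that achieved it) per ntree
cv_summary <- data.frame(
  ntree         = ntree_values,
  best_mtry     = sapply(fits, function(f) f$bestTune$mtry),
  best_accuracy = sapply(fits, function(f) max(f$results$Accuracy))
)
print(cv_summary)

# caret's plot for one fit: cross-validated accuracy versus mtry
plot(fits[["ntree_500"]])
```

If the accuracies in `cv_summary` barely differ across `ntree` values, the smaller forest is probably enough for that dataset.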
Best regards
Alain


#3

But in randomForest, on what basis do we choose a value for the ntree parameter?

I am a beginner in this area, that's why I have so many silly doubts :frowning:


#4

Hi Rohit,

Ideally, the number of trees should be the point at which your Out of Bag (OOB) error stops decreasing (the ideal point); in practice the OOB error usually levels off there rather than climbing. You can observe this with the argument do.trace = TRUE in R, which prints the OOB error after each tree is grown. (Let’s hope you’re using the randomForest package for this).

As a beginner, you can just set this number to a high enough value, say 500 if your data is big enough, and observe the OOB errors. Having more trees does not cause any real harm apart from longer run time, since the predictions are averaged over all the trees and the error tends to stabilise as trees are added rather than getting worse.
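As a rough sketch of that check, assuming the randomForest package and using `iris` purely as a stand-in dataset: the fitted object stores the cumulative OOB error per tree in `err.rate` (for classification), so you can plot it instead of reading the trace. The `tolerance` value and the `enough_trees` heuristic are illustrative assumptions, not something the package provides.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(42)
# do.trace = 100 prints the OOB error every 100 trees;
# do.trace = TRUE would print it after every tree.
fit <- randomForest(Species ~ ., data = iris, ntree = 500, do.trace = 100)

# For classification, err.rate has one row per tree; the "OOB" column is the
# OOB error of the forest made of the first k trees.
# (For regression, the analogous per-tree vector is fit$mse.)
oob <- fit$err.rate[, "OOB"]

plot(oob, type = "l", xlab = "Number of trees", ylab = "OOB error")

# Illustrative heuristic (an assumption): the smallest number of trees whose
# OOB error is within `tolerance` of the lowest OOB error observed.
tolerance <- 0.005
enough_trees <- which(oob <= min(oob) + tolerance)[1]
enough_trees
```

If the curve is still dropping at the right-hand edge of the plot, increase `ntree` and look again.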

Once you’ve run enough models with your data, you will get a fair idea of how many trees are good enough.