What should the number of trees be in a random forest for accurate prediction?

random_forest

#1

I am currently solving a classification problem using the random forest algorithm, and I want to know how many trees I should use to get accurate results.
str(train)
'data.frame': 891 obs. of 12 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Name : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : Factor w/ 681 levels "110152","110413",..: 524 597 670 50 473 276 86 396 345 133 ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : Factor w/ 148 levels "","A10","A14",..: 1 83 1 57 1 1 131 1 1 1 ...
$ Embarked : Factor w/ 4 levels "","C","Q","S": 4 2 4 4 4 3 4 4 4 2 ...


#2

@hinduja1234,
The ntree argument specifies how many trees to grow.
With a larger dataset you may want to reduce the number of trees, at least for initial exploration. You can also restrict the complexity of each tree with nodesize and reduce the number of rows sampled per tree with sampsize.
For this well-known Titanic problem, ntree=2000 is a good choice.
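
A minimal sketch of how these three arguments are passed, assuming the randomForest package is installed. The built-in iris data is only a stand-in for the Titanic train set, and the values chosen for ntree, nodesize and sampsize are illustrative rather than recommendations:

```r
library(randomForest)

set.seed(42)  # make the bootstrap samples reproducible

# Stand-in data (assumption): iris ships with R.
# Replace x and y with your own predictors and factor target.
x <- iris[, 1:4]
y <- iris$Species

# A lighter forest for quick exploration:
#   ntree    - number of trees to grow
#   nodesize - minimum size of terminal nodes (larger values give smaller trees)
#   sampsize - number of rows drawn for each tree
rf_quick <- randomForest(x = x, y = y,
                         ntree = 100,
                         nodesize = 5,
                         sampsize = 100)

print(rf_quick)  # the printed summary includes the out-of-bag (OOB) error estimate
```

Once the quick model looks sensible, the same call can be rerun with a larger ntree for the final fit.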

Hope this helps!
Regards,
harry


#3

@hinduja1234,

Generally, the more trees you grow, the better the prediction, although the improvement levels off after a certain point. Because of the memory and processing limits of a personal machine, you have to trade off training time against prediction quality. So, even though additional trees stop making much difference beyond that point, you can increase the number of trees as far as you can while still getting results in a reasonable time.
For larger datasets, you can also set the other parameters mentioned by @harry.
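
One way to see where extra trees stop helping is to look at the out-of-bag (OOB) error as the forest grows. The following is a rough sketch, assuming the randomForest package and again using the built-in iris data as a stand-in for your own training set; the checkpoints (10, 50, 100, 250, 500 trees) are arbitrary:

```r
library(randomForest)

set.seed(42)  # reproducible bootstrap samples

# Stand-in data (assumption): swap in your own predictors and factor target.
rf <- randomForest(x = iris[, 1:4], y = iris$Species, ntree = 500)

# For classification, err.rate has one row per tree; the "OOB" column holds the
# out-of-bag error of the forest made of the first k trees.
oob <- rf$err.rate[, "OOB"]

# Compare the error at a few forest sizes to see where it stops improving.
print(oob[c(10, 50, 100, 250, 500)])

# plot() on the fitted object draws the error curves against the number of trees.
plot(rf)
```

If the OOB error is flat well before the last tree, a smaller ntree would likely give similar accuracy in less time.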

Hope this helps.