How to improve the performance of random forest in R

r
kaggle
random_forest

#1

Hello,

I am using random forest for a classification problem. I have currently set nodesize = 20 and ntree = 100.
The accuracy is ~95%. Increasing nodesize to 50 and ntree to 200 makes the algorithm run much longer, with no marked increase in accuracy.
How do I improve the performance of the random forest?
This is for the Digit Recognizer problem on Kaggle.


#2

Hi Shuvayan,

For an image recognition problem specifically, you can augment the training data:

  • Enlarge the training set by taking the images you already have and shifting each one slightly right, left, up and down; 5-10 pixels usually works
  • Rotate each image by 10 / 20 / 30 degrees, both clockwise and anticlockwise, and append the results to your current dataset

After doing this you would have 5-10 times more data to train on, which should improve your current model's accuracy.
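
A rough sketch of the shifting and rotating described above, in base R only. Everything named here (shift_image, rotate_image, augment_row) is a hypothetical helper written for illustration, not part of any package. It assumes each image is a 28 x 28 grid stored as a row-major vector of 784 pixels, which is how I understand the Kaggle Digit Recognizer data to be laid out. The 2-pixel shift and 10-degree rotation are arbitrary choices for the demo, so tune them for your data.

```r
# Shift an image (a matrix) by dx columns and dy rows.
# Pixels pushed outside the frame are dropped; vacated pixels become 0.
shift_image <- function(img, dx = 0, dy = 0) {
  n_row <- nrow(img)
  n_col <- ncol(img)
  out <- matrix(0, n_row, n_col)
  rows <- seq_len(n_row)
  cols <- seq_len(n_col)
  dst_r <- rows + dy
  dst_c <- cols + dx
  keep_r <- dst_r >= 1 & dst_r <= n_row
  keep_c <- dst_c >= 1 & dst_c <= n_col
  out[dst_r[keep_r], dst_c[keep_c]] <- img[rows[keep_r], cols[keep_c]]
  out
}

# Rotate an image about its centre using nearest-neighbour lookup.
# The sign of `degrees` sets the direction of rotation.
rotate_image <- function(img, degrees) {
  n_row <- nrow(img)
  n_col <- ncol(img)
  theta <- degrees * pi / 180
  cy <- (n_row + 1) / 2
  cx <- (n_col + 1) / 2
  grid <- expand.grid(r = seq_len(n_row), c = seq_len(n_col))
  y <- grid$r - cy
  x <- grid$c - cx
  # Inverse mapping: for every output pixel, find the source pixel
  src_c <- round( cos(theta) * x + sin(theta) * y + cx)
  src_r <- round(-sin(theta) * x + cos(theta) * y + cy)
  inside <- src_r >= 1 & src_r <= n_row & src_c >= 1 & src_c <= n_col
  out <- matrix(0, n_row, n_col)
  out[cbind(grid$r[inside], grid$c[inside])] <-
    img[cbind(src_r[inside], src_c[inside])]
  out
}

# Turn one flattened image into six augmented copies (one per row).
augment_row <- function(pixels, side = 28) {
  img <- matrix(pixels, nrow = side, ncol = side, byrow = TRUE)
  variants <- list(
    shift_image(img, dx =  2), shift_image(img, dx = -2),
    shift_image(img, dy =  2), shift_image(img, dy = -2),
    rotate_image(img,  10),    rotate_image(img, -10)
  )
  # Flatten each variant back to a row-major pixel vector
  t(sapply(variants, function(m) as.vector(t(m))))
}

# Toy image: a bright rectangle on a black background
toy <- matrix(0, 28, 28)
toy[10:18, 12:16] <- 255
extra <- augment_row(as.vector(t(toy)))
dim(extra)  # 6 784
```

Each augmented row keeps the label of the image it came from, so when you rbind the new rows onto the training set, repeat the original label for every copy.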

Hope this helps.

Regards,
Aayush


#3

nodesize in randomForest sets the minimum size of the terminal nodes, so when you increase it you grow smaller trees, which means you give up some of the predictive power you had before.
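
To see the effect of nodesize on tree size for yourself, here is a small sketch. It assumes the randomForest package is installed, and it uses the built-in iris data purely as a stand-in so it runs quickly; the nodesize values 1 and 50 are just example settings.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(42)

# iris is only a small stand-in for the digit data
rf_small_nodes <- randomForest(Species ~ ., data = iris,
                               ntree = 100, nodesize = 1)
rf_large_nodes <- randomForest(Species ~ ., data = iris,
                               ntree = 100, nodesize = 50)

# treesize() gives the number of terminal nodes in each tree of the forest
mean_leaves_small <- mean(treesize(rf_small_nodes))
mean_leaves_large <- mean(treesize(rf_large_nodes))

cat("Average leaves per tree, nodesize = 1: ", mean_leaves_small, "\n")
cat("Average leaves per tree, nodesize = 50:", mean_leaves_large, "\n")
```

The forest with the larger nodesize should report fewer leaves per tree, which is the "smaller trees" effect described above.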

Increasing the number of trees (ntree) works the other way: it should increase the accuracy.

By increasing both nodesize and ntree, you are pitting two factors against each other. If you want more predictive power, go for a smaller nodesize and more trees.

The trees will then be larger and grown all the way down, and there will be more of them, so the ensemble should perform better.
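
If you want to check this rather than take it on faith, one option is to compare the out-of-bag (OOB) error across a few combinations of nodesize and ntree. The sketch below assumes the randomForest package is installed and again uses iris as a stand-in dataset; the grid values are arbitrary examples, so swap in your own data and settings.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(42)

# Example grid of settings to compare (values chosen only for illustration)
settings <- expand.grid(nodesize = c(1, 20, 50), ntree = c(100, 500))

# Fit one forest per setting and record its final OOB error.
# err.rate has one row per tree; the "OOB" column is the out-of-bag error.
settings$oob_error <- mapply(function(ns, nt) {
  fit <- randomForest(Species ~ ., data = iris, ntree = nt, nodesize = ns)
  fit$err.rate[nt, "OOB"]
}, settings$nodesize, settings$ntree)

print(settings)
```

The OOB error is computed during training, so you get an accuracy estimate for each setting without holding out a separate validation set. On the full digit data each fit will take a while, so it may be worth trying the grid on a random subset of rows first.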