In a random forest model, random subsets of the variables included in the model are used to build decision trees, which reduces overfitting by averaging out the effect of all the variables. So it is a kind of ensemble method. Ideally, then, shouldn’t all the variables in our data (excluding some, like a serial number) be put into the model to get a better average and better accuracy?
It has happened many times that adding a variable reduces the predictive power of a randomForest model. Why does this happen despite the averaging effect of the randomForest algorithm?
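
To make the question concrete, here is a minimal sketch of one thing that changes when variables are added: how often a useful variable is even offered to a split. It assumes each split samples mtry = floor(sqrt(p)) candidate variables without replacement (a common default for classification), and that the data contain 5 informative variables plus a varying number of uninformative ones. The function name and the counts are hypothetical, chosen only for illustration, and this is not a claim about the full cause of the drop in accuracy.

```python
from math import comb, isqrt


def prob_informative_candidate(n_informative, n_noise, mtry=None):
    """Probability that at least one informative variable is among the
    mtry variables sampled without replacement at a single split."""
    p = n_informative + n_noise
    if mtry is None:
        # Assumption: floor(sqrt(p)), a common default for classification.
        mtry = max(1, isqrt(p))
    # Hypergeometric chance that every sampled candidate is a noise variable.
    all_noise = comb(n_noise, mtry) / comb(p, mtry)
    return 1.0 - all_noise


# Hypothetical setup: 5 informative variables, growing number of noise variables.
for n_noise in (0, 5, 20, 100):
    prob = prob_informative_candidate(5, n_noise)
    print(f"noise variables: {n_noise:3d}  P(informative candidate) = {prob:.3f}")
```

Under these assumptions the probability falls as uninformative variables are added, so more splits end up choosing among noise variables only. Is that the effect behind the drop in accuracy, or is something else going on?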