Random Forest: Deciding Variable Importance

machine_learning

#1

Hi, I am using a random forest for regression and get a different variable importance ranking every time I run the same model. I cannot decide which variable I should treat as the most important one. Any help understanding this is appreciated.


#2

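As the name suggests, random forests are quite random: every tree is trained on a bootstrap sample of the data, and (in most implementations) only a random subset of the features is considered at each split. So each run builds a different forest and gives slightly different importances.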
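To see this effect directly, here is a minimal sketch, assuming Python with scikit-learn and a synthetic dataset (the data, `n_estimators=50`, and the seeds 1 and 2 are arbitrary choices for illustration). Two forests with different seeds give different impurity-based importances, while refitting with the same seed reproduces them exactly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (an assumption for illustration): 6 features, 3 informative.
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)

def importances(seed):
    """Fit a forest with the given seed and return its impurity-based importances."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X, y)
    return model.feature_importances_

# Different seeds -> different bootstrap samples and split candidates -> different importances.
a = importances(1)
b = importances(2)
print(np.round(a, 3))
print(np.round(b, 3))

# Same seed -> identical importances on every run.
c = importances(1)
print(np.array_equal(a, c))  # prints True
```

Note that fixing the seed makes the numbers reproducible; it does not make them any less dependent on the particular random draw.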

For your problem, you can set the "random_state" parameter of the random forest model to a fixed, arbitrary value so that every run gives the same result, and then use cross-validation to find the best parameters.
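A minimal sketch of that workflow, assuming Python with scikit-learn: the synthetic data, the seed 42, and the small parameter grid are hypothetical choices, so adjust them to your data. The forest and the cross-validation folds are both seeded, the grid search picks the parameters, and the importances are then read from the refitted best model.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data standing in for your own (an assumption for illustration).
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)

# Fix the forest's seed so every fit is reproducible (42 is arbitrary).
forest = RandomForestRegressor(n_estimators=100, random_state=42)

# Hypothetical grid; pick ranges that suit your data.
param_grid = {"max_features": [0.5, 1.0], "min_samples_leaf": [1, 5]}

# Shuffled but seeded folds, so the CV splits are reproducible too.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

search = GridSearchCV(forest, param_grid, cv=cv, scoring="r2")
search.fit(X, y)

# best_estimator_ is refit on all the data with the best parameters.
best = search.best_estimator_
ranking = best.feature_importances_.argsort()[::-1]
print(search.best_params_)
print("features, most to least important:", ranking)
```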

Hope this helps.

Shubham


#3

Alternatively, run a parallel regression model to do variable selection and compare its results with the random forest output. I usually use regression to decide on the important variables and then build a random forest to improve the classifications.
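One way to sketch this, assuming Python with scikit-learn and SciPy: here the "parallel regression" is a cross-validated Lasso on standardized features (my choice of selection method, not necessarily the one meant above), and its absolute coefficients are compared with the forest's importances through a Spearman rank correlation. The synthetic data is an assumption for illustration.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for your own (an assumption for illustration).
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)

# Regression side: Lasso on standardized features, so |coef| values are comparable
# and uninformative features tend to be shrunk to exactly zero.
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5))
lasso.fit(X, y)
coef = np.abs(lasso[-1].coef_)
selected = np.flatnonzero(coef)  # indices of the features the Lasso kept

# Random forest side: impurity-based importances with a fixed seed.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
rf_imp = forest.feature_importances_

# Spearman rank correlation between the two importance orderings.
rho, _ = spearmanr(coef, rf_imp)
print("Lasso kept features:", selected)
print("RF ranking:", rf_imp.argsort()[::-1])
print("rank correlation:", round(rho, 2))
```

A high rank correlation suggests both models agree on which variables matter; features that rank high in one and low in the other are the ones worth a closer look.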

There is no single right answer here. The goal is a solution that not only produces the output but also stays robust to future changes in the data.


#4

Thank you, everyone! This really helped me reach the conclusion I needed.


#5

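Hi, if you are using R, do look at the package called Boruta.
It is built on RF, but instead of relying on a single importance ranking, it repeatedly compares each variable's importance against randomly shuffled "shadow" copies of the variables, which gives a more stable answer to which variables actually matter.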
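For readers not on R, here is a simplified Python sketch of the shadow-feature idea behind Boruta, assuming scikit-learn and NumPy. It is not the Boruta package: it uses impurity-based importances instead of the Z-scores Boruta uses, runs a fixed number of rounds, and confirms a feature by a simple majority vote instead of Boruta's statistical test. The data, `n_rounds=20`, and the threshold are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data standing in for your own (an assumption for illustration).
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)
rng = np.random.default_rng(0)
n_features = X.shape[1]
n_rounds = 20
hits = np.zeros(n_features, dtype=int)

for i in range(n_rounds):
    # Shadow features: each column shuffled independently, which keeps its
    # distribution but destroys any relationship with y.
    shadows = rng.permuted(X, axis=0)
    X_ext = np.hstack([X, shadows])
    forest = RandomForestRegressor(n_estimators=100, random_state=i).fit(X_ext, y)
    imp = forest.feature_importances_
    # A "hit" means the real feature beat the best shadow feature in this round.
    hits += imp[:n_features] > imp[n_features:].max()

# Simplified decision rule (an assumption): confirm features that win most rounds.
confirmed = np.flatnonzero(hits > n_rounds / 2)
print("hits per feature out of", n_rounds, ":", hits)
print("confirmed features:", confirmed)
```

Because the decision is based on many forests rather than one, the set of confirmed features is much less sensitive to any single random seed.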