Hi, I am using a random forest for regression and I get different variable importance results every time I run the same model. I can't decide which variable I should treat as the most important. Any help understanding this is appreciated.
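As the name suggests, random forests are inherently random: every tree is trained on a different bootstrap sample of the data, and a random subset of features is considered at each split, so the importances shift a little from run to run.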
For your problem, you can set the “random_state” parameter of the random forest model to any fixed value so that runs are reproducible, and then use cross-validation to find the best parameters.
Hope this helps.
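To make the suggestion above concrete, here is a minimal sketch assuming scikit-learn (which is where the “random_state” parameter comes from). The synthetic data, the seed values, and the parameter grid are all illustrative assumptions, not recommendations. It shows that a fixed seed reproduces the importances, that averaging over several seeds gives a steadier ranking, and how a cross-validated search could be run.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, only a stand-in for your own X and y (assumption).
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)


def fit_importances(seed):
    """Fit a forest with the given seed and return its feature importances."""
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    return model.feature_importances_


# Same seed twice: the importances should come out identical.
same_seed_match = np.allclose(fit_importances(42), fit_importances(42))
print("same seed gives same importances:", same_seed_match)

# Different seeds: average the importances to see which variables
# rank high consistently, and use the spread to judge stability.
all_imp = np.array([fit_importances(seed) for seed in range(10)])
mean_imp, std_imp = all_imp.mean(axis=0), all_imp.std(axis=0)
for i in np.argsort(mean_imp)[::-1]:
    print(f"feature {i}: {mean_imp[i]:.3f} +/- {std_imp[i]:.3f}")

# Cross-validated parameter search with the seed held fixed.
# The grid below is purely illustrative.
param_grid = {"max_depth": [None, 5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=42),
    param_grid,
    cv=3,
)
search.fit(X, y)
print("best parameters:", search.best_params_)
```

Note that fixing the seed only makes the result repeatable; it does not make one particular ranking more correct. If two variables swap places across seeds, their importances are probably too close to call.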
Alternatively, run a regression model in parallel to do variable selection and correlate its results with the random forest output. I usually use regression to decide on the important variables and then build a random forest to improve the classification.
There is no single answer for arriving at a solution that not only produces the output but also stays robust to future changes in the data.
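One way the regression comparison could look, as a rough sketch: the reply does not say which regression model is meant, so the choice of a cross-validated Lasso here is my assumption, as are the synthetic data and the use of Spearman rank correlation to compare the two rankings.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic data, only a stand-in for your own X and y (assumption).
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)

# Regression side: standardize so coefficient sizes are comparable,
# then use the absolute Lasso coefficients as a measure of strength.
X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(X_std, y)
lasso_strength = np.abs(lasso.coef_)

# Random forest side: impurity-based importances with a fixed seed.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
rf_importance = forest.feature_importances_

# Rank correlation between the two views of variable importance.
rho, _ = spearmanr(lasso_strength, rf_importance)
print("Lasso |coef|:  ", np.round(lasso_strength, 2))
print("RF importance: ", np.round(rf_importance, 3))
print(f"Spearman rank correlation: {rho:.2f}")
```

Variables that rank high in both lists are the safer picks. A variable that only the forest rates highly may matter through a nonlinear effect or an interaction that the linear model cannot see.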
Thank you, guys! It really did help me come to the desired conclusion.
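Hi - if you are using R, do look at the package called Boruta.
It is a feature selection wrapper built on RF: it compares each variable's importance with that of shuffled copies of the variables, and for picking important variables it performs better than the raw RF importance.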
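For readers not on R, the shuffled-copy idea behind that package can be sketched in a few lines of Python with scikit-learn. This is a heavily simplified illustration and not the package's actual algorithm, which adds statistical testing and iterative removal of variables; the data, the number of rounds, and all variable names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, only a stand-in for your own X and y (assumption).
X, y = make_regression(n_samples=300, n_features=6, n_informative=3,
                       noise=10.0, random_state=0)
n_features = X.shape[1]
rng = np.random.default_rng(0)

n_rounds = 10  # illustrative choice
hits = np.zeros(n_features, dtype=int)

for round_idx in range(n_rounds):
    # "Shadow" features: each column shuffled on its own, which keeps
    # its distribution but breaks any relationship with y.
    shadows = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(n_features)]
    )
    X_aug = np.hstack([X, shadows])

    forest = RandomForestRegressor(n_estimators=100, random_state=round_idx)
    forest.fit(X_aug, y)
    imp = forest.feature_importances_

    # A real feature scores a hit when it beats the best shadow feature.
    shadow_max = imp[n_features:].max()
    hits += imp[:n_features] > shadow_max

for j in range(n_features):
    print(f"feature {j}: beat the shadows in {hits[j]} of {n_rounds} rounds")
```

A variable that beats the shuffled copies in almost every round is likely to be genuinely useful, while one that rarely does is hard to tell apart from noise.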