Error in tuneRF

r
random_forest

#1

Code ::

Empchurn_RF_tune <- tuneRF(x = Empchurn_RF.dev[, -c(9, 10, 22, 27)],
                           y = as.factor(Empchurn_RF.dev$Attrition),
                           mtryStart = 5,
                           ntreeTry = 200,
                           stepFactor = 1.5,
                           improve = 0.0001,
                           trace = TRUE,
                           plot = TRUE,
                           doBest = TRUE,
                           nodesize = 30,
                           importance = TRUE
)

Output ::

mtry = 5  OOB error = 0% 
Searching left ...
mtry = 4 	OOB error = 0% 
NaN 1e-04 
Error in if (Improve > improve) { : missing value where TRUE/FALSE needed

I'm not sure what's wrong with the code; the same call works with another dataset. I checked, and the data does not contain any NA values.


#2

Hi @Shubham26,

This is only an educated guess, since I do not have your dataset, but an OOB error of 0% might be the result of correlated variables. Try removing them from the model and running it again.

If you can upload the dataset, it would be easier to diagnose the main issue.

Regards,
Shashwat


#3

Empchurn_RF_DEV.csv (343.1 KB)

Hi Shashwat,

Please find the data attached. Also, do we have to take care of correlated variables in RF? I know we have to in logistic regression.

Thanks


#4

Hi @Shubham26,

You need to remove the dependent variable ‘Attrition’ from your set of independent variables; in the call below I added column 2 to the excluded columns for that. I tried it just now and it runs without any errors:

Empchurn_RF_tune <- tuneRF(x = Empchurn_RF.dev[,-c(2,9,10,22,27)], 
                           y=as.factor(Empchurn_RF.dev$Attrition),
                           mtryStart = 5, 
                           ntreeTry=200, 
                           stepFactor = 1.5, 
                           improve = 0.0001, 
                           trace=TRUE, 
                           plot = TRUE,
                           doBest = TRUE,
                           nodesize = 30, 
                           importance=TRUE
) 

Use this, it works fine.
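
For what it's worth, here is a rough sketch of why the original call stopped with that error, going by the "NaN 1e-04" line in your trace rather than anything I have checked inside the package. With the target among the predictors, both fits report an OOB error of 0, so the relative improvement between them works out to 0/0, which is NaN in R, and an `if` cannot evaluate that. The snippet only reproduces this arithmetic in base R and shows dropping the target by name instead of by position. `toy`, `error_old`, `error_cur` and `improve_ratio` are made-up names and values for illustration, not your data.

```r
# Toy stand-in for Empchurn_RF.dev (hypothetical columns, not the real data)
toy <- data.frame(
  Age       = c(25, 40, 31, 52),
  Attrition = c("Yes", "No", "No", "Yes"),
  Dept      = c("A", "B", "A", "B")
)

# Drop the target by name, so a change in column order cannot leak it back in
predictors <- toy[, setdiff(names(toy), "Attrition"), drop = FALSE]
target     <- as.factor(toy$Attrition)

# Assumed values: an OOB error of exactly 0 for both mtry settings
error_old <- 0
error_cur <- 0

# Relative improvement between two fits: 0/0 is NaN, and 1 - NaN stays NaN
improve_ratio <- 1 - error_cur / error_old
print(is.nan(improve_ratio))

# Comparing NaN with a threshold gives NA, which `if` refuses to evaluate
msg <- tryCatch(
  if (improve_ratio > 1e-04) "improved" else "not improved",
  error = function(e) conditionMessage(e)
)
print(msg)
```

On your real data the same idea would be `setdiff(names(Empchurn_RF.dev), "Attrition")`, with the names of the other columns you were already dropping added to the list.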

Regards,
Shashwat


#5

Thanks Shashwat, it works. Yes, I realized a bit late that I was not excluding the dependent variable from the dataset.


#6

Dear Shashwat,

I got the same problem with my dataset when tuning the mtry of a random forest.

T2_rf.csv (4.4 KB)

I do not know where the problem is. Could you please help me?