Error while implementing randomForest in R

r
random_forest
data_science

#1

Hello,

I am facing a problem while implementing a randomForest model in R.
I am getting this error:

Error in randomForest.default(m, y, …) : Can’t have empty classes in y.

randomForest was running fine before I added this to my code:

dat=rbind(train,test)
dat$amount=0
dat$amount[dat$amount_tsh==0]=1

train=dat[1:59400,]
test=dat[59401:nrow(dat),]

Why does this error appear after adding this code? What has changed in my train dataset after creating the new variable this way? How do I get rid of it?

Thanks!


#2

I think the issue may be in this line:

dat$amount[dat$amount_tsh==0]=1

It can be written as

dat$amount[which(dat$amount_tsh==0)] = 1

so that rows where amount_tsh is NA are skipped explicitly.
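To see whether the two forms actually differ, here is a small sketch on a made-up vector (the values are invented for illustration and stand in for dat$amount_tsh). With a single replacement value, both forms should leave the NA row untouched, which would explain why the change does not affect the error.

```r
# Toy vector standing in for dat$amount_tsh (values are made up)
amount_tsh <- c(0, 50, NA, 0)

# Logical indexing: the comparison yields NA for the missing value
amount_a <- rep(0, length(amount_tsh))
amount_a[amount_tsh == 0] <- 1

# which() keeps only the positions that are TRUE, so the NA is dropped
amount_b <- rep(0, length(amount_tsh))
amount_b[which(amount_tsh == 0)] <- 1

# Compare the two results
same_result <- identical(amount_a, amount_b)
print(same_result)
```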


#3

Tried this too. No change. Still getting the error.


#4

Hi Aditya,

Try checking the factor levels of the new column in both the training and testing datasets. This type of problem generally arises when you create factor variables. Since the error mentions y, also check the levels of the response variable in train after the split. Redefining the levels can help you solve this problem.
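A rough sketch of how such a check could look. This assumes, without confirmation from the original post, that test was given a placeholder response value so that rbind() would work; the column name status_group and the level "unknown" are hypothetical. rbind() merges the factor levels, and the row subset keeps all of them, so train can end up with a level that has no rows.

```r
# Toy data; status_group is a hypothetical name for the response column
train <- data.frame(
  amount_tsh   = c(0, 10, 0),
  status_group = factor(c("functional", "non functional", "functional"))
)
test <- data.frame(
  amount_tsh   = c(5, 0),
  status_group = factor(c("unknown", "unknown"))  # assumed placeholder level
)

# Combine and split back, as in the original code
dat    <- rbind(train, test)
train2 <- dat[1:nrow(train), ]

# The split keeps every level of the combined factor
print(levels(train2$status_group))

# A zero count in this table marks an empty class
tab   <- table(train2$status_group)
empty <- names(tab)[tab == 0]
print(empty)
```

If empty is not of length zero for your real response column, that is the kind of empty class randomForest complains about.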

Regards,
Aayush


#5

droplevels() removes unused levels from a factor or data frame:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/droplevels.html

Of course, before dropping unused levels, you must make sure that doing so doesn’t affect your model adversely.
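A minimal sketch of the fix, using a toy data frame (the column name status_group and all values are made up for illustration). It drops unused levels from the response column only, which is a cautious choice: dropping levels from predictor columns in train could leave them with different levels than in test, which can cause trouble at prediction time.

```r
# Toy training set whose response carries an unused level "c"
train <- data.frame(
  x            = c(1, 2, 3),
  status_group = factor(c("a", "b", "a"), levels = c("a", "b", "c"))
)

# Before: level "c" has a count of zero
print(table(train$status_group))

# Drop unused levels from the response column only
train$status_group <- droplevels(train$status_group)

# After: only levels that actually occur remain
remaining <- levels(train$status_group)
print(remaining)
```

Calling droplevels(train) on the whole data frame would do the same for every factor column at once.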