Splitting data for random forest



My dataset has some factor variables with more than 30 levels. When I run my model for prediction, I get the following error in R.

modelRF1 <- mlr::train(tunedRFmodel, trainTask)
#Prediction on test data
predictTest <- predict(modelRF1, testTask)

Error in predict.randomForest(.model$learner.model, newdata = .newdata, :
New factor levels not present in the training data
I am using the mlr package in R.

My question: is there a proper way to split the data so that the train and test sets contain the same factor levels?


Hi @rohit.haritash

You can use the createDataPartition() function from the caret package to split your data in a stratified manner.
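
Here is a minimal sketch of how that could look. The data frame df and its columns city and y are made up for illustration, so swap in your own data. Keep in mind that createDataPartition() stratifies only on the vector you pass as its first argument. Here that is the high-cardinality factor itself, so every level that occurs in the data ends up in the training rows. If you have several such factors, one stratified split may not cover all of them, which is why the sketch also checks for unseen levels afterwards and drops those test rows as a fallback.

```r
library(caret)

set.seed(42)

# Hypothetical toy data: one factor with 35 levels and a binary outcome
df <- data.frame(
  city = factor(sample(paste0("c", 1:35), 2000, replace = TRUE)),
  y    = factor(sample(c("no", "yes"), 2000, replace = TRUE))
)

# Stratify on the high-cardinality factor: rows are sampled within each level
idx   <- createDataPartition(df$city, p = 0.7, list = FALSE)
train <- df[idx[, 1], ]
test  <- df[-idx[, 1], ]

# Levels that occur in the test rows but never in the training rows
unseen <- setdiff(unique(as.character(test$city)),
                  unique(as.character(train$city)))

# Fallback: drop test rows whose level the model has never seen
test <- test[!(as.character(test$city) %in% unseen), , drop = FALSE]
```

After this you can build trainTask and testTask from train and test as before. If unseen is not empty for one of your other factors, you could either drop those test rows as above or merge the rare levels into an "other" level before splitting.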