How do I use the results of a random forest model for prediction?




I am participating in a Kaggle competition and have trained a random forest model. How do I now make predictions with the trained model, and how do I create the submission CSV file?



You do this by applying the trained model to the test data set to predict the outcome variable. The model is fitted on the training data only, not on the test data. Let's walk through it using one of the best-known data science competitions, "Titanic Survival".

Here, after all the data exploration, variable creation and variable identification, I train a model using a random forest:

library(randomForest)
fit <- randomForest(as.factor(Survived) ~ Pclass + Sex + Age + SibSp + Parch + Fare + Embarked + Title + FamilySize, data = train, importance = TRUE, ntree = 2000)

Now I will apply this model to the test data set to predict the outcome variable and write the output to a CSV file. One key thing to note: before applying the model to the test data set, every variable used in the model must also exist in the test data set. Analysts often forget to create in the test data set the variables (such as Title and FamilySize here) that they created in the training data.
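A quick way to catch a forgotten variable is to compare the predictors named in the model formula with the columns of the test data. This is only a minimal sketch in base R: the tiny train and test data frames below are made-up stand-ins, and the shortened formula is just for illustration.

```r
# Toy stand-ins for the real train/test data frames (hypothetical data).
train <- data.frame(Survived = c(0, 1, 1), Pclass = c(3, 1, 2),
                    Sex = c("male", "female", "female"),
                    FamilySize = c(1, 2, 3))
test <- data.frame(PassengerId = 892:893, Pclass = c(3, 2),
                   Sex = c("male", "female"))

# Shortened version of the model formula, for illustration only.
model_formula <- as.factor(Survived) ~ Pclass + Sex + FamilySize

# Variable names on the right-hand side of the formula (the predictors).
predictors <- all.vars(model_formula[[3]])

# Predictors the model needs that the test data does not have yet.
missing_vars <- setdiff(predictors, names(test))
print(missing_vars)
```

If missing_vars is not empty, create those variables in test the same way you created them in train before calling predict.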

Prediction <- predict(fit, test)
submit <- data.frame(PassengerId = test$PassengerId, Survived = Prediction)
write.csv(submit, file = "firstforest.csv", row.names = FALSE)

Hope this helps!



If you're sure your predictions are correct, check that the column names match those required for the submission CSV file. Also, don't forget row.names = FALSE in write.csv, as in Imran's code above; without it, write.csv adds an extra column of row names to the file.
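One way to check the file before uploading is to read it back and test it. The sketch below uses base R only and assumes the required columns are PassengerId and Survived, as in the code above. The three-row test data frame and the Prediction vector are made-up stand-ins for the real test data and the output of predict, and the file goes to a temporary path rather than the real submission file.

```r
# Hypothetical stand-ins for the real test data and for predict(fit, test).
test <- data.frame(PassengerId = 892:894)
Prediction <- factor(c(0, 1, 0), levels = c(0, 1))

submit <- data.frame(PassengerId = test$PassengerId, Survived = Prediction)

# Write to a temporary file, just for this check.
out_file <- tempfile(fileext = ".csv")
write.csv(submit, file = out_file, row.names = FALSE)

# Read the file back and verify its shape.
check <- read.csv(out_file)
stopifnot(
  identical(names(check), c("PassengerId", "Survived")),  # exact column names, no row-name column
  nrow(check) == nrow(test),                              # one row per test passenger
  !anyNA(check$Survived)                                  # no missing predictions
)
```

If any condition fails, stopifnot stops with an error naming it. A missing prediction usually points to missing values in the test data that need to be filled in before predicting.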