Which one to use - RandomForest vs SVM vs KNN?


#1

Hello,

For classification there are algorithms like Random Forest, KNN, SVM and Naive Bayes. How do we decide which one to use?
Is the decision based on the particular problem at hand or on the power of the algorithm? I have used Random Forest, Naive Bayes and KNN on the same problem and found that Random Forest performs better than the other two, but I would like to know the distinctions about when to use which.


#2

Hope this link helps with your question:
What are the advantages of the different classification algorithms


#3

Hi @shuvayan,

Which algorithm to use depends on a number of factors. A few factors to look at are listed below:

  • Number of examples in the training set.

  • Dimensionality of the feature space.

  • Do we have correlated features?

  • Is overfitting a problem?
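The first three questions can be answered straight from the data. The helper below is a minimal sketch, assuming a purely numeric feature matrix; `profile_dataset` and its `corr_threshold` cutoff of 0.9 are hypothetical choices made for illustration, not a standard. The fourth question (overfitting) cannot be read off the data alone; it usually shows up as a large gap between training and validation scores.

```python
import numpy as np

def profile_dataset(X, corr_threshold=0.9):
    """Report sample count, feature count and highly correlated feature pairs.

    Hypothetical helper; corr_threshold is an arbitrary illustrative cutoff.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    # Pearson correlation between every pair of columns (rowvar=False: columns are variables)
    corr = np.corrcoef(X, rowvar=False)
    # Only look above the diagonal so each pair is reported once
    rows, cols = np.triu_indices(n_features, k=1)
    correlated = [(int(i), int(j), float(corr[i, j]))
                  for i, j in zip(rows, cols)
                  if abs(corr[i, j]) >= corr_threshold]
    return {"n_samples": n_samples, "n_features": n_features,
            "correlated_pairs": correlated}

# Usage: column 1 is almost a copy of column 0, column 2 is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + rng.normal(scale=0.01, size=200), rng.normal(size=200)])
info = profile_dataset(X)
print(info["n_samples"], info["n_features"])                # 200 3
print([(i, j) for i, j, _ in info["correlated_pairs"]])     # [(0, 1)]
```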

These are just a few of the factors on which the choice of algorithm may depend. Once you have answers to these questions, you can move on to deciding on the algorithm.
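Once the shortlist is clear, a common way to settle it is to compare the candidates empirically with cross-validation, which is essentially a more careful version of the experiment described in the question. This is a minimal sketch using scikit-learn, assuming its bundled breast-cancer dataset as a stand-in for your own data; all hyperparameters are library defaults, not tuned values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset (assumption): replace with your own X, y
X, y = load_breast_cancer(return_X_y=True)

# SVM and kNN are sensitive to feature scale, so they get a StandardScaler;
# trees and Gaussian Naive Bayes do not need one.
candidates = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "kNN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

scores = {}
for name, model in candidates.items():
    # 5-fold cross-validated accuracy; the pipeline refits the scaler inside each fold
    cv = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = cv.mean()
    print(f"{name:15s} {cv.mean():.3f} +/- {cv.std():.3f}")
```

The ranking on this toy dataset says nothing about your problem; swap in your own `X, y` and, ideally, tune each model's hyperparameters inside the cross-validation before comparing.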

SVM

  • The main reason to use an SVM is that the problem might not be linearly separable. In that case, we have to use an SVM with a non-linear kernel (e.g. RBF).

  • Another related reason to use SVMs is when you are working in a high-dimensional space. For example, SVMs have been reported to work well for text classification.

But training a kernel SVM can take a lot of time (training time grows at least quadratically with the number of training examples), so it is not recommended when we have a large number of training examples.
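To illustrate the non-linear-separability point, here is a sketch using scikit-learn's `make_circles` toy dataset (an assumption, chosen because two concentric rings cannot be split by a straight line). It fits a linear-kernel SVM and an RBF-kernel SVM on the same train/test split.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: no straight line separates the classes
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Same data, two kernels
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel test accuracy: {linear_acc:.2f}")  # expected to be much lower
print(f"RBF kernel test accuracy:    {rbf_acc:.2f}")
```

The linear kernel can only draw a straight boundary through the rings, while the RBF kernel can wrap a closed boundary around the inner ring.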

kNN

  • It is fairly robust to noisy training data (especially with a larger K, since each prediction is a vote over more neighbors) and can be effective when there is a large number of training examples.

However, with this algorithm we have to choose the value of the parameter K (the number of nearest neighbors) and the distance metric to use. Prediction is also computationally expensive, since a brute-force search computes the distance from each query instance to all training samples.
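The cost argument is easiest to see in a from-scratch version. `knn_predict` below is a hypothetical helper (not a library function), a minimal brute-force sketch with NumPy: for every query point it computes the distance to all training points, then takes a majority vote among the K closest. Both K and the distance metric are explicit parameters, which is exactly what has to be chosen.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, X_query, k=3, metric="euclidean"):
    """Brute-force kNN: compare every query point with every training point."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # diff has shape (n_query, n_train, n_features): every query vs every training point
    diff = X_query[:, None, :] - X_train[None, :, :]
    if metric == "euclidean":
        dists = np.sqrt((diff ** 2).sum(axis=2))
    elif metric == "manhattan":
        dists = np.abs(diff).sum(axis=2)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    # Indices of the k smallest distances per query (their internal order does not matter)
    nearest = np.argpartition(dists, kth=k - 1, axis=1)[:, :k]
    # Majority vote among the k neighbors' labels
    preds = [Counter(y_train[idx].tolist()).most_common(1)[0][0] for idx in nearest]
    return np.array(preds)

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X_train, y_train, [[0.5, 0.5], [5.5, 5.5]], k=3))  # ['a' 'b']
```

Library implementations such as scikit-learn's `KNeighborsClassifier` can use KD-tree or ball-tree indexes instead of brute force to speed up the neighbor search, and K and the metric are usually chosen by cross-validation.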

Random Forest

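  • A Random Forest is essentially a collection of decision trees combined into one model: each tree is trained on a bootstrap sample of the data, considering a random subset of features at each split, and the trees' predictions are combined by majority vote (or averaged, for regression). They can handle categorical features very well.

  • This algorithm can handle high-dimensional spaces as well as a large number of training examples.

Random Forests work almost out of the box, with default hyperparameters often giving reasonable results, and that is one reason why they are so popular.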
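To make the "collection of decision trees" description concrete, here is a hand-rolled sketch built from scikit-learn's `DecisionTreeClassifier`: each tree is fit on a bootstrap sample with `max_features="sqrt"`, and the forest predicts by majority vote. The dataset, the 50 trees and the seeds are arbitrary illustrative choices. scikit-learn's own `RandomForestClassifier` differs in details (for example, it averages the trees' predicted class probabilities rather than taking a hard vote); the last lines use it with default settings, in the out-of-the-box spirit described above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # binary labels 0/1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(50):
    # Bootstrap sample: draw len(X_train) rows with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt": each split considers a random subset of the features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Majority vote: with 0/1 labels, average the votes and threshold at 0.5
votes = np.stack([t.predict(X_test) for t in trees])
forest_pred = (votes.mean(axis=0) >= 0.5).astype(int)
forest_acc = (forest_pred == y_test).mean()

# For comparison: a single tree, and scikit-learn's forest with default settings
single_acc = DecisionTreeClassifier(random_state=0).fit(X_train, y_train).score(X_test, y_test)
rf_acc = RandomForestClassifier(random_state=0).fit(X_train, y_train).score(X_test, y_test)
print(f"single tree: {single_acc:.3f}  hand-rolled forest: {forest_acc:.3f}  sklearn RF: {rf_acc:.3f}")
```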