Which algorithm to use depends on a number of factors. A few factors to look at are listed below:
Number of examples in the training set.
Dimensionality of the feature space.
Do we have correlated features?
Is overfitting a problem?
These are just a few of the factors on which the choice of algorithm may depend. Once you have answers to all of these questions, you can move on to choosing the algorithm.
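As a rough illustration of how some of these questions can be answered from the data itself, here is a minimal sketch in plain Python. The function names `pearson` and `summarize_dataset` and the 0.9 correlation cutoff are assumptions made for this example, not part of any library. It covers only the first three factors; whether overfitting is a problem has to be checked separately, for instance by comparing training and validation error.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def summarize_dataset(rows, corr_threshold=0.9):
    """rows: list of equal-length numeric feature vectors (hypothetical input format)."""
    n_features = len(rows[0])
    columns = list(zip(*rows))
    # Report every pair of features whose absolute correlation reaches the cutoff.
    correlated = [
        (i, j)
        for i in range(n_features)
        for j in range(i + 1, n_features)
        if abs(pearson(columns[i], columns[j])) >= corr_threshold
    ]
    return {
        "n_examples": len(rows),
        "n_features": n_features,
        "correlated_pairs": correlated,
    }

# Toy data: feature 1 is roughly twice feature 0, feature 2 is less strongly related.
rows = [
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.2, 1.0],
]
summary = summarize_dataset(rows)
print(summary)
```

On this toy data only features 0 and 1 are flagged as correlated.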
The main reason to use an SVM rather than a linear classifier is that the problem might not be linearly separable. In that case, we have to use an SVM with a nonlinear kernel (e.g. RBF).
Another related reason to use SVMs is if you are in a high-dimensional space. For example, SVMs have been reported to work better for text classification.
However, an SVM takes a long time to train, so it is not recommended when there is a large number of training examples.
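To make the point about nonlinear kernels concrete, here is a small sketch on the XOR problem, a standard example of data that is not linearly separable. Note that this is a kernel perceptron, not an SVM: it uses the same RBF kernel idea but does no margin maximization, and it is chosen only because it fits in a few lines of plain Python. The function names and `gamma=1.0` are assumptions for this example.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def decision_score(x, points, labels, alphas, kernel):
    """Weighted sum of kernel similarities to the training points."""
    return sum(a * y_j * kernel(x_j, x)
               for a, x_j, y_j in zip(alphas, points, labels))

def train_kernel_perceptron(points, labels, kernel, epochs=20):
    """Kernel perceptron: bump alpha_i each time point i is misclassified."""
    alphas = [0] * len(points)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            if y * decision_score(x, points, labels, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alphas

def predict(x, points, labels, alphas, kernel):
    return 1 if decision_score(x, points, labels, alphas, kernel) > 0 else -1

# XOR: no straight line separates the +1 points from the -1 points.
xor_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [-1, 1, 1, -1]

alphas = train_kernel_perceptron(xor_points, xor_labels, rbf_kernel)
predictions = [predict(p, xor_points, xor_labels, alphas, rbf_kernel)
               for p in xor_points]
print(predictions)
```

The decision score is computed against every training point, which also hints at why kernel methods get expensive as the training set grows.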
K-nearest neighbors (KNN) is robust to noisy training data and is effective when there is a large number of training examples.
But for this algorithm, we have to determine the value of the parameter K (the number of nearest neighbors) and the type of distance to use. The computation cost is also high, since we need to compute the distance from each query instance to all training samples.
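The two choices mentioned above, K and the distance function, show up directly as parameters in a minimal nearest-neighbor sketch. This is a plain-Python illustration under simple assumptions (numeric features, majority vote, brute-force search); the function names are made up for this example.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(query, train_points, train_labels, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training points."""
    # One distance per training sample: this is the per-query cost noted above.
    scored = sorted(
        ((distance(query, p), label) for p, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in scored[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict((2, 2), train_points, train_labels, k=3))
print(knn_predict((6, 5), train_points, train_labels, k=3, distance=manhattan))
```

There is no training step at all here; all the work happens at query time, which is why prediction slows down as the training set grows.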
A Random Forest is nothing more than a bunch of decision trees combined. Random Forests can handle categorical features very well.
This algorithm can handle high-dimensional spaces as well as a large number of training examples.
Random Forests work almost out of the box, which is one reason they are so popular.
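The idea of "a bunch of decision trees combined" can be sketched in plain Python. This is a deliberately simplified illustration, not a full Random Forest: each "tree" is a depth-1 stump, trained on a bootstrap sample and restricted to one randomly chosen feature, and the forest predicts by majority vote. All function names, the tree count, and the seed are assumptions for this example.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature):
    """Depth-1 tree: pick the threshold on `feature` with the fewest training errors."""
    default = majority(labels)
    best = (len(rows) + 1, None, default, default)  # (errors, threshold, left, right)
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label

def stump_predict(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None:  # the sample had a single value for this feature
        return left_label
    return left_label if row[feature] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))  # random feature for this tree
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def forest_predict(forest, row):
    """Combine the trees by majority vote."""
    return majority([stump_predict(stump, row) for stump in forest])

rows = [(0, 1), (1, 0), (2, 2), (8, 9), (9, 8), (10, 10)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(forest_predict(forest, (1, 1)), forest_predict(forest, (9, 9)))
```

Real implementations grow much deeper trees and sample a subset of features at every split, but the bootstrap-plus-vote structure is the same, and it is a large part of why the method needs so little tuning.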