Classification of 14 categories


Which classification method is most effective when the target has more than two categories?

I have a variable called “Occupation” which has 14 levels. I want to build a model to predict the missing values in that categorical variable.

Can rpart be used for this?



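Yes, you can use rpart for classification whenever the response is a factor, including one with more than two levels.
For example:
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
Kyphosis is the variable to be predicted. It has only two levels in this dataset, but the same call works for a factor with 14 levels such as Occupation.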
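
To show how this applies to filling in missing values of a multi-level factor, here is a minimal sketch. It assumes the built-in iris data as a stand-in (Species has 3 levels rather than 14), and the missing labels are created artificially for illustration; the names dat, missing_idx, known and pred are made up for this example.

```r
library(rpart)

# Stand-in data: blank out 20 labels of a 3-level factor to mimic missing values
set.seed(1)
dat <- iris
missing_idx <- sample(nrow(dat), 20)
dat$Species[missing_idx] <- NA

# Fit a classification tree only on the rows where the label is known
known <- !is.na(dat$Species)
fit <- rpart(Species ~ ., data = dat[known, ], method = "class")

# Predict a class for each row with a missing label and fill it in
pred <- predict(fit, newdata = dat[!known, ], type = "class")
dat$Species[!known] <- pred
```

With your own data you would replace Species with Occupation and use whichever predictors you have available. Because the true labels are unknown in practice, it is worth checking accuracy on a held-out part of the known rows before trusting the imputed values.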

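There are numerous algorithms for classification, for example:

  1. Decision tree (C5.0 and rpart)
  2. Naïve Bayes
  3. Logistic Regression (multinomial when there are more than two classes)
  4. Knn Classification
  5. Random Forest
  6. SVM

Linear Regression and Kmeans are sometimes mentioned alongside these, but linear regression predicts a numeric response and k-means is an unsupervised clustering method, so neither is suited to predicting a categorical variable.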

Their performance depends on the problem, so it is worth comparing a few of them on held-out data.
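
As a rough sketch of such a comparison, the following fits two of the methods above on a training split and reports accuracy on a test split. It assumes the built-in iris data as a stand-in and uses multinom() from the nnet package for multinomial logistic regression; the split size and the names train, test and accuracy are arbitrary choices for illustration.

```r
library(rpart)
library(nnet)   # multinom(): multinomial logistic regression

# Hold out 45 rows for testing; train on the rest
set.seed(42)
test_idx <- sample(nrow(iris), 45)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit both models on the same training data
tree_fit  <- rpart(Species ~ ., data = train, method = "class")
logit_fit <- multinom(Species ~ ., data = train, trace = FALSE)

# Predict class labels for the held-out rows
tree_pred  <- predict(tree_fit, newdata = test, type = "class")
logit_pred <- predict(logit_fit, newdata = test, type = "class")

# Share of held-out rows each model classifies correctly
truth <- as.character(test$Species)
accuracy <- c(
  rpart    = mean(as.character(tree_pred) == truth),
  multinom = mean(as.character(logit_pred) == truth)
)
print(accuracy)
```

A single split gives a noisy estimate, so for a real decision between methods cross-validation would be a more reliable choice.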

Hope this helps!