Maximum number of levels that can be predicted by Naive Bayes



I have a data set with many missing values in one variable called COUNTRY. It currently has 42 levels (counting the missing values as one level). I want to build a Naive Bayes model to predict the unknown values.

Can the model handle 41 categories and predict them successfully?

If you know of any other good model, please suggest it as well.



Hi @Ashis,

It might help to use two-level modelling. In the first level, group the categories in the ‘COUNTRY’ column and then predict the grouped categories. In the second level, use those predicted groups as an input to predict the target ‘COUNTRY’.
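To make the two-level idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the `REGION` grouping, the toy rows, the feature names, and the lookup-table "classifier" (`fit_lookup` / `predict_lookup`), which only stands in for whatever real model you would use at each level.

```python
from collections import Counter, defaultdict

# Hypothetical grouping of countries into coarser regions (assumption).
REGION = {"IN": "Asia", "JP": "Asia", "FR": "Europe", "DE": "Europe"}

def fit_lookup(keys, labels):
    """Stand-in classifier: remember the most frequent label for each key."""
    table = defaultdict(Counter)
    for key, label in zip(keys, labels):
        table[key][label] += 1
    fallback = Counter(labels).most_common(1)[0][0]  # used for unseen keys
    return {k: c.most_common(1)[0][0] for k, c in table.items()}, fallback

def predict_lookup(model, key):
    table, fallback = model
    return table.get(key, fallback)

# Toy rows with a known COUNTRY: (hemisphere_hint, language, country).
data = [
    ("east", "hindi", "IN"), ("east", "japanese", "JP"),
    ("west", "french", "FR"), ("west", "german", "DE"),
    ("east", "hindi", "IN"), ("west", "german", "DE"),
]

# Level 1: predict the grouped category (region) from the first feature.
level1 = fit_lookup([d[0] for d in data], [REGION[d[2]] for d in data])

# Level 2: predict COUNTRY, using the region as an extra input feature.
# Training uses the true region; prediction uses the level-1 output.
level2 = fit_lookup([(REGION[d[2]], d[1]) for d in data], [d[2] for d in data])

def predict_country(hint, language):
    region = predict_lookup(level1, hint)
    return predict_lookup(level2, (region, language))

print(predict_country("west", "german"))
```

The first level only has to separate a handful of groups, and the second level then picks a country with the predicted group as an extra clue. Whether this helps depends on how well the groups can be predicted in the first place.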



Hi @Ashis,

You should provide a reproducible example to set the context. Sometimes, answers aren’t readily available unless we try things out. Also, you should specify which programming language you are using.

naiveBayes is a multiclass classification algorithm, so there is no problem in using a target variable with 41 levels. However, naiveBayes isn’t the best algorithm for imputing missing values. The reason is that it makes “naive” assumptions about the data set, namely that the predictors are independent of each other given the class and that each one matters equally, and these assumptions aren’t always true. These strong assumptions often make it less powerful than tree-based methods.
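To illustrate the point about the number of classes, here is a minimal sketch of a categorical naive Bayes classifier in plain Python, trained on 41 classes. It is not the R `naiveBayes` implementation. The function names, the Laplace smoothing parameter `alpha`, and the synthetic data (`cur_*` and `tz_*` features) are all assumptions made up for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Count-based categorical naive Bayes with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # value_counts[j][c][v]: how often feature j took value v within class c
    value_counts = [defaultdict(Counter) for _ in range(n_features)]
    # distinct values seen per feature, used in the smoothing denominator
    feature_values = [set() for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[j][label][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values, alpha

def predict(model, row):
    """Return the class with the highest log prior plus log likelihoods."""
    class_counts, value_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for j, value in enumerate(row):
            seen = value_counts[j][label][value]
            denom = count + alpha * len(feature_values[j])
            score += math.log((seen + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Synthetic data (assumption): 41 countries, 3 rows each, two categorical features.
rows, labels = [], []
for i in range(41):
    for _ in range(3):
        rows.append((f"cur_{i}", f"tz_{i % 7}"))
        labels.append(f"country_{i:02d}")

model = train_naive_bayes(rows, labels)
print(len(model[0]))                       # number of classes the model learned
print(predict(model, ("cur_17", "tz_3")))
```

The algorithm just keeps one set of counts per class, so 41 levels is not a structural problem. How accurate the predictions are depends on how informative the other columns are about COUNTRY.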

Instead, for missing value imputation, you should use a tree-based method, which is a lot more robust to missing values. If you are using R, you can get a reference from here: