How to create an ensemble of Naive Bayes models?



I am working on an email classification problem with 4 classes, using the Naive Bayes algorithm for classification. The accuracy turns out to be decent (~65%) given that there is some overlap between the classes (i.e. a couple of terms are common among 2-3 classes). The constraint is that I cannot combine any of the classes. I tried playing around with a custom stop-word list, but accuracy did not improve much.

Now, I want to check whether I can improve the accuracy by ensembling multiple Naive Bayes models. I haven't done ensembling before, so I had a couple of queries:

  1. How do I ensemble multiple Naive Bayes models? (Is it done by using different Laplace estimators?) I looked around the web for R code, but was not successful. Any help with R code would be highly appreciated.
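One common recipe is exactly the one hinted at: train several Naive Bayes members that differ in their Laplace smoothing value (and/or in the bootstrap sample they see), then average their class probabilities. Here is a minimal sketch in Python/scikit-learn as an illustration of the idea; the synthetic count data and the alpha values are placeholders, not anything from the original post.

```python
# Illustrative sketch: a small "bagged" Naive Bayes ensemble whose members
# differ in their Laplace smoothing parameter (alpha) and in the bootstrap
# sample they are trained on. Final prediction = soft vote (averaged
# predicted class probabilities). All data below is synthetic placeholder.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)

# Toy stand-in for a document-term matrix: 4 classes, 50 terms, 100 docs each.
class_rates = rng.uniform(0.2, 2.0, size=(4, 50))
y = np.repeat(np.arange(4), 100)
X = rng.poisson(class_rates[y])

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

members = []
for alpha in (0.1, 0.5, 1.0, 2.0):               # different Laplace estimators
    idx = rng.integers(0, len(X_tr), len(X_tr))  # bootstrap resample
    members.append(MultinomialNB(alpha=alpha).fit(X_tr[idx], y_tr[idx]))

# Soft vote: average the members' predicted class probabilities.
avg_proba = np.mean([m.predict_proba(X_te) for m in members], axis=0)
ensemble_pred = avg_proba.argmax(axis=1)
print("ensemble accuracy:", (ensemble_pred == y_te).mean())
```

In R, the analogous approach would be to fit `e1071::naiveBayes` with different `laplace` values on bootstrap resamples and average the outputs of `predict(..., type = "raw")`.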

  2. I read that ensembling models of the same ML algorithm (NB, in this case) might not fetch a significant improvement in accuracy, as the models are likely to be correlated with each other; hence only a marginal improvement can be achieved. Any thoughts on this?
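That concern is real, and it can be checked empirically before building the full ensemble: measure how often two candidate members make the same prediction. A hedged sketch (Python/scikit-learn; the data and alpha values are placeholders):

```python
# Illustrative diversity check on synthetic placeholder data. If two ensemble
# members agree on almost every prediction, their errors are correlated and
# voting between them can change very little.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(1)
class_rates = rng.uniform(0.2, 2.0, size=(4, 50))
y = np.repeat(np.arange(4), 100)
X = rng.poisson(class_rates[y])

# Two NB members that differ only in their smoothing value.
pred_a = MultinomialNB(alpha=0.1).fit(X, y).predict(X)
pred_b = MultinomialNB(alpha=2.0).fit(X, y).predict(X)
agreement = (pred_a == pred_b).mean()
print(f"member agreement: {agreement:.1%}")
```

An agreement rate close to 100% suggests the members are nearly interchangeable, which is why same-algorithm ensembles often gain only marginally; mixing in structurally different learners usually adds more diversity.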

  3. Is ensembling a trial-and-error approach, or is there a systematic way of doing it? For example, how would I know which other ML algorithms (Random Forest, SVM, etc.) would be best to ensemble with my initial Naive Bayes model?
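A more systematic alternative to pure trial and error is to score each candidate base learner and the combined ensemble under the same cross-validation split, and keep the combination only if it actually beats its best single member. A minimal sketch (Python/scikit-learn; the synthetic data and the particular candidates are placeholders — in R, packages such as `caret`/`caretEnsemble` play a similar role):

```python
# Illustrative sketch: compare NB, Random Forest, and SVM individually and as
# a soft-voting ensemble, all under the same 5-fold cross-validation.
# Synthetic placeholder data throughout.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

rng = np.random.default_rng(2)
class_rates = rng.uniform(0.2, 2.0, size=(4, 50))
y = np.repeat(np.arange(4), 100)
X = rng.poisson(class_rates[y]).astype(float)

candidates = [
    ("nb", MultinomialNB()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),  # needed for soft voting
]
scores = {name: cross_val_score(est, X, y, cv=5).mean()
          for name, est in candidates}
vote = VotingClassifier(estimators=candidates, voting="soft")
scores["soft_vote"] = cross_val_score(vote, X, y, cv=5).mean()
for name, s in scores.items():
    print(f"{name:10s} {s:.3f}")
```

The same comparison then tells you which learners to keep: a member that is both reasonably accurate and frequently disagrees with NB is the most promising ensemble partner.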



Hi @SD1,

This is a very interesting problem. If you don't mind, can you share the data, so that we have the 65% accuracy as a benchmark?