Zero-frequency problem



I came across the zero-frequency problem while studying naive Bayes, but I am unable to understand what it is.



One of the disadvantages of naive Bayes is that if a class label and a certain attribute value never occur together in the training data, the frequency-based estimate of that conditional probability is zero.

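Because naive Bayes multiplies all the conditional probabilities together, that single zero makes the whole product zero for that class, no matter what the other attributes say.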
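To make that concrete, here is a minimal sketch in base R. The weather data frame and its column names are made up for illustration; nothing here comes from a real data set.

```r
# Toy training data (hypothetical, for illustration only).
weather <- data.frame(
  outlook = factor(c("sunny", "sunny", "rainy", "overcast", "overcast", "rainy"),
                   levels = c("sunny", "overcast", "rainy")),
  windy   = factor(c("yes", "no", "yes", "no", "yes", "no")),
  play    = factor(c("no", "no", "no", "yes", "yes", "yes"))
)

# Frequency-based estimates of P(attribute value | class):
# counts per class, normalised so each row sums to 1.
p_outlook <- prop.table(table(weather$play, weather$outlook), margin = 1)
p_windy   <- prop.table(table(weather$play, weather$windy), margin = 1)
prior     <- prop.table(table(weather$play))

# "overcast" never occurs together with play = "no", so this estimate is 0.
p_outlook["no", "overcast"]

# Unnormalised naive Bayes score for class "no" given
# outlook = overcast and windy = yes: the zero wipes out the product.
score_no <- as.numeric(prior["no"]) *
  p_outlook["no", "overcast"] *
  p_windy["no", "yes"]
score_no
```

Even though windy = yes is fairly common for play = "no", the score for that class is zero, so the class can never be predicted for an overcast day.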



The solution to the zero-frequency problem is to shift the zero counts slightly upward. Laplace (additive) smoothing does this by adding a small constant to every count before the probabilities are computed, so no estimate is exactly zero. Some examples can be found here:
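As a rough sketch of the idea, the usual additive-smoothing estimate is (count + alpha) / (class total + alpha * k), where k is the number of distinct values of the attribute. The helper name laplace_estimate and the count matrix below are hypothetical, chosen only for illustration.

```r
# Additive (Laplace) smoothing for a class-by-attribute-value count table.
# alpha is the constant added to every count; alpha = 1 is "add-one" smoothing.
laplace_estimate <- function(counts, alpha = 1) {
  # Each row is divided by its class total plus alpha * (number of values),
  # so every row still sums to 1 after smoothing.
  (counts + alpha) / (rowSums(counts) + alpha * ncol(counts))
}

# Hypothetical counts: rows are classes, columns are attribute values.
counts <- matrix(c(2, 0, 1,
                   0, 2, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(play = c("no", "yes"),
                                 outlook = c("sunny", "overcast", "rainy")))

smoothed <- laplace_estimate(counts, alpha = 1)
smoothed
```

With alpha = 1 the zero count for ("no", "overcast") becomes (0 + 1) / (3 + 3) = 1/6 instead of 0, and the other estimates shrink slightly to compensate.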

Basically, use the laplace argument of naiveBayes, which sets the constant added to the counts (the default of 0 means no smoothing):

library(e1071); data(HouseVotes84, package = "mlbench")  # load naiveBayes() and the data set
model <- naiveBayes(Class ~ ., data = HouseVotes84, laplace = 3)
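To see what the laplace argument changes, you can compare the conditional probability tables of two fits. This is only a sketch: it assumes the e1071 package is installed, and the toy data frame is made up for illustration.

```r
library(e1071)  # assumed to be installed; provides naiveBayes()

# Hypothetical toy data in which "overcast" never occurs with play = "no".
toy <- data.frame(
  outlook = factor(c("sunny", "sunny", "rainy", "overcast", "overcast", "rainy"),
                   levels = c("sunny", "overcast", "rainy")),
  play    = factor(c("no", "no", "no", "yes", "yes", "yes"))
)

fit_raw    <- naiveBayes(play ~ outlook, data = toy, laplace = 0)
fit_smooth <- naiveBayes(play ~ outlook, data = toy, laplace = 1)

# Conditional probability tables P(outlook | play) stored in each model.
fit_raw$tables$outlook
fit_smooth$tables$outlook
```

In the unsmoothed fit the entry for ("no", "overcast") should be zero, while the smoothed fit should give it a small positive value.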