# How to Manually Calculate TextBlob's Naive Bayes Prob_Classify function

#1

Hi everyone, I am very new to this field and I don’t have a strong background. I am using TextBlob’s built-in classifier for multi-class text classification. I think TextBlob uses the NLTK classifier. I am using prob_classify to find the probability of each class. Everything works fine, but when I manually calculate the frequency table and then use the Naive Bayes formula to get the probability of each class, I get different numbers from what the prob_classify function returns.

P(x|y) = ( P(y|x) P(x) ) / P(y)

I used the above formula and then tried taking the log of the probabilities, but my result still doesn’t match the TextBlob classifier. I’ve read the code of the prob_classify function, but unfortunately I didn’t understand it.

#2

@shubham.jain what do you think?

#3

Hi @jimbo1985,

If you check out the official documentation, here is the code for the function:

    def prob_classify(self, featureset):
        # Discard any feature names that we've never seen before.
        # Otherwise, we'll just assign a probability of 0 to
        # everything.
        featureset = featureset.copy()
        for fname in list(featureset.keys()):
            for label in self._labels:
                if (label, fname) in self._feature_probdist:
                    break
            else:
                #print 'Ignoring unseen feature %s' % fname
                del featureset[fname]

        # Find the log probability of each label, given the features.
        # Start with the log probability of the label itself.
        logprob = {}
        for label in self._labels:
            logprob[label] = self._label_probdist.logprob(label)

        # Then add in the log probability of features given labels.
        for label in self._labels:
            for (fname, fval) in featureset.items():
                if (label, fname) in self._feature_probdist:
                    feature_probs = self._feature_probdist[label, fname]
                    logprob[label] += feature_probs.logprob(fval)
                else:
                    # nb: This case will never come up if the
                    # classifier was created by
                    # NaiveBayesClassifier.train().
                    logprob[label] += sum_logs([]) # = -INF.

        return DictionaryProbDist(logprob, normalize=True, log=True)
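
A hand calculation from a raw frequency table will usually differ from this function for a few reasons, so here is a rough sketch of the same steps in plain Python, without NLTK. It rests on assumptions that are not stated in this thread and that you should check against your installed versions: that the classifier was trained with NLTK's default estimator, which I believe is expected likelihood estimation (add 0.5 to every count); that TextBlob's default feature extractor produces one True/False "contains(word)" feature for every word in the training vocabulary, so absent words also contribute a factor; and that logs are base 2. The toy data and the names `train`, `ele` and `manual_prob_classify` are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (set of words in the document, label).
train = [
    ({"good", "movie"}, "pos"),
    ({"great", "film"}, "pos"),
    ({"bad", "movie"}, "neg"),
]

GAMMA = 0.5  # assumption: ELE smoothing (add 0.5 to each count)


def ele(count, total, bins):
    """Smoothed estimate (count + 0.5) / (total + 0.5 * bins)."""
    return (count + GAMMA) / (total + GAMMA * bins)


def manual_prob_classify(words, train):
    labels = sorted({label for _, label in train})
    vocab = set().union(*(doc for doc, _ in train))
    label_counts = Counter(label for _, label in train)

    # For each label, count how many training documents contain each word.
    contains = defaultdict(Counter)
    for doc, label in train:
        for w in doc:
            contains[label][w] += 1

    logprob = {}
    for label in labels:
        n_label = label_counts[label]
        # Start with the log probability of the label itself (the prior).
        lp = math.log2(ele(n_label, len(train), len(labels)))
        # Add log P(feature value | label) for every vocabulary word.
        # Words outside the vocabulary are ignored, as in the function above.
        for w in vocab:
            n_true = contains[label][w]
            count = n_true if w in words else n_label - n_true
            # Assumption: each feature has two possible values (True/False).
            lp += math.log2(ele(count, n_label, 2))
        logprob[label] = lp

    # Normalize in log space so the probabilities sum to 1.
    top = max(logprob.values())
    log_total = top + math.log2(sum(2 ** (lp - top) for lp in logprob.values()))
    return {label: 2 ** (lp - log_total) for label, lp in logprob.items()}


probs = manual_prob_classify({"good", "movie"}, train)
for label, p in probs.items():
    print(label, round(p, 4))
```

If your own numbers come from unsmoothed counts, from only the words present in the input text, or skip the final normalization step, any one of those differences would be enough to explain the mismatch.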

#4

Hi @jalFaizy,
Thanks for your reply. I have seen this code before, but I still didn’t get the correct answer. I think something is missing in my calculations.