How to Manually Calculate TextBlob's Naive Bayes prob_classify Function

multi-class
python

#1

Hi everyone, I am very new to this field and I don’t have a strong background in it. I am using TextBlob’s built-in classifier for multi-class text classification; as far as I can tell, TextBlob uses NLTK’s classifier under the hood. I am using prob_classify to find the probability of each class. Everything works fine, but when I manually build the frequency table and then apply the Naive Bayes formula, I get different numbers from what prob_classify returns.

P(class | features) = ( P(features | class) * P(class) ) / P(features)

I used the above formula, and I also tried summing log probabilities instead of multiplying, but my result still doesn’t match the TextBlob classifier. I have read the code of the prob_classify function, but unfortunately I couldn’t follow what it does differently.
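
For reference, here is a minimal version of what I am doing (the training sentences below are just a made-up toy set, not my real data):

from textblob.classifiers import NaiveBayesClassifier

# Toy training set, made up purely for illustration.
train = [
    ("I love this sandwich.", "pos"),
    ("This is an amazing place!", "pos"),
    ("I feel very good about these beers.", "pos"),
    ("I do not like this restaurant.", "neg"),
    ("I am tired of this stuff.", "neg"),
    ("He is my sworn enemy!", "neg"),
]

cl = NaiveBayesClassifier(train)

# These are the probabilities I am trying to reproduce by hand:
prob_dist = cl.prob_classify("This is an amazing beer.")
print(prob_dist.prob("pos"))
print(prob_dist.prob("neg"))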


#2

@shubham.jain what do you think?


#3

Hi @jimbo1985,

If you check the official documentation, here is the source of prob_classify:

def prob_classify(self, featureset):
    # Discard any feature names that we've never seen before.
    # Otherwise, we'll just assign a probability of 0 to
    # everything.
    featureset = featureset.copy()
    for fname in list(featureset.keys()):
        for label in self._labels:
            if (label, fname) in self._feature_probdist:
                break
        else:
            #print 'Ignoring unseen feature %s' % fname
            del featureset[fname]

    # Find the log probability of each label, given the features.
    # Start with the log probability of the label itself.
    logprob = {}
    for label in self._labels:
        logprob[label] = self._label_probdist.logprob(label)

    # Then add in the log probability of features given labels.
    for label in self._labels:
        for (fname, fval) in featureset.items():
            if (label, fname) in self._feature_probdist:
                feature_probs = self._feature_probdist[label, fname]
                logprob[label] += feature_probs.logprob(fval)
            else:
                # nb: This case will never come up if the
                # classifier was created by
                # NaiveBayesClassifier.train().
                logprob[label] += sum_logs([]) # = -INF.

    return DictionaryProbDist(logprob, normalize=True, log=True)
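
Two details in this code usually explain a mismatch with a hand calculation. First, NaiveBayesClassifier.train() builds _label_probdist and _feature_probdist with ELE smoothing by default (expected likelihood estimation, i.e. +0.5 added to every count), so raw frequency ratios will not reproduce its numbers. Second, TextBlob’s default feature extractor emits a contains(word) feature for every word in the training vocabulary, so the False values also contribute to each label’s score. Here is a rough sketch (using a toy training set like the one in your first post, not your actual data) that redoes the same loop by hand through the classifier’s internals; it should match prob_classify up to floating-point rounding:

import math
from textblob.classifiers import NaiveBayesClassifier

# Same kind of toy data as in the first post (made up for illustration).
train = [
    ("I love this sandwich.", "pos"),
    ("This is an amazing place!", "pos"),
    ("I feel very good about these beers.", "pos"),
    ("I do not like this restaurant.", "neg"),
    ("I am tired of this stuff.", "neg"),
    ("He is my sworn enemy!", "neg"),
]
cl = NaiveBayesClassifier(train)

text = "This is an amazing beer."
nltk_clf = cl.classifier                # the underlying nltk NaiveBayesClassifier
featureset = cl.extract_features(text)  # contains(word) -> True/False for the whole vocabulary

logprob = {}
for label in nltk_clf.labels():
    # Start with the smoothed log prior P(label) ...
    logprob[label] = nltk_clf._label_probdist.logprob(label)
    # ... then add the smoothed log likelihood of every feature value,
    # including the contains(word) = False ones.
    for fname, fval in featureset.items():
        if (label, fname) in nltk_clf._feature_probdist:
            logprob[label] += nltk_clf._feature_probdist[label, fname].logprob(fval)

# Normalize so the class probabilities sum to 1
# (this is what DictionaryProbDist(..., normalize=True, log=True) does).
total = sum(math.exp(lp) for lp in logprob.values())
manual = {label: math.exp(lp) / total for label, lp in logprob.items()}

prob_dist = cl.prob_classify(text)
print(manual)
print({label: prob_dist.prob(label) for label in nltk_clf.labels()})

If your hand calculation uses raw frequencies instead of the smoothed logprob calls, or skips the contains(word) = False features, the numbers will drift in exactly the way you describe.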

#4

Hi @jalFaizy,
Thanks for your reply. I have seen this code before, but I still don’t get the correct answer. I think something is missing in my calculations.