I have a data set to be classified into label A or B.
I trained a Naive Bayes (NB) classifier and it reports an overall accuracy of 65% on test data with an equal number of examples for labels A and B. Then I tested it on A and B individually, and the performance is 80% on A and 50% on B.
Is my model really 80% correct for label A?
I divided my data into three sets: Training (60%), Cross Validation (20%) and Test (20%).
On the Cross Validation and Test sets the performance is about 65%, which probably indicates that the algorithm generalizes reasonably well to new examples.
But when I test it on a test set containing only data with label A, the performance is 80%, and similarly on a test set containing only label B the performance is 50%.
Can I assume that the model is 80% correct for label A and 50% correct for label B?
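To make the numbers concrete, here is a minimal sketch of how the per-label figures relate to the overall accuracy. The counts (100 test examples per label) are hypothetical; only the percentages come from my results.

```python
# Hypothetical counts: 100 test examples per label (assumed for illustration;
# only the 80% / 50% / 65% figures come from the actual results).
# Layout: confusion[true_label][predicted_label]
confusion = {
    "A": {"A": 80, "B": 20},
    "B": {"A": 50, "B": 50},
}

def per_label_accuracy(confusion, label):
    """Fraction of examples whose true label is `label` that were predicted correctly."""
    row = confusion[label]
    return row[label] / sum(row.values())

def overall_accuracy(confusion):
    """Fraction of all examples, regardless of label, that were predicted correctly."""
    correct = sum(confusion[label][label] for label in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total

acc_a = per_label_accuracy(confusion, "A")
acc_b = per_label_accuracy(confusion, "B")
acc_all = overall_accuracy(confusion)

print(f"accuracy on label A: {acc_a:.2f}")
print(f"accuracy on label B: {acc_b:.2f}")
print(f"overall accuracy:    {acc_all:.2f}")
```

With equal numbers of A and B examples, the overall accuracy is simply the average of the two per-label figures, which matches the 65% I am seeing.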
Or should I account for false positives? On the test set with label B the performance is 50%, meaning 50% of those examples are false positives (predicted as A). So when I get 80% on the test set with label A, should I discount it by those false positives, giving a resulting accuracy for label A of 80 × (100 − 50) / 100 = 40%?
That would mean that if I use this algorithm to classify a new example and the output is ‘A’, then I am only 40% sure that the output is right.
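The quantity I am asking about here (how often an output of ‘A’ is actually right) is usually called the precision for A. Below is a minimal sketch of that calculation next to my discounting formula. It assumes hypothetical counts of 100 test examples per label, and that new data has the same 50/50 mix of A and B as my test set.

```python
# Hypothetical counts: 100 test examples per label, chosen to match the
# 80% (label A) and 50% (label B) figures. These counts are an assumption.
true_a_predicted_a = 80   # label A examples classified as A (true positives for A)
true_b_predicted_a = 50   # label B examples classified as A (false positives for A)

# Precision for A: of everything the model outputs as 'A',
# the fraction that really has label A.
precision_a = true_a_predicted_a / (true_a_predicted_a + true_b_predicted_a)

# The discounting formula from the question, for comparison.
discounted = 80 * (100 - 50) / 100

print(f"precision for A:   {precision_a:.3f}")
print(f"discounted figure: {discounted:.0f}%")
```

Under these assumptions the precision works out to 80 / 130, roughly 62%, which differs from the 40% my formula gives. The precision figure would change if new data had a different proportion of A and B than the test set.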