Interpreting the performance of a Naive Bayes algorithm for binary classification



I have a data set to be classified into label A or B.
I trained a Naive Bayes classifier and it reports an overall accuracy of 65% on a test set containing an equal number of examples for labels A and B. I then tested it on A and B individually, and the accuracy is 80% on A and 50% on B.

Is my model really correct 80% of the time for label A?

I split my data into three sets: training (60%), cross-validation (20%), and test (20%).
On both the cross-validation and test sets the accuracy is about 65%, which probably indicates that the algorithm is generalizing fine to new examples.
But when I test on a test set containing only label-A data, the accuracy is 80%, and similarly on a test set of only label-B data the accuracy is 50%.
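To check my numbers, here is a quick sketch reconstructing the confusion matrix from those per-class accuracies. The balanced test set of 100 examples per class is an illustrative assumption, not my actual data size:

```python
# Assumed balanced test set: 100 examples of each label (illustrative).
n_a, n_b = 100, 100

acc_a = 0.80  # accuracy on label-A examples (recall for A)
acc_b = 0.50  # accuracy on label-B examples (recall for B)

tp_a = acc_a * n_a   # A examples correctly predicted A -> 80
fn_a = n_a - tp_a    # A examples wrongly predicted B   -> 20
tn_b = acc_b * n_b   # B examples correctly predicted B -> 50
fp_b = n_b - tn_b    # B examples wrongly predicted A   -> 50

overall = (tp_a + tn_b) / (n_a + n_b)
print(overall)  # 0.65, matching the reported overall accuracy
```

The overall accuracy works out to exactly the 65% I observed, so the per-class numbers are at least internally consistent.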

Can I assume that the model is 80% correct for label A and 50% correct for label B?

Or should I account for false positives and say that, on the test set with label B, a 50% accuracy means 50% of those examples are false positives (predicted A)? In that case, when I get 80% on the label-A test set, should I discount the false positives, so that the resulting accuracy for label A would be 80 × (100 − 50) / 100 = 40%?

In other words, if I use this algorithm to classify new data and the output is ‘A’, can I only be 40% sure that the output is right?
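For reference, if the quantity I am after is "how sure am I that an output of ‘A’ is right", I believe that would be the precision for class A, P(true label is A | model predicts A). A sketch using the same illustrative counts as above (balanced test set, 100 examples per class, which is an assumption):

```python
# Illustrative counts from an assumed balanced test set of 100 per class.
tp_a = 80  # A examples correctly predicted A (80% of 100)
fp_b = 50  # B examples wrongly predicted A   (50% of 100)

# Precision for class A: of all examples predicted 'A', how many are truly A?
precision_a = tp_a / (tp_a + fp_b)
print(round(precision_a, 3))  # 0.615
```

Is this the calculation I should be doing instead of my 80 × 50 / 100 = 40% estimate?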