Class Imbalance


#1

In case of class imbalance, how can we say that the accuracy is more biased towards a specific class?
Correct me if I am wrong:
Suppose the target variable has two classes. Accuracy is calculated as

Accuracy = (TP + TN) / (TP + FN + TN + FP)
So from either TP or TN, divided by (TP + FN + TN + FP), we can get to know the accuracy.
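To make the formula concrete, here is a minimal Python sketch that computes accuracy from confusion-matrix counts. The counts are made up for illustration only. It also shows that accuracy is simply the sum of a TP term and a TN term over the same total, so whichever class contributes more correct predictions contributes more to the score.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + FN + TN + FP)."""
    total = tp + fn + tn + fp
    return (tp + tn) / total

# Hypothetical counts for a two-class problem (illustrative only).
tp, tn, fp, fn = 40, 50, 5, 5
total = tp + fn + tn + fp

# Accuracy splits into one term per class: TP/total + TN/total.
print(tp / total, tn / total)    # 0.4 0.5
print(accuracy(tp, tn, fp, fn))  # 0.9
```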


#2

Hi @swarup17,

Consider the following case:
We have a cancer-detection dataset covering a total of 1000 people. The dataset looks like:

Total Observations = 1000
Cancer = 20
No Cancer = 980

If we predict No Cancer for everyone, we get an accuracy of 98% (980 correct out of 1000). But we cannot neglect the 20 people who have cancer. So, in such cases accuracy is not a good measure of model performance, and hence we use other evaluation metrics like Precision, Recall, etc.
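The cancer example can be checked with a few lines of plain Python. This is a minimal sketch that assumes cancer is encoded as label 1 and no cancer as 0, and uses a trivial "model" that always predicts 0. Accuracy comes out at 0.98 while recall on the cancer class is 0, and precision is undefined because the model never predicts the positive class.

```python
# Cancer example from the post: 1000 people, 20 with cancer (1), 980 without (0).
y_true = [1] * 20 + [0] * 980
# A trivial "model" that predicts No Cancer (0) for everyone.
y_pred = [0] * len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # share of actual cancer cases that were caught
# Precision is 0/0 (undefined) when the model never predicts the positive class.
precision = tp / (tp + fp) if (tp + fp) > 0 else None

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision}")
# accuracy=0.98 recall=0.00 precision=None
```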


#3

I would recommend taking a look at the F1 score metric, which fits your needs well.
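For reference, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN). Below is a minimal Python sketch of that formula. Returning 0.0 when there are no positives at all is a convention chosen here, and the second set of counts is hypothetical. In practice scikit-learn's `sklearn.metrics.f1_score` computes the same quantity from label arrays.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*P*R / (P + R), written in its equivalent form 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when TP, FP and FN are all zero.
    return 2 * tp / denom if denom else 0.0

# "Predict No Cancer for everyone" from the example: TP=0, FP=0, FN=20.
print(f1_score(0, 0, 20))              # 0.0
# A hypothetical model that catches 15 of the 20 cases with 10 false alarms.
print(f"{f1_score(15, 10, 5):.3f}")    # 0.667
```

Unlike accuracy, F1 ignores true negatives, so the 980 easy "No Cancer" predictions cannot inflate it.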


#4

I am more focused on how we can tell that the accuracy is biased towards a specific class.


#5

Hi @swarup17,

Accuracy itself is not biased towards any specific class. It is just that even if you get a very high accuracy (98% in the example above), the model is not a good one, because it is not able to detect the class with low frequency.


#6

Hi @swarup17,

Suppose you have imbalanced data in which 80 percent of the labels are 1 and the remaining 20 percent are 0. Usually, when we fit a model like logistic regression or random forest on such a dataset, there is a high chance that the model is biased. These models might predict 1 for every data point and will still be correct 80% of the time. We can say that the models are biased towards the majority class. I hope this answers your question.
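One practical way to see this bias, not the only one, is to compute recall separately for each class instead of a single overall accuracy. The sketch below assumes 100 rows with the 80/20 split from the example and a model that always predicts 1. `per_class_recall` is a hypothetical helper written here for illustration; scikit-learn's `recall_score(y_true, y_pred, average=None)` or `classification_report` report the same per-class numbers.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that the model predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Counter returns 0 for classes that were never predicted correctly.
    return {cls: hits[cls] / totals[cls] for cls in sorted(totals)}

# 80% of labels are 1 and 20% are 0, as in the example (100 rows, illustrative).
y_true = [1] * 80 + [0] * 20
# A model that always predicts the majority class.
y_pred = [1] * len(y_true)

acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(acc)                               # 0.8
print(per_class_recall(y_true, y_pred))  # {0: 0.0, 1: 1.0}
```

A large gap between the per-class recalls (here 1.0 for the majority class and 0.0 for the minority class) is the signal that the predictions are biased towards the majority class, even though the overall accuracy of 0.8 looks reasonable.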