Which algorithm to choose when the response is weak?

algorithms
machine_learning
python

#1

Hello,

I have a question.

Suppose I have to build a classification model whose target variable is yes / no.
The answer yes is heavily under-represented compared with no, for example:
10,000 -> NO (about 99%)
100 -> YES (about 1%)
Which algorithm is best for this kind of data?

thank you in advance


#2

Hi @fred0715,

Almost any classification algorithm can be made to work on this problem. In predictive modeling it is known as “class imbalance”. Please look at this article: https://www.analyticsvidhya.com/blog/2016/03/practical-guide-deal-imbalanced-classification-problems/

Generally, oversampling, undersampling, and changing the probability threshold are the common techniques used in such cases; you can find more information in the blog.
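
To make the three techniques concrete, here is a minimal Python sketch using only the standard library. It is an illustration rather than a recommended implementation: the helper names (`random_oversample`, `random_undersample`, `predict_with_threshold`), the toy data, and the 0.1 threshold are all assumptions made up for this example. In practice you would resample only the training split, never the test data.

```python
import random
from collections import Counter


def _split_indices(labels, minority):
    """Return (minority indices, majority indices) for a binary label list."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def random_oversample(rows, labels, minority, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_extra = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = majority_idx + minority_idx + extra
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_undersample(rows, labels, minority, seed=0):
    """Randomly drop majority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    kept_majority = rng.sample(majority_idx, len(minority_idx))
    keep = kept_majority + minority_idx
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def predict_with_threshold(probabilities, threshold=0.5):
    """Turn predicted P(yes) values into labels using a custom cut-off."""
    return ["yes" if p >= threshold else "no" for p in probabilities]


# Toy data with the same imbalance as in the question (assumed, not real data).
labels = ["no"] * 10_000 + ["yes"] * 100
rows = [[i] for i in range(len(labels))]

_, over_labels = random_oversample(rows, labels, minority="yes")
_, under_labels = random_undersample(rows, labels, minority="yes")

print(Counter(over_labels))   # both classes now have 10,000 rows
print(Counter(under_labels))  # both classes now have 100 rows

# Lowering the threshold below 0.5 flags more cases as "yes".
print(predict_with_threshold([0.05, 0.2, 0.6], threshold=0.1))
```

Oversampling keeps all the data but repeats minority rows, undersampling discards most of the majority class, and threshold moving leaves the data untouched and only changes how predicted probabilities are converted into labels.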

Regards,
Aayush


#3

Hello,

Thank you for the speed and relevance of your answer.
Good article, thank you very much.
Do you have a similar article for Python?