Hi all ,
I am working on a classification problem. To classify whether a particular event happens or not.(Yes/No). My target variable is dominated by “No” and I have less “Yes”. That is “Yes” comprise of only 2% of the entire dataset. Hence the model performance is bad. However if i sample the dataset in such a way that “Yes” comprises of atleast 10% of the dataset, the model performance is good. I want to understand whether this approach is right or not.