Need help increasing accuracy for the 'No' class

data_science

#1

I have an imbalanced dataset:

It contains more than 100 independent variables, with one dependent variable (Yes or No).

I have unbalanced classes, e.g. 98% ‘Yes’ labels vs 2% ‘No’ labels.

In production, on new test data, accuracy for ‘Yes’ is 99% and for ‘No’ it is 30%.

I built my model using logistic regression.

Which statistical model or method can I use to increase my accuracy for ‘No’ to 50%?

Any suggestions would be appreciated!


#2

Hi @danidarshit

In case of an unbalanced dataset, there are various techniques like

  • oversampling
  • undersampling
  • cost sensitive learning, etc.
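
As a rough sketch of the cost-sensitive option in Python, assuming scikit-learn is available: `LogisticRegression` accepts `class_weight="balanced"`, which weights errors on the rare class more heavily. The data below is synthetic (generated with `make_classification`, with 0 standing in for ‘No’ and 1 for ‘Yes’), so the numbers it prints say nothing about the real dataset. Per-class "accuracy" here is measured as recall on the ‘No’ class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (an assumption for illustration):
# label 0 = 'No' (about 2%), label 1 = 'Yes' (about 98%).
X, y = make_classification(
    n_samples=20000,
    n_features=20,
    n_informative=5,
    weights=[0.02, 0.98],
    flip_y=0.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline: every training example counts equally.
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Cost-sensitive: weights inversely proportional to class frequency.
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(
    X_train, y_train
)


def no_recall(model):
    """Fraction of true 'No' rows (label 0) that the model predicts as 'No'."""
    return recall_score(y_test, model.predict(X_test), pos_label=0)


recall_plain = no_recall(plain)
recall_weighted = no_recall(weighted)
print(f"'No' recall, unweighted: {recall_plain:.2f}")
print(f"'No' recall, balanced:   {recall_weighted:.2f}")
```

Reweighting usually trades some accuracy on ‘Yes’ for better recall on ‘No’, so it is worth checking both classes on a held-out set before deploying.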

You can use the ROSE package for this if you are working in R…

The following article should be quite helpful as well.

Best Regards,
Shashwat


#3

My model is in Python.


#4

@jalFaizy Maybe you can help us out here.


#5

I built the model in Python. Is there a package that can help?


#6

Adding to @shashwat.2014's answer, there’s a Python package called [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn). You should check it out!