Imbalanced Dataset



How to deal with an imbalanced dataset in sklearn and R?


Hi @erigits.
In my experience, algorithms like decision trees often give good results on imbalanced datasets. Still, if you want to use other algorithms on skewed data, you could resample it before passing it to the classifier. To do this, you could:

  1. Oversample, i.e. add multiple copies of rows from the less represented class
  2. Undersample, i.e. delete rows from the more represented class
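
A minimal sketch of both strategies in Python with NumPy, assuming the features are in an array `X` and the labels in an array `y`. The function names `random_oversample` and `random_undersample` are hypothetical helpers written for illustration; they are not part of UnbalancedDataset or DMwR.

```python
import numpy as np

# Hypothetical helpers for illustration; not a library API.

def random_oversample(X, y, seed=0):
    """Duplicate rows of smaller classes (sampled with replacement)
    until every class has as many rows as the largest class."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Extra copies drawn from this class's own rows (zero for the largest class).
        extra = rng.choice(cls_idx, size=target - n, replace=True)
        idx.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def random_undersample(X, y, seed=0):
    """Keep a random subset (without replacement) of each class so that
    every class has as many rows as the smallest class."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == cls), size=target, replace=False)
        for cls in classes
    ])
    return X[idx], y[idx]

# Usage: 8 rows of class 0, 2 rows of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_over, y_over = random_oversample(X, y)
print(np.bincount(y_over))   # [8 8]
X_under, y_under = random_undersample(X, y)
print(np.bincount(y_under))  # [2 2]
```

Resample only the training split; the test set should keep the real class distribution, otherwise the evaluation scores are misleading. Oversampling keeps all the data but the duplicated rows can encourage overfitting, while undersampling throws away information from the larger class.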

Use the UnbalancedDataset package (since renamed imbalanced-learn) for Python and DMwR for R to deal with imbalanced datasets.


@jalFaizy, thank you so much!



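Just adding to @jalFaizy's response: try the "SMOTE" function available in DMwR. SMOTE (Synthetic Minority Over-sampling Technique) creates new, synthetic minority-class examples by interpolating between existing minority examples and their nearest neighbours, and DMwR's implementation combines this with undersampling of the majority class, so it can balance your training set.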
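
For intuition, here is a minimal Python/NumPy sketch of SMOTE's core oversampling step. It is not DMwR's implementation: the function name `smote` and its parameters are hypothetical, it only takes the minority-class rows `X_min`, and it leaves out the majority-class undersampling that DMwR's `SMOTE()` also performs.

```python
import numpy as np

def smote(X_min, n_synthetic, k=5, seed=0):
    """Hypothetical sketch of SMOTE's oversampling step.

    For each new sample: pick a random minority row x, pick one of its k
    nearest minority neighbours nn, and return x + u * (nn - x), u ~ U[0, 1).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority rows.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a row is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest rows
    base = rng.integers(0, n, size=n_synthetic)
    pick = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))         # one interpolation factor per new row
    return X_min[base] + gap * (X_min[pick] - X_min[base])

# Usage: four minority points at the corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new = smote(X_min, n_synthetic=6, k=2)
print(new.shape)  # (6, 2)
```

Because every synthetic point lies on a segment between a minority example and one of its neighbours, SMOTE fills in the minority region instead of repeating identical rows as plain oversampling does.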