Data imbalance problems are mostly handled in one of three ways.
- Over-sample the minority class.
- Under-sample the majority class.
- Synthesize new minority-class examples.
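The first two options can be sketched with nothing but the standard library. This is a minimal illustration assuming a binary problem and list-based data; the helper names `random_oversample` and `random_undersample` are hypothetical, not taken from any package.

```python
import random
from collections import Counter


def random_oversample(rows, labels, minority, seed=0):
    """Duplicate random minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority_rows = [r for r, l in zip(rows, labels) if l == minority]
    n_majority = len(rows) - len(minority_rows)
    # Draw (with replacement) as many extra minority rows as are missing.
    extra = [rng.choice(minority_rows) for _ in range(n_majority - len(minority_rows))]
    return rows + extra, labels + [minority] * len(extra)


def random_undersample(rows, labels, minority, seed=0):
    """Keep all minority rows and an equally sized random subset of the rest."""
    rng = random.Random(seed)
    minority_idx = [i for i, l in enumerate(labels) if l == minority]
    majority_idx = [i for i, l in enumerate(labels) if l != minority]
    kept = sorted(minority_idx + rng.sample(majority_idx, len(minority_idx)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data: 8 majority examples (label 0) and 2 minority examples (label 1).
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

_, over_labels = random_oversample(rows, labels, minority=1)
_, under_labels = random_undersample(rows, labels, minority=1)
print(Counter(over_labels))   # 8 examples of each class
print(Counter(under_labels))  # 2 examples of each class
```

Over-sampling keeps all the data but repeats minority rows exactly, while under-sampling discards majority rows. Synthesizing new examples is an attempt to avoid both drawbacks.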
SMOTE (Synthetic Minority Over-sampling TEchnique) falls under the third of these. It creates new, synthetic minority-class examples from the existing ones in the dataset.
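The process in SMOTE is as follows: for a minority example, find its k nearest neighbours within the minority class, pick one of them at random, and place a new synthetic example at a random point on the line segment joining the two.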
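A minimal sketch of that interpolation idea, using only the standard library, might look like the following. The function name `smote` and its parameters are illustrative assumptions, and it simplifies the original algorithm by picking the base example at random instead of looping over every minority example; it is not the code from either package.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x by Euclidean distance (x itself excluded).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        # Random point on the segment between x and the chosen neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Four minority examples with two features each (made-up values).
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.5]]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))  # 6 synthetic examples
```

The sketch needs at least two minority examples and uses `math.dist`, which requires Python 3.8 or later.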
SMOTE is available in R in the unbalanced package and in Python in the UnbalancedDataset package (since renamed imbalanced-learn).
Limitation of SMOTE:
It can only generate examples within the body of available examples, never outside. Formally, SMOTE can only fill in the convex hull of existing minority examples; it cannot create new exterior regions of minority examples.
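The reason follows from how a synthetic point is built. Using assumed notation, with x_i a minority example, x_j one of its minority neighbours, and lambda a random number between 0 and 1, the interpolation can be written as:

```latex
x_{\text{new}} = x_i + \lambda\,(x_j - x_i) = (1 - \lambda)\,x_i + \lambda\,x_j, \qquad \lambda \in [0, 1]
```

Both weights are non-negative and sum to 1, so x_new is a convex combination of two existing minority examples and therefore lies inside their convex hull.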
For more on imbalanced classes, see "Learning from Imbalanced Classes" by Tom Fawcett.