Using the SMOTE function to handle an imbalanced dataset

r
imbalanced
logistic_regression
smote

#1

I am working on a loan default prediction problem for financial risk assessment. I would like to know a good approach to using the SMOTE function to handle an imbalanced dataset; the original data has a 6% default rate.

I used the following code to apply SMOTE:

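```r
# Minority oversampling using SMOTE
# SMOTE(form, data, perc.over, perc.under) as used here comes from the DMwR package
library(DMwR)

training_sub <- as.data.frame(training_sub)
# SMOTE() needs a factor target; as.factor() leaves an existing factor unchanged
training_sub$SeriousDlqin2yrs <- as.factor(training_sub$SeriousDlqin2yrs)
View(training_sub)

training_new <- SMOTE(SeriousDlqin2yrs ~ ., training_sub, perc.over = 200, perc.under = 100)
View(training_new)
summary(training_new)
```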
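As a sanity check on the class mix, here is a small base-R sketch of the arithmetic that, as far as I read the DMwR documentation, `SMOTE()` follows: `perc.over` creates `floor(perc.over / 100)` synthetic cases per minority case, and `perc.under / 100` majority cases (sampled with replacement) are kept per synthetic case. The helper `smote_counts()` and the count of 600 defaulters are hypothetical, for illustration only; the code does not call DMwR. Under this reading, `perc.over = 200, perc.under = 100` would give roughly 3 minority cases for every 2 majority cases, so printing `table(training_new$SeriousDlqin2yrs)` is a quick way to confirm the actual proportions.

```r
# Hypothetical helper: expected class counts after DMwR-style SMOTE,
# assuming perc.over >= 100 (floor(perc.over / 100) synthetic cases per
# minority case) and perc.under / 100 majority cases per synthetic case.
smote_counts <- function(n_minority, perc.over = 200, perc.under = 200) {
  n_synthetic <- n_minority * (perc.over %/% 100)
  n_majority  <- as.integer((perc.under / 100) * n_synthetic)
  c(minority = n_minority + n_synthetic, majority = n_majority)
}

# Illustrative example: 600 defaulters (e.g. 10,000 rows at a 6% default rate)
counts <- smote_counts(600, perc.over = 200, perc.under = 100)
print(counts)                # minority 1800, majority 1200
print(counts / sum(counts))  # 0.6 and 0.4

# Under the same reading, perc.over = 100, perc.under = 200 gives an even split
even <- smote_counts(600, perc.over = 100, perc.under = 200)
print(even)
```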

The SMOTEd data is balanced 50/50 (50% 0s, 50% 1s), and the number of records also changes.
However, when I train a logistic regression model on this data, sensitivity improves but accuracy drops.
Is there a way to increase the model's accuracy?
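Some context for the accuracy drop: with a 6% default rate, a model that predicts "no default" for everyone already scores about 94% accuracy while catching no defaulters, so accuracy alone is a weak target here. Oversampling the defaulters raises the model's predicted default probabilities, so at a fixed 0.5 cutoff more cases get flagged: sensitivity rises, while specificity (and, with so few positives, accuracy) falls. The sketch below illustrates that trade-off with threshold (cutoff) tuning, a different technique from SMOTE, evaluated on a held-out test set that is never resampled. It uses simulated data and base-R `glm()` only; the predictor `x` and the helper `class_metrics()` are hypothetical and not part of the original pipeline.

```r
set.seed(42)

# Simulated data with roughly a 6% default rate (hypothetical predictor x)
n <- 5000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-3.4 + 1.2 * x))
dat <- data.frame(y = y, x = x)

# Hold out a test set that is never resampled or balanced
test_idx <- sample(n, n / 4)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

fit  <- glm(y ~ x, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")

# Hypothetical helper: accuracy, sensitivity and specificity at one cutoff
class_metrics <- function(prob, actual, cutoff) {
  pred <- as.integer(prob >= cutoff)
  tp <- sum(pred == 1 & actual == 1)
  tn <- sum(pred == 0 & actual == 0)
  fp <- sum(pred == 1 & actual == 0)
  fn <- sum(pred == 0 & actual == 1)
  c(cutoff      = cutoff,
    accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Baseline accuracy of always predicting "no default"
baseline_acc <- mean(test$y == 0)
print(baseline_acc)

# Lowering the cutoff raises sensitivity and lowers specificity
results <- t(sapply(c(0.5, 0.2, 0.1, 0.06), class_metrics,
                    prob = prob, actual = test$y))
print(round(results, 3))
```

In this framing, the cutoff rather than the resampling is what moves the model along the sensitivity/accuracy trade-off; a sensible cutoff depends on the relative cost of a missed default versus a false alarm, which the question does not specify.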


#2

The problem is not your code for applying SMOTE.

I would not use class balancing as the first step in your modeling process to improve accuracy.
There is a lot you can do with your variables first: binning, creating new features, treating categorical variables, etc.,
and even considering interactions.
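To make these suggestions concrete, here is a base-R sketch of three of them on simulated data: binning a continuous variable with `cut()`, choosing the reference level of a categorical variable with `relevel()`, and adding an interaction term to the `glm()` formula. The columns `age`, `debt_ratio`, `home` and `defaulted`, the bin edges, and the data-generating formula are all hypothetical and not taken from the original dataset; whether any of these features helps has to be checked on the real data.

```r
set.seed(1)

# Simulated borrowers (hypothetical columns, not from the original dataset)
n <- 2000
dat <- data.frame(
  age        = round(runif(n, 21, 80)),
  debt_ratio = rexp(n, rate = 2),
  home       = factor(sample(c("own", "rent", "mortgage"), n, replace = TRUE))
)
eta <- -3 + 0.8 * (dat$age < 30) + 0.6 * dat$debt_ratio +
  0.5 * (dat$home == "rent") * dat$debt_ratio
dat$defaulted <- rbinom(n, 1, plogis(eta))

# Binning: turn a continuous variable into groups
dat$age_band <- cut(dat$age, breaks = c(20, 30, 45, 60, Inf),
                    labels = c("21-30", "31-45", "46-60", "61+"))

# Treating categories: choose the reference level explicitly (here, owners)
dat$home <- relevel(dat$home, ref = "own")

# Interaction: let the effect of debt_ratio differ by housing status
fit_base <- glm(defaulted ~ age + debt_ratio + home,
                data = dat, family = binomial)
fit_fe   <- glm(defaulted ~ age_band + debt_ratio * home,
                data = dat, family = binomial)

# Compare the two specifications on AIC (lower is better)
print(AIC(fit_base, fit_fe))
```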

Have you already worked on your model in this way?

Regards,
Carlos.