I am working on a skewed dataset (10% Yes / 90% No). I tried oversampling with the ROSE package and then fitting a logistic regression on the result, but the performance (AUC) hardly improved.
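    library(ROSE)  # ROSE()
    library(Epi)   # ROC(); assuming this is Epi::ROC, which matches the arguments used

    train.rose <- ROSE(Churn ~ ., data = train.data, seed = 123)$data

    # Apply logistic:
    rose.logit <- glm(Churn ~ ., data = train.rose, family = "binomial")
    ROC(form = Churn ~ ., plot = c("sp", "ROC"), PV = TRUE, MX = TRUE, MI = TRUE, AUC = TRUE, data = train.rose)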
After this, I run the usual prediction on the test data and compute the AUC.
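A minimal sketch of that evaluation step, for concreteness. It runs on simulated data, not the actual dataset, and every name in it (`sim`, `train.sim`, `test.sim`, `fit`, `auc_rank`) is a hypothetical stand-in. The AUC is computed in base R from the rank-sum (Mann-Whitney) identity, so no extra package is assumed.

```r
# Simulated stand-in for the real data; all names here are hypothetical.
set.seed(123)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)

# Roughly 10% "Yes", mildly related to x1 (an assumption for illustration).
p   <- plogis(-2.4 + 0.8 * x1)
sim <- data.frame(
  Churn = factor(ifelse(runif(n) < p, "Yes", "No"), levels = c("No", "Yes")),
  x1 = x1,
  x2 = x2
)

# 70/30 train/test split.
idx       <- sample(n, size = 0.7 * n)
train.sim <- sim[idx, ]
test.sim  <- sim[-idx, ]

# Logistic regression; glm models the probability of the second level ("Yes").
fit <- glm(Churn ~ ., data = train.sim, family = "binomial")

# Predicted probability of "Yes" on the held-out rows.
test.prob <- predict(fit, newdata = test.sim, type = "response")

# AUC via the rank-sum identity: the probability that a randomly chosen
# positive gets a higher score than a randomly chosen negative.
auc_rank <- function(score, label, positive = "Yes") {
  pos   <- label == positive
  n.pos <- sum(pos)
  n.neg <- sum(!pos)
  r     <- rank(score)  # average ranks, so ties count as one half
  (sum(r[pos]) - n.pos * (n.pos + 1) / 2) / (n.pos * n.neg)
}

test.auc <- auc_rank(test.prob, test.sim$Churn)
print(test.auc)
```

The important detail is that the AUC is computed on the untouched test set, not on the resampled training data.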
The model trained on the unbalanced dataset gave me a ROC AUC of 0.60; the one trained on the ROSE data gives 0.61.
So I do not understand how oversampling actually helps, since it did not help in this case.
Or am I on the wrong track entirely, and something else has to be done to oversample the data?
Can someone please guide me on this?