I am trying to predict customer churn at a telco company, using R. The dataset is very unbalanced: the target class is around 0.6% of the base.
- 8,746 customers churn
- 1,396,664 customers do not churn
I have trained a random forest in R. Prior to training, I apply SMOTE to the training data:
train.smote <- SMOTE(Churn ~ ., train, perc.over = 100, perc.under = 200)
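To make the resampling arithmetic concrete, here is a minimal sketch in base R. It assumes `SMOTE()` comes from the DMwR package (the post does not name the package), where `perc.over = 100` adds one synthetic minority case per original minority case and `perc.under = 200` keeps two majority cases per synthetic case. The minority count `n_minority` is a hypothetical placeholder, since the size of the training split is not given.

```r
# Hypothetical number of churners in the training split (not stated in the post).
n_minority <- 1000
perc_over  <- 100   # same value as in the SMOTE() call
perc_under <- 200   # same value as in the SMOTE() call

# Assumed DMwR semantics: perc.over = 100 creates one synthetic
# minority case for each original minority case.
n_synthetic <- n_minority * perc_over / 100

# Assumed DMwR semantics: perc.under = 200 samples two majority
# cases for each synthetic minority case.
n_majority_kept <- n_synthetic * perc_under / 100

n_minority_total <- n_minority + n_synthetic
ratio <- n_minority_total / n_majority_kept

cat("minority:", n_minority_total,
    "majority:", n_majority_kept,
    "ratio:", ratio, "\n")
```

Under those assumptions the two classes end up the same size whatever the starting minority count is, because both totals scale with `n_minority`.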
That SMOTE call gives me a 1:1 class balance. I then train the forest using:
When I run,
on my validation data, I get the following confusion matrix:
          Positive   Negative
Positive  1,136,610  234,625
Negative      3,762    5,911
The F-score is 0.83 and the specificity is 0.61.
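The reported figures can be checked directly from the four counts. The sketch below is only one reading of the table: it assumes rows are actual classes, columns are predicted classes, and "Positive" is the non-churn majority class. None of that is stated with the matrix, so the axis labels here are assumptions. Under this reading the specificity comes out near the 0.61 quoted above, and the sensitivity comes out near 0.83.

```r
# Counts copied from the confusion matrix in the post.
# Assumptions (not stated in the post): rows = actual class,
# columns = predicted class, "Positive" = non-churn.
cm <- matrix(c(1136610, 234625,
                  3762,   5911),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("Positive", "Negative"),
                             predicted = c("Positive", "Negative")))

# Share of actual "Positive" cases predicted as "Positive".
sensitivity <- cm["Positive", "Positive"] / sum(cm["Positive", ])

# Share of actual "Negative" (churn) cases predicted as "Negative".
specificity <- cm["Negative", "Negative"] / sum(cm["Negative", ])

# Of all customers flagged as churn, the share that actually churn.
churn_precision <- cm["Negative", "Negative"] / sum(cm[, "Negative"])

print(round(c(sensitivity = sensitivity,
              specificity = specificity,
              churn_precision = churn_precision), 3))
```

The last ratio is the one the 234,625 misclassified non-churners affect most: it measures how many of the customers flagged as churners really are churners.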
However, the number of false positives (234,625) is too high.
Please suggest a method to reduce these false positives without sacrificing true positives.