Initially I just used classification algorithms to predict churn. But they can only predict whether a customer is going to churn or not; they cannot predict when the customer will churn, or the probability of surviving until a particular future date.
Then I used survival analysis to predict churn. I fitted a Cox proportional hazards model using the coxph function from the survival package in R (I used all data up to 1st August). With that model, I used the predictSurvProb function from the pec package in R to calculate, for every customer who had not churned as of 1st August, the probability of churning by 10th August.
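To make the quantity being predicted concrete, here is a minimal Python sketch (not the R code used above) of how a Cox model turns a baseline survival curve into a churn probability for a customer who is still active today. It assumes the time scale is customer tenure, so the relevant figure is survival to the future tenure conditional on having survived to the current one. The function name and the numbers are hypothetical; in R the baseline survival values would come from the fitted model.

```python
import math


def conditional_churn_probability(s0_now, s0_future, linear_predictor):
    """P(churn by the future date | still active now) under a Cox model.

    s0_now, s0_future: baseline survival S0(t) at the customer's current and
        future tenure (hypothetical inputs for illustration).
    linear_predictor: the customer's beta'x from the fitted model.
    """
    if not 0.0 < s0_future <= s0_now <= 1.0:
        raise ValueError("need 0 < s0_future <= s0_now <= 1")
    hazard_ratio = math.exp(linear_predictor)
    # S(t | x) = S0(t) ** exp(beta'x), so the conditional survival is
    # S(t_future | x) / S(t_now | x) = (S0(t_future) / S0(t_now)) ** exp(beta'x).
    conditional_survival = (s0_future / s0_now) ** hazard_ratio
    return 1.0 - conditional_survival


# Illustrative values only: an average customer and one with twice the hazard.
p_average = conditional_churn_probability(0.90, 0.81, 0.0)
p_high_risk = conditional_churn_probability(0.90, 0.81, math.log(2.0))
print(f"average customer: {p_average:.2f}, high-risk customer: {p_high_risk:.2f}")
```

Note that predictSurvProb returns survival probabilities, so the churn probability is one minus the returned value.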
Using a threshold of 0.5 on these probabilities to decide whether a customer has churned, I got the following accuracy:
84.28 % - for all customers (who had not churned as of 1st August)
25.72 % - for customers who actually churned between 1st and 10th August
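The two figures above correspond to overall accuracy and to accuracy on the churned customers only (recall). A small Python sketch of how both depend on the threshold is shown here; the probabilities and labels are made-up toy values, not the real data, and the function name is hypothetical.

```python
def accuracy_and_recall(churn_probs, churned, threshold):
    """Return (overall accuracy, recall on churners) for a given threshold."""
    predicted = [p >= threshold for p in churn_probs]
    actual = [bool(a) for a in churned]
    correct = sum(pred == act for pred, act in zip(predicted, actual))
    true_positives = sum(pred and act for pred, act in zip(predicted, actual))
    n_churned = sum(actual)
    accuracy = correct / len(actual)
    recall = true_positives / n_churned if n_churned else float("nan")
    return accuracy, recall


# Toy data for illustration only: 6 customers, 2 of whom churned.
toy_probs = [0.05, 0.10, 0.28, 0.60, 0.30, 0.02]
toy_churned = [0, 0, 0, 1, 1, 0]

for threshold in (0.5, 0.25):
    accuracy, recall = accuracy_and_recall(toy_probs, toy_churned, threshold)
    print(f"threshold={threshold}: accuracy={accuracy:.2f}, recall={recall:.2f}")
```

In this toy example, lowering the threshold from 0.5 to 0.25 catches the second churner at the cost of one false alarm, so recall rises while overall accuracy stays the same.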
Then I checked why the accuracy on churned customers is so low, and found that there were:
27000 - customers who had not churned as of 1st August
312 - of those 27000 customers who churned between 1st and 10th August
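For reference, the arithmetic behind these counts can be checked in a few lines of Python. The only inputs are the two numbers quoted above; the variable names are illustrative.

```python
# Counts quoted in the post.
n_at_risk = 27000   # customers not churned as of 1st August
n_churned = 312     # of those, churned between 1st and 10th August

# Share of at-risk customers who churned in the 10-day window.
churn_rate = n_churned / n_at_risk

# Accuracy of a trivial rule that predicts "no churn" for everyone.
baseline_accuracy = (n_at_risk - n_churned) / n_at_risk

print(f"churn rate in window: {churn_rate:.2%}")          # 1.16%
print(f"always-'no churn' accuracy: {baseline_accuracy:.2%}")  # 98.84%
```

With a churn rate of about 1.16 % in the window, a rule that never predicts churn already scores about 98.84 % overall accuracy, which is higher than the 84.28 % reported above. Overall accuracy is therefore not very informative here, and a fixed 0.5 threshold sits far above the typical churn probability.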
So there is definitely a class imbalance problem. How do I increase my prediction accuracy on the customers who actually churn?
And, if possible, how can I use undersampling, oversampling, SMOTE, etc. with survival analysis for my problem? https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/