Initially I used only **classification** algorithms to predict churn. But a classifier **can** only predict whether a customer is going to churn or not; it **cannot** predict **when** the customer is going to churn, or the probability that the customer survives (stays active) until a particular future date.

Then I used **survival analysis** to predict churn. I fitted a **Cox proportional hazards model** using the `coxph` function from the `survival` package in R (trained on all data up to 1st August). With that model, I used the `predictSurvProb` function from the `pec` package in R to calculate, for every customer who had not churned as of 1st August, the survival probability at 10th August, and from it the probability of having churned by 10th August.
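For reference, under the Cox proportional hazards assumption a customer's survival curve is the baseline survival raised to the power exp(β·x), i.e. S(t | x) = S0(t)^exp(β·x), and the churn probability by time t is 1 - S(t | x). The sketch below is a hypothetical, stdlib-only Python illustration of that formula, not the `survival`/`pec` code; the baseline values, coefficients and covariates are made-up inputs. It also shows one detail that may be worth checking: if the model's time axis is customer tenure (an assumption, the post does not say), a customer still active on 1st August has already survived to their current tenure `a`, so the relevant quantity is the conditional probability 1 - S(a + h | x) / S(a | x) rather than 1 - S(a + h | x).

```python
import math


def cox_survival(baseline_surv: float, coefs: list[float], x: list[float]) -> float:
    """Survival probability S(t | x) = S0(t) ** exp(beta . x) under a Cox PH model.

    baseline_surv is the baseline survival S0(t) at the time of interest;
    coefs and x are hypothetical coefficient and covariate vectors.
    """
    linear_predictor = sum(b * xi for b, xi in zip(coefs, x))
    return baseline_surv ** math.exp(linear_predictor)


def conditional_churn_prob(surv_now: float, surv_later: float) -> float:
    """P(churn before t_later | still active at t_now) = 1 - S(t_later) / S(t_now)."""
    return 1.0 - surv_later / surv_now


# Hypothetical customer: tenure a on 1st August, horizon h = 9 days (to 10th August).
coefs = [0.5, -0.2]  # made-up fitted coefficients
x = [1.0, 2.0]       # made-up covariates for one customer
s_a = cox_survival(0.90, coefs, x)         # S(a | x), assuming S0(a) = 0.90
s_a_plus_h = cox_survival(0.88, coefs, x)  # S(a + h | x), assuming S0(a + h) = 0.88

unconditional = 1.0 - s_a_plus_h                       # churn by a + h, counted from tenure 0
conditional = conditional_churn_prob(s_a, s_a_plus_h)  # churn in (a, a + h] given active at a
print(f"unconditional={unconditional:.4f} conditional={conditional:.4f}")
```

Because S(a | x) < 1 for any customer with some churn risk, the conditional probability is always smaller than the unconditional one, so mixing the two up would shift every predicted probability relative to the 0.5 cutoff.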

Using a **threshold of 0.5** on these probabilities to decide whether a customer has churned, I got the following results:

## Prediction Accuracies

**84.28 %** - for all customers (who had not churned as of 1st August)

**25.72 %** - for customers who actually churned between 1st and 10th August
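The two figures are different metrics: the first is overall accuracy across all customers, while the second is the fraction of actual churners that were predicted to churn, i.e. recall (sensitivity). A minimal Python sketch of how both can be computed from predicted churn probabilities with a 0.5 cutoff; the probabilities and labels are made-up toy data, and treating a probability exactly equal to the threshold as "churn" is an assumption.

```python
def predict_churn(probs: list[float], threshold: float = 0.5) -> list[bool]:
    """Label a customer as churned when the predicted churn probability >= threshold."""
    return [p >= threshold for p in probs]


def accuracy(pred: list[bool], actual: list[bool]) -> float:
    """Fraction of all customers whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(pred, actual)) / len(actual)


def recall(pred: list[bool], actual: list[bool]) -> float:
    """Fraction of actual churners predicted to churn (assumes at least one churner)."""
    churner_preds = [p for p, a in zip(pred, actual) if a]
    return sum(churner_preds) / len(churner_preds)


# Toy data: predicted churn probabilities and whether each customer actually churned.
probs = [0.10, 0.70, 0.40, 0.20, 0.55, 0.30]
actual = [False, True, True, False, False, True]
pred = predict_churn(probs)
print(accuracy(pred, actual), recall(pred, actual))
```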

When I checked why the accuracy on churned customers was so low, I found that there were:

**27000** - customers who had not churned as of 1st August

**312** - customers (out of the 27000 above) who churned between 1st and 10th August
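A quick check of what these counts imply: only 312 / 27000 ≈ 1.16% of the customers churned, so a trivial rule that predicts "no churn" for everyone would already reach about 98.84% overall accuracy while catching none of the churners. That is why overall accuracy says little here. The arithmetic in Python:

```python
active_on_aug_1 = 27000
churned_aug_1_to_10 = 312

churn_rate = churned_aug_1_to_10 / active_on_aug_1
# Baseline that predicts "no churn" for every customer:
baseline_accuracy = (active_on_aug_1 - churned_aug_1_to_10) / active_on_aug_1
baseline_recall = 0 / churned_aug_1_to_10  # no churner is ever flagged

print(f"churn rate: {churn_rate:.2%}")                # churn rate: 1.16%
print(f"baseline accuracy: {baseline_accuracy:.2%}")  # baseline accuracy: 98.84%
print(f"baseline recall: {baseline_recall:.0%}")      # baseline recall: 0%
```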

So there is clearly a **class imbalance** problem. How do I increase my prediction accuracy on customers who actually churn?
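One lever that needs no resampling is the 0.5 cutoff itself: with roughly 1% churners, predicted churn probabilities over a 9-day horizon may be small for most customers, so a lower threshold can trade precision for recall. A minimal sketch of sweeping thresholds and reporting precision and recall for each; the data are toy values and the candidate thresholds are arbitrary assumptions.

```python
def precision_recall(probs: list[float], actual: list[bool], threshold: float) -> tuple[float, float]:
    """Precision and recall of the rule 'churn if prob >= threshold'."""
    tp = sum(1 for p, a in zip(probs, actual) if p >= threshold and a)
    fp = sum(1 for p, a in zip(probs, actual) if p >= threshold and not a)
    fn = sum(1 for p, a in zip(probs, actual) if p < threshold and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy held-out data (hypothetical): predicted churn probabilities and true outcomes.
probs = [0.02, 0.05, 0.08, 0.15, 0.30, 0.04, 0.60, 0.12]
actual = [False, False, True, True, True, False, True, False]

for threshold in (0.5, 0.3, 0.1, 0.05):
    p, r = precision_recall(probs, actual, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

In practice the threshold would be chosen on a validation period (for example an earlier 10-day window) rather than on the period being evaluated, so the reported recall is not optimistically biased.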

And, if possible, how can I use undersampling, oversampling, SMOTE, etc. with survival analysis for my problem? These are the techniques described in https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/