I have data on the actions users perform in my tool, and I want to predict which customers are ready to convert from the free/trial tier to the paid category.
The data looks like this:
dummy <- data.frame(
  license = sample(c("Free", "Trail", "Paid"), 10000, replace = TRUE, prob = c(0.6, 0.35, 0.05)),
  plan_type = sample(1:5, 10000, replace = TRUE),
  action1 = sample(0:100, 10000, replace = TRUE),
  action2 = sample(0:1000, 10000, replace = TRUE),
  action3 = sample(0:10, 10000, replace = TRUE),
  num_days_in_product = sample(0:500, 10000, replace = TRUE)
)
table(dummy$license)
prop.table(table(dummy$license))
head(dummy)
  license plan_type action1 action2 action3 num_days_in_product
1    Paid         1     100      71       5                 285
2    Free         5      75     438       1                   2
3    Free         1       5     555       7                 389
4    Free         3       4     105       0                 150
5    Free         1      16     348       7                 423
6    Free         5      15     866       8                 270
Let me know if any extra information is needed from my end.
Hi @saisaranv,
What you are asking for is a sizeable project in itself. However, I can break it down into a few steps for you.
- Clean up your data and make sure it has no inconsistencies such as missing values, inconsistent types, or invalid values.
- Once the cleanup is done, move on to exploratory data analysis, where you look for patterns related to the categories you want to predict.
- For modelling, you can start with a decision tree model and check the accuracy of your model on test data.
- The tree will also show you the most important factors that determine the class.
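To make these steps concrete, here is a rough sketch in R. It is only an illustration under some assumptions: the data is simulated in the same shape as the `dummy` frame from the question, the target `converted` ("Paid" vs. everything else) is my own definition, and the 70/30 split and the use of the `rpart` package are choices I picked for the example, not the only way to do it.

```r
# Sketch only: simulated data in the same shape as the question's `dummy` frame.
library(rpart)  # recommended package bundled with standard R installs

set.seed(42)
n <- 10000
dummy <- data.frame(
  license = sample(c("Free", "Trail", "Paid"), n, replace = TRUE,
                   prob = c(0.6, 0.35, 0.05)),
  plan_type = sample(1:5, n, replace = TRUE),
  action1 = sample(0:100, n, replace = TRUE),
  action2 = sample(0:1000, n, replace = TRUE),
  action3 = sample(0:10, n, replace = TRUE),
  num_days_in_product = sample(0:500, n, replace = TRUE)
)

# Step 1: basic consistency checks (missing values, column types).
stopifnot(!anyNA(dummy))
str(dummy)

# Assumed target (not from the question): "Paid" vs. everything else.
dummy$converted <- factor(ifelse(dummy$license == "Paid", "yes", "no"))

# Step 2: quick exploratory summary, mean of each variable by target class.
aggregate(cbind(action1, action2, action3, num_days_in_product) ~ converted,
          data = dummy, FUN = mean)

# Step 3: hold out 30% of the rows as test data.
test_idx <- sample(n, size = round(0.3 * n))
train <- dummy[-test_idx, ]
test  <- dummy[test_idx, ]

# Step 4: fit a classification tree and measure accuracy on the test set.
fit <- rpart(converted ~ plan_type + action1 + action2 + action3 +
               num_days_in_product,
             data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")
accuracy <- mean(pred == test$converted)
accuracy

# Variable importance; this is NULL when the tree makes no splits,
# which can happen with purely random data like this simulation.
fit$variable.importance
```

Keep in mind that with only about 5% paid customers, a model that always predicts "no" already scores roughly 95% accuracy, so it is worth looking at the confusion matrix (`table(pred, test$converted)`) and not just the accuracy figure.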
This is just one approach with one model. You can also try other models if time permits.
Hope this helps. Happy predicting.
Regards,
Vikas Jangra
Could I get a link to the original dataset for this question?
Please follow this discussion thread for the steps to approach a machine learning problem.
@saisaranv
This is a classification problem. You can do the following:
- Use logistic regression to get the probability of conversion for each customer.
- To visualize which features drive conversion, you can then use a decision tree model to see which combinations of features (variables) lead to conversion. This is especially good for presentations.
- If the task is really to build a prediction model with high accuracy, start with a logistic regression model as a baseline and then improve on it using boosting techniques such as gbm or xgboost.
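As a rough illustration of the logistic regression baseline, here is a minimal sketch in base R. The assumptions are mine: the data is simulated like the `dummy` frame in the question, the 0/1 target `converted` treats "Paid" as converted, and `plan_type` is treated as a categorical variable. The boosting step (gbm or xgboost) is not shown because those are separate packages.

```r
# Sketch only: simulated data in the same shape as the question's `dummy` frame.
set.seed(1)
n <- 10000
dummy <- data.frame(
  license = sample(c("Free", "Trail", "Paid"), n, replace = TRUE,
                   prob = c(0.6, 0.35, 0.05)),
  plan_type = sample(1:5, n, replace = TRUE),
  action1 = sample(0:100, n, replace = TRUE),
  action2 = sample(0:1000, n, replace = TRUE),
  action3 = sample(0:10, n, replace = TRUE),
  num_days_in_product = sample(0:500, n, replace = TRUE)
)

# Assumed target (not from the question): 1 if the customer is on "Paid".
dummy$converted <- as.integer(dummy$license == "Paid")

# Hold out 30% of the rows to score the model on unseen customers.
test_idx <- sample(n, size = round(0.3 * n))
train <- dummy[-test_idx, ]
test  <- dummy[test_idx, ]

# Baseline logistic regression; plan_type is treated as categorical.
baseline <- glm(converted ~ factor(plan_type) + action1 + action2 + action3 +
                  num_days_in_product,
                data = train, family = binomial)
summary(baseline)$coefficients

# Predicted probability of conversion for each held-out customer.
test$p_convert <- predict(baseline, newdata = test, type = "response")

# Rank customers who are not yet paid by predicted probability.
candidates <- test[test$license != "Paid", ]
head(candidates[order(-candidates$p_convert), c("license", "p_convert")])
```

Since the simulated columns are random, the coefficients here will be close to zero and the ranking is meaningless; on real usage data the same ranking gives you a list of free/trial customers to target first.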