How to predict which users will purchase given the user's activities



The training and testing data have the following format, with no outcome variable:

user_id: A hash that uniquely identifies the user.

activity_date: The date of the activity

activity_type: The type of activity, such as click-through, purchase, email open, form submit, etc.

I am trying to build a model that predicts which user_ids will make a purchase in the future and scores the test data from most likely to least likely to purchase.
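One common way to get an outcome variable from a log like this is to pick a cutoff date, build features from activity before the cutoff, and label a user 1 if they purchase within a window after it. That turns the task into binary classification, and "scoring from most to least likely" becomes ranking by predicted probability. The sketch below is a minimal illustration under assumptions not in the question: the toy events are hypothetical, the 30-day horizon and the cutoff date are arbitrary choices, and the activity names are taken from the examples listed above.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical toy activity log: (user_id, activity_date, activity_type).
EVENTS = [
    ("u1", date(2016, 1, 3), "email open"),
    ("u1", date(2016, 1, 5), "click-through"),
    ("u1", date(2016, 2, 2), "purchase"),
    ("u2", date(2016, 1, 10), "email open"),
    ("u2", date(2016, 1, 20), "form submit"),
    ("u3", date(2016, 1, 15), "purchase"),
    ("u3", date(2016, 2, 10), "email open"),
]


def build_training_set(events, cutoff, horizon_days=30):
    """Turn an activity log into per-user features and 0/1 labels.

    Features: counts of each activity_type strictly before `cutoff`
    (past purchases are counted as a feature too).
    Label: 1 if the user has a 'purchase' in [cutoff, cutoff + horizon_days).
    """
    window_end = cutoff + timedelta(days=horizon_days)
    features = defaultdict(Counter)
    buyers = set()
    for user_id, day, activity_type in events:
        if day < cutoff:
            features[user_id][activity_type] += 1
        elif day < window_end and activity_type == "purchase":
            buyers.add(user_id)
    # Only users with some history before the cutoff can be scored.
    labels = {user_id: int(user_id in buyers) for user_id in features}
    return dict(features), labels


features, labels = build_training_set(EVENTS, cutoff=date(2016, 2, 1))
```

Here `u1` is labelled 1 (purchase on 2 Feb), while `u2` and `u3` are labelled 0; `u3`'s January purchase only shows up as a feature. In practice you would repeat this for several cutoffs to get more training rows.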

The data looks something like this:

I also need to:

(1) Describe which activity types are most useful in predicting which users will purchase in the future.

(2) Get the 1000 user_ids that are most likely to convert.
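With per-user activity counts and a 0/1 "purchased later" label, both tasks can come out of one classifier. The sketch below uses plain logistic regression (a swapped-in choice, not something the thread specifies), written with only the standard library so it runs anywhere: the learned weights give a rough ranking of activity types for (1), and sorting test users by predicted probability gives the top-k list for (2). The training rows, test users, learning rate and epoch count are all hypothetical.

```python
import heapq
import math

# Column order for the hypothetical per-user count vectors below.
ACTIVITY_TYPES = ["click-through", "email open", "form submit", "purchase"]

# Hypothetical training rows: activity counts before a cutoff date, and
# label 1 if the user purchased in the target window afterwards.
TRAIN_X = [
    [3, 5, 1, 1],
    [0, 2, 0, 0],
    [4, 1, 2, 2],
    [0, 0, 0, 0],
    [1, 6, 0, 0],
    [5, 3, 1, 1],
]
TRAIN_Y = [1, 0, 1, 0, 0, 1]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, row):
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, row)))


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for row, target in zip(X, y):
            err = predict(w, b, row) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * gw / len(X) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


w, b = fit_logistic_regression(TRAIN_X, TRAIN_Y)

# (1) Activity types ranked by learned weight (higher = more associated with purchase).
importance = sorted(zip(ACTIVITY_TYPES, w), key=lambda pair: pair[1], reverse=True)

# (2) Score hypothetical test users and keep the k most likely to convert.
TEST_X = {"u10": [4, 2, 1, 1], "u11": [0, 2, 0, 0], "u12": [2, 4, 1, 0]}
scores = {user_id: predict(w, b, row) for user_id, row in TEST_X.items()}
top_users = heapq.nlargest(2, scores, key=scores.get)  # use k = 1000 on the real test set
```

Raw-count weights are only comparable if the features are on similar scales, so standardising the counts (or using a library model with feature importances) would give a fairer answer to (1) on real data.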

I am confused about whether this is a classification problem, a regression problem, or both.

Any thoughts or input on how I can get started?



Hi @Kuber,

Your problem could be solved using a recommendation system. You just need to collect user data, since you already have a unique ID for every user. Read this blog post on recommendation engines.



@jalFaizy: I never thought of this as a recommendation system problem, since I have to predict which user_ids will make a purchase in the future.

Could you please elaborate on this a little more so that I can understand it better?

Thanks once again!



Off the top of my head, I think you could follow these steps:

  • Collect user data, i.e., purchase patterns.
  • Compare these patterns across all the users.
  • Now apply collaborative filtering (more specifically, user-user collaborative filtering).
  • This algorithm will be trained on historical user-activity data.
  • At test time, it will take in user details and predict how likely the user is to purchase.
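The steps above can be sketched as neighbourhood-based user-user collaborative filtering where the single "item" being predicted is a purchase: find the historical users most similar to the target user, then take a similarity-weighted average of whether they went on to buy. This is a minimal sketch under assumptions the reply does not specify: "purchase patterns" are represented here as activity-type count vectors, similarity is cosine similarity, `k=2` neighbours, and the history is hypothetical. With one target outcome this is essentially k-nearest-neighbours scoring.

```python
import math

# Hypothetical historical users: counts of
# ["click-through", "email open", "form submit"], and whether they later purchased.
HISTORY = {
    "u1": ([3, 5, 1], 1),
    "u2": ([0, 2, 0], 0),
    "u3": ([4, 1, 2], 1),
    "u4": ([1, 6, 0], 0),
}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_user_score(target, history, k=2):
    """Similarity-weighted average of the k most similar users' purchase outcomes."""
    neighbours = sorted(
        ((cosine(target, vec), bought) for vec, bought in history.values()),
        reverse=True,
    )[:k]
    total_sim = sum(sim for sim, _ in neighbours)
    if total_sim == 0:
        return 0.0  # no informative neighbours
    return sum(sim * bought for sim, bought in neighbours) / total_sim


score = user_user_score([4, 2, 1], HISTORY)
```

For the target `[4, 2, 1]` the two nearest users are `u3` and `u1`, both buyers, so the score is 1.0; a target like `[0, 3, 0]` lands next to the non-buyers `u2` and `u4` and scores 0.0. Ranking all test users by this score gives the most-likely-to-purchase list.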

The disadvantage of the above method is that it would not be “personalised” to individual users.

You could also try framing this as an item-item filtering problem (i.e., if a user purchased an item before, they are likely to purchase it again).
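The fields described in the question (user_id, activity_date, activity_type) have no item or product column, so true item-item filtering would need extra data. The parenthetical intuition, that past buyers tend to buy again, can still serve as a simple baseline at the user level. The sketch below is that baseline, not item-item filtering: it ranks users by number of purchases before a scoring date, breaking ties by the most recent purchase. The events and the `as_of` date are hypothetical.

```python
from datetime import date

# Hypothetical activity log: (user_id, activity_date, activity_type).
EVENTS = [
    ("a", date(2016, 1, 2), "purchase"),
    ("a", date(2016, 1, 20), "purchase"),
    ("b", date(2016, 1, 5), "purchase"),
    ("c", date(2016, 1, 25), "purchase"),
    ("d", date(2016, 1, 10), "email open"),
    ("b", date(2016, 3, 1), "purchase"),  # after as_of, so ignored
]


def repeat_purchase_ranking(events, as_of):
    """Rank users by past purchase count, ties broken by most recent purchase."""
    counts = {}
    last_purchase = {}
    for user_id, day, activity_type in events:
        if day >= as_of:
            continue  # only use history available at scoring time
        counts.setdefault(user_id, 0)  # register every active user
        if activity_type == "purchase":
            counts[user_id] += 1
            last_purchase[user_id] = max(last_purchase.get(user_id, day), day)
    return sorted(
        counts,
        key=lambda u: (counts[u], last_purchase.get(u, date.min)),
        reverse=True,
    )


ranking = repeat_purchase_ranking(EVENTS, as_of=date(2016, 2, 1))
```

A baseline like this is worth keeping next to any learned model: if the model cannot beat "rank by past purchases", the extra features are not adding much.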

Try exploring a few more ideas: read up on recommendation systems in depth and try implementing them on your data. Good luck!