Hi all, to a certain extent I appreciate machine learning algorithms that aim at making predictions based on past data fed into the model (like machine learning applied to fraud detection). However, what remains glossed over in many machine learning courses is the practical way of making predictions once the model has been tested and saved.
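To make the question concrete, here is a minimal sketch (using scikit-learn, with stdlib `pickle` for persistence; `joblib` is another common choice) of the step courses often skip: saving a tested model and later scoring genuinely new records with it. The dataset and model here are placeholders, not anyone's real churn setup.

```python
# Minimal sketch of "train, test, save, then score new data" with scikit-learn.
# The synthetic dataset stands in for whatever features a real model would use.
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_new, y_train, _ = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
blob = pickle.dumps(model)          # save the tested model (to disk in practice)

loaded = pickle.loads(blob)         # ...later, e.g. in a scoring job
preds = loaded.predict(X_new)       # score records the model has never seen
print(preds[:5])
```

The key point the sketch illustrates is that `X_new` must carry the same features, in the same format, as the training data; that requirement is exactly what makes the "new unseen data" question non-trivial.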
In other words, if we take, for instance, a supervised classification model where the outcome is known and the model scores high accuracy on the test data, how valuable is this model for new, unseen data?
Let’s take one of the most common machine learning examples: “will my customer churn or not?” The dataset is made of observations that already contain past data for each customer prior to churning (emails, calls made, purchases, etc.). Hence the question: if we already have past data from customers, what is the point of making predictions? And even if some sort of prediction can be made, how far into the future can it reach? A good model should be able to predict the future, but if we are already in the future (the customers have already made their purchases, already contacted customer service, etc.), what is the point? A good model should be able to predict, segment, or assign new observations to defined clusters without having all those predictors that are typically used to build the model in the first place… So, if any prediction can be made, how far-reaching can it be?
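One way the churn framing is usually resolved is a time-based split: predictors come from a window *before* a cutoff date, the label comes from what happened *after* it, and the fitted model is then applied to today’s customers, whose “after” window has not happened yet. Here is a small pandas sketch of that framing; the column names, cutoff date, and the simplified churn definition (“no activity after the cutoff”) are invented for illustration, not taken from any real dataset.

```python
# Hypothetical time-based framing of a churn problem (illustrative data).
import pandas as pd

events = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "c"],
    "date": pd.to_datetime(["2023-01-05", "2023-02-10",
                            "2023-01-20", "2023-03-01", "2023-01-25"]),
    "purchases": [2, 1, 3, 0, 1],
})
cutoff = pd.Timestamp("2023-02-01")

# Predictors: behaviour observed strictly before the cutoff.
features = (events[events["date"] < cutoff]
            .groupby("customer")["purchases"].sum()
            .rename("purchases_before_cutoff"))

# Label: did the customer go quiet after the cutoff? (simplified definition)
active_after = set(events[events["date"] >= cutoff]["customer"])
labels = features.index.to_series().apply(lambda c: int(c not in active_after))

print(features.to_dict())  # {'a': 2, 'b': 3, 'c': 1}
print(labels.to_dict())    # {'a': 0, 'b': 0, 'c': 1}
```

Under this framing the model never needs the “future” data for the customers being scored: it only needs the same pre-cutoff predictors, which current customers already have, and its predictions reach as far ahead as the gap between the feature window and the label window used in training.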