How does a machine learning model work on raw data?



I have always wondered about the following question, though it might seem trivial.

In every machine learning model we do the following:

  1. We either have separate train and test data, or we split one file into train and test sets. In both cases we perform cleaning > EDA > preprocessing > feature engineering, etc. on both train and test, and then we build our model on the train data and test it on the test data.

My question is this: we have performed all the EDA and feature engineering on the test data, but in deployment the model has to work on real-time raw data, which is unpreprocessed.

So how do we check the efficiency of our model, and how do we access it?



Not all, but some of the transformations and feature-generation steps that the test data goes through also have to be applied to the real-time raw data to achieve the desired results.
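
As a rough illustration of that idea, here is a minimal sketch in plain Python. The `Preprocessor` class and the `age` and `city` columns are made up for this example. It learns its parameters from the training rows only and then applies the same transformation to a single raw record at prediction time. Libraries such as scikit-learn bundle this pattern in a `Pipeline`, but the sketch avoids any dependency.

```python
import statistics


class Preprocessor:
    """Hypothetical helper: learns its parameters from training data only."""

    def fit(self, rows):
        # rows: list of dicts with a numeric "age" and a categorical "city"
        # (both column names are assumptions made for this example).
        ages = [r["age"] for r in rows if r["age"] is not None]
        self.age_mean = statistics.mean(ages)
        # Fall back to 1.0 so a constant column does not divide by zero.
        self.age_std = statistics.pstdev(ages) or 1.0
        self.cities = sorted({r["city"] for r in rows})
        return self

    def transform(self, row):
        # Impute a missing age with the mean learned from the training data.
        age = row["age"] if row["age"] is not None else self.age_mean
        scaled = (age - self.age_mean) / self.age_std
        # One-hot encode against the training categories; unseen cities
        # become all zeros.
        one_hot = [1.0 if row["city"] == c else 0.0 for c in self.cities]
        return [scaled] + one_hot


train_rows = [
    {"age": 20, "city": "Pune"},
    {"age": 30, "city": "Delhi"},
    {"age": 40, "city": "Pune"},
]
prep = Preprocessor().fit(train_rows)

# At deployment, a raw record goes through the same fitted steps.
raw_record = {"age": None, "city": "Mumbai"}
features = prep.transform(raw_record)
```

The point of the design is that `fit` runs once on training data, and the fitted object is then saved and reused for the test data and for every incoming raw record, so the model always sees features built the same way.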

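But if you resample imbalanced data for a classification model, you don’t need to do that on the real-time raw data. Resampling only changes which rows the model is trained on; it is not a transformation of an individual record.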
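
To make the distinction concrete, here is a small sketch of random oversampling in plain Python. The function `oversample_minority` and the toy data are assumptions for illustration only. It balances the training rows before fitting, while an incoming record at prediction time is left untouched.

```python
import random


def oversample_minority(rows, label_key="label", seed=0):
    """Hypothetical helper: duplicate rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for r in rows:
        by_label.setdefault(r[label_key], []).append(r)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra rows with replacement to reach the target count.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Toy imbalanced training set: 8 rows of class 0, 2 rows of class 1.
train = [{"x": i, "label": 0} for i in range(8)]
train += [{"x": 100 + i, "label": 1} for i in range(2)]

balanced_train = oversample_minority(train)  # used only to fit the model

# A real-time record has no label and arrives alone; it is never resampled.
incoming = {"x": 42}
```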

Normally, the model works based on the patterns in the data on which it was trained.