I am doing some data analysis using Python and pandas.
I have a basic question to ask.
Suppose we do the standard data cleaning on the training set, such as replacing NaN values with the column mean and applying label encoding, and then train a model on the cleaned data.
Do we have to perform the same cleaning on the test set as well? If not, how will the model handle the NaN and other missing values that might be in the test set?
If yes, can we put all the cleaning steps inside a Python function and apply that same function to both the training set and the test set, so that each is cleaned in one step?
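To make the idea concrete, here is a minimal sketch of what I have in mind. The column names `age` and `city`, the helper names `fit_cleaner` and `clean`, and the choice to encode unseen categories as -1 are all assumptions made up for illustration. The mean and the label mapping are computed from the training set only and then reused on the test set.

```python
import pandas as pd


def fit_cleaner(train, num_col, cat_col):
    """Learn the cleaning parameters from the training set only (hypothetical helper)."""
    mean_value = train[num_col].mean()  # NaN values are skipped by default
    categories = sorted(train[cat_col].dropna().unique())
    mapping = {cat: code for code, cat in enumerate(categories)}
    return {"num_col": num_col, "cat_col": cat_col,
            "mean": mean_value, "mapping": mapping}


def clean(df, params):
    """Apply the already-learned parameters to any DataFrame (train or test)."""
    out = df.copy()
    # Fill missing numbers with the mean learned from the training set.
    out[params["num_col"]] = out[params["num_col"]].fillna(params["mean"])
    # Encode labels; missing or unseen categories become -1 (an assumption).
    encoded = out[params["cat_col"]].map(params["mapping"])
    out[params["cat_col"]] = encoded.fillna(-1).astype(int)
    return out


# Toy data, invented for this example.
train = pd.DataFrame({"age": [20.0, None, 40.0],
                      "city": ["Paris", "Rome", "Paris"]})
test = pd.DataFrame({"age": [None, 50.0],
                     "city": ["Rome", "Oslo"]})

params = fit_cleaner(train, "age", "city")
train_clean = clean(train, params)
test_clean = clean(test, params)
```

In this sketch the missing `age` in the test set is filled with the training mean, not the test mean, and `Oslo`, which never appears in the training set, is encoded as -1.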
Is my understanding fine or am I missing something?
Thanks in advance for the help.