How to decide what treatment to apply to missing values in our data?

missing_values

#1

Hello,

How should we decide which treatment to apply to missing values in our data? How do we know whether to delete the records with missing values, impute them with the mean/median/mode, or build a predictive model to fill them in?
Does this depend on the data, or do we just try all of them and keep the one that gives us the best model?

Thanks.


#2

It depends on the kind of data you are dealing with. For example, you cannot take a mean of binary data recorded as Y/N.
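
To illustrate the Y/N case, here is a minimal sketch in plain Python. It assumes missing entries are stored as None; the column values and the helper name impute_mode are made up for the example.

```python
from statistics import mode


def impute_mode(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing observed to learn from, so return the column unchanged.
        return list(values)
    fill = mode(observed)
    return [fill if v is None else v for v in values]


# Hypothetical Y/N column: a mean is undefined for these labels,
# but the most frequent label can still be used as a fill value.
smoker = ["Y", "N", None, "N", "N", None, "Y"]
filled = impute_mode(smoker)
print(filled)
```

In pandas the same idea is usually written with Series.mode and Series.fillna.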

Also, for logistic regression you may drop a variable if more than 20% of its values are missing.
Categorical variables can be imputed with the mode, and numeric variables with the mean.
If less than 1% of the values are missing, you can simply remove those observations.
If more than 10% are missing, you can predict the values with a regression model or use a proxy variable.

All of these are ballpark figures and will vary from case to case.
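
The rules of thumb above can be sketched as a small helper. This is only one possible reading of them: the function names, the None-as-missing convention, and the order in which the thresholds are checked are my assumptions, and the cut-offs themselves are the rough figures from this post, not fixed rules.

```python
def missing_fraction(values):
    """Share of entries that are None (0.0 for an empty column)."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def suggest_treatment(values, is_categorical):
    """Map a column to one of the rough treatments listed above."""
    frac = missing_fraction(values)
    if frac == 0:
        return "nothing to do"
    if frac < 0.01:
        return "drop the affected rows"
    if frac > 0.20:
        # Mentioned above in the context of logistic regression.
        return "consider dropping the variable"
    if frac > 0.10:
        return "predict with a regression model or use a proxy"
    # Remaining band: simple imputation, chosen by variable type.
    return "impute with mode" if is_categorical else "impute with mean"


# Hypothetical columns with different shares of missing values.
columns = {
    "age": ([None] * 5 + [30.0] * 95, False),      # 5% missing
    "city": ([None] * 15 + ["a"] * 85, True),      # 15% missing
    "income": ([None] * 30 + [1.0] * 70, False),   # 30% missing
    "height": ([None] * 1 + [1.7] * 199, False),   # 0.5% missing
}
suggestions = {
    name: suggest_treatment(vals, cat) for name, (vals, cat) in columns.items()
}
for name, advice in suggestions.items():
    print(name, "->", advice)
```

A helper like this only gives a starting point. Comparing model performance under a few of the candidate treatments is still a reasonable check.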