Acceptance criteria for % of Missing values

machine_learning

#1

Is there a specific cutoff or percentage of missing values beyond which a variable in a dataset should not be used?
For example, if 70% of the values are missing in a variable, what action should we take: delete the variable (depending on its importance in the model), or impute it using one of the available methods?

In my case, I have a dataset with 80 variables, 10 of which have more than 50% of their values missing.

Could anyone suggest a suitable approach for this problem?


#2

It depends; there is no single cutoff that applies to every dataset. You mentioned that you have 80 variables and 10 of them have more than 50% of their values missing. In this case I think it would be better to remove those variables, because you would still have plenty of variables left.
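
To see which variables fall above a given missing share, something like the following pandas sketch may help. The 0.5 cutoff is only an assumption taken from the 50% figure in the question, not a recommended rule; the function names and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_fraction(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Keep only columns whose missing fraction is at most max_missing.

    max_missing is an assumed, adjustable cutoff, not a standard value.
    """
    frac = df.isna().mean()
    return df.loc[:, frac <= max_missing]


# Toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "income": [np.nan, np.nan, np.nan, 52000],
    "city": ["A", "B", None, "C"],
})

fractions = missing_fraction(df)                    # per-column missing share
reduced = drop_sparse_columns(df, max_missing=0.5)  # drops columns above the cutoff
```

Printing `fractions` first and looking at the affected variables before dropping anything is probably safer than applying the cutoff blindly.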

You have to try different approaches and test them on your holdout dataset to find out which one works better. Industry knowledge also plays a part here. If you think a variable with missing values is significant, you can instead drop the observations where it is missing, provided you have enough observations left.
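
As a rough sketch of what "try different approaches and test on the holdout" could look like, the example below compares two options on synthetic data: dropping the sparse variable versus mean-imputing it. Everything here is an assumption for illustration: the data is generated, the model is plain least squares via NumPy, and RMSE is just one possible metric. Which option scores better will depend on your own data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (invented): y depends on x1 and x2; about 60% of x2 is missing.
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.1, size=n)
x2_obs = x2.copy()
x2_obs[rng.random(n) < 0.6] = np.nan

train = np.arange(n) < 300  # first 300 rows for training
hold = ~train               # remaining 100 rows as the holdout set


def fit_predict(X_train, y_train, X_hold):
    """Fit ordinary least squares with an intercept, then predict on X_hold."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_hold)), X_hold]) @ coef


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Option A: drop the sparse variable and model with x1 only.
X_a = x1.reshape(-1, 1)
pred_a = fit_predict(X_a[train], y[train], X_a[hold])

# Option B: mean-impute x2, using the mean of the training rows only
# so that no information leaks from the holdout set.
train_mean = np.nanmean(x2_obs[train])
x2_imp = np.where(np.isnan(x2_obs), train_mean, x2_obs)
X_b = np.column_stack([x1, x2_imp])
pred_b = fit_predict(X_b[train], y[train], X_b[hold])

scores = {
    "drop_variable": rmse(y[hold], pred_a),
    "mean_impute": rmse(y[hold], pred_b),
}
print(scores)
```

Dropping observations instead could be evaluated the same way, but note that it also changes which holdout rows can be scored, so the comparison is less direct.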

Trust your industry knowledge: you are the best judge of whether to delete variables, delete observations, or impute.