Missing data threshold


#1

Can you help me determine what percentage of missing data is acceptable for imputation, and beyond what percentage a variable should be dropped from the analysis?

I know there is no exact answer to this question, and that it also depends on the number of variables in the data, but we should still have some percentage in mind beyond which a variable should not be considered for imputation.


#2

Usually, if more than 50% of the data in a column is missing, I drop that column.
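
For anyone who wants to try that rule of thumb, here is a rough sketch in plain Python. It assumes missing entries are stored as None and that each column is a simple list; the helper names (missing_fraction, drop_sparse_columns) and the toy data are made up for illustration, and the 50% cut-off is just the figure mentioned above, not a standard.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_fraction(values: Column) -> float:
    """Return the share of entries that are None (treated as missing here)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def drop_sparse_columns(data: Dict[str, Column],
                        threshold: float = 0.5) -> Dict[str, Column]:
    """Keep only the columns whose missing fraction does not exceed threshold."""
    return {name: col for name, col in data.items()
            if missing_fraction(col) <= threshold}


# Toy data, invented for illustration only.
data = {
    "age": [34, None, 29, 41],             # 25% missing, kept
    "income": [None, None, None, 52000],   # 75% missing, dropped
}

kept = drop_sparse_columns(data)
print(list(kept))
```

Changing the threshold argument lets you test how sensitive your set of retained variables is to the cut-off you pick.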


#3

There are several ways of dealing with missing values.
If a large percentage of the data for a given variable is missing, don't use that variable for building the model.
If the percentage of missing values is small (5 to 10%), replace the missing values with the mean, median, or mode.
Alternatively, impute the missing values from the relationships between the variables.
In R, use the DMwR (Data Mining with R) library to impute the missing values with centralImputation or knnImputation.
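
To make the mean/median/mode option concrete, here is a minimal sketch in plain Python rather than R. It is not the DMwR implementation; the function name impute_central, the max_missing guard of 10%, and the sample values are all assumptions for illustration, with None standing in for a missing value.

```python
from statistics import mean, median, mode
from typing import List, Optional


def impute_central(values: List[Optional[float]],
                   strategy: str = "median",
                   max_missing: float = 0.10) -> List[float]:
    """Fill None entries with the mean, median, or mode of the observed values.

    Refuses to impute when the missing share is above max_missing, since the
    advice above only covers a small percentage of missing values.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    missing_share = (len(values) - len(observed)) / len(values)
    if missing_share > max_missing:
        raise ValueError(f"{missing_share:.0%} missing exceeds the allowed share")
    strategies = {"mean": mean, "median": median, "mode": mode}
    fill = strategies[strategy](observed)
    return [fill if v is None else v for v in values]


# Toy column with 1 of 10 values missing (10%), invented for illustration.
values = [10.0, 12.0, None, 11.0, 13.0, 12.0, 10.0, 14.0, 12.0, 11.0]

filled_median = impute_central(values, "median")
filled_mean = impute_central(values, "mean")
filled_mode = impute_central(values, "mode")
print(filled_median[2], filled_mean[2], filled_mode[2])
```

The median is usually the safer default for skewed numeric variables, while the mode is the natural choice for categorical ones. Imputing from the relationships between variables (for example with nearest neighbours) needs the other columns as well and is not covered by this sketch.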