Missing Values Imputation



I was reading a report where the limitations of general imputation methods were given.

It read: The common methods result in underestimation of standard errors and overestimation of test statistics.

Can somebody explain this to me with an example, and what effect it will have specifically on predictive modeling?

Any help would be greatly appreciated.


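First, the article mentions two methods, but there are others. In both cases you end up with a single value, the median or the mean, for every missing entry. Both are statistics estimated from the observed data. Since you do not know the real distribution of the missing values, filling them all in with the mean pulls the resulting distribution toward the mean and therefore reduces the sd compared with the unknown true one (how likely is it that all the missing values are exactly equal to the mean?). A smaller sd, combined with a sample size that now counts the imputed values as if they were real observations, gives a smaller standard error, and because test statistics divide by the standard error, they come out too large.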
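To make this concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the data are simulated, the seed, sample size and 40% missing rate are arbitrary, and the helper names `mean_impute` and `std_error` are made up for this example. It compares the sd, the standard error and a one-sample t statistic computed on the observed values only with the same quantities after mean imputation.

```python
import numpy as np

def mean_impute(values):
    """Replace NaN entries with the mean of the observed entries."""
    filled = values.copy()
    filled[np.isnan(filled)] = np.nanmean(values)
    return filled

def std_error(values):
    """Standard error of the mean: sample sd divided by sqrt(n)."""
    return values.std(ddof=1) / np.sqrt(len(values))

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
full = rng.normal(loc=0.5, scale=2.0, size=200)  # simulated "true" data

# Remove 40% of the values completely at random.
observed = full.copy()
missing = rng.choice(full.size, size=80, replace=False)
observed[missing] = np.nan

complete_cases = observed[~np.isnan(observed)]  # the 120 values we really have
imputed = mean_impute(observed)                 # 200 values, 80 of them the mean

sd_observed = complete_cases.std(ddof=1)
sd_imputed = imputed.std(ddof=1)
se_observed = std_error(complete_cases)
se_imputed = std_error(imputed)

# One-sample t statistic for H0: mean = 0.
t_observed = complete_cases.mean() / se_observed
t_imputed = imputed.mean() / se_imputed

print(f"sd: observed only {sd_observed:.3f}, after imputation {sd_imputed:.3f}")
print(f"se: observed only {se_observed:.3f}, after imputation {se_imputed:.3f}")
print(f"t:  observed only {t_observed:.3f}, after imputation {t_imputed:.3f}")
```

The mean is unchanged by the imputation, but the imputed values add nothing to the sum of squared deviations while still increasing n, so the sd and the standard error always shrink and the t statistic always grows in absolute value, whatever the seed.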

Hope this helps.