What is a good order to impute a dataset with NAs in multiple features without using a package?



Suppose I have a dataset with 8 features, not counting the target. Each feature has a different number of missing values. Which feature should I impute first: the one with the fewest missing values, or the one with the most? Does the order make a difference? And if I use a machine learning model such as a decision tree or linear regression to impute missing values, can I use a feature that itself contains NAs to impute another feature?


Hi @Geru_San,

You can follow these steps:

  • First, if you can logically impute the missing values in any feature, do that.
  • If every feature still has missing values, drop the rows that are missing in the column with the fewest missing values, then repeat step 1.
  • Then use the methods mentioned here to impute the remaining missing values.
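
The steps above can be sketched in plain Python with no imputation package. This is only an illustration under a few assumptions of mine: rows are dicts, `None` marks a missing value, the helper names (`missing_counts`, `drop_rows_missing_in`, `mean_impute`) are made up for the example, and mean filling stands in for whichever method you end up choosing. The logical imputation in step 1 depends on your data, so it is not shown.

```python
# Toy data: None marks a missing value (an assumption of this sketch).
rows = [
    {"a": 1.0, "b": None, "c": 3.0},
    {"a": None, "b": 2.0, "c": None},
    {"a": 2.0, "b": 4.0, "c": None},
    {"a": 3.0, "b": 6.0, "c": 9.0},
    {"a": 4.0, "b": None, "c": None},
]


def missing_counts(rows):
    """Return the number of missing values per feature."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}


def drop_rows_missing_in(rows, col):
    """Keep only the rows where `col` is observed."""
    return [r for r in rows if r[col] is not None]


def mean_impute(rows, col):
    """Fill missing values of `col` with the mean of its observed values."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = sum(observed) / len(observed)
    return [dict(r, **{col: fill}) if r[col] is None else r for r in rows]


counts = missing_counts(rows)

# Step 2: if every feature has gaps, drop the rows that are missing in the
# feature with the fewest missing values, so one feature becomes complete.
if all(n > 0 for n in counts.values()):
    anchor = min(counts, key=counts.get)
    rows = drop_rows_missing_in(rows, anchor)

# Step 3: impute what is left, fewest missing values first
# (this ordering is my assumption, not something the steps prescribe).
counts = missing_counts(rows)
cleaned = rows
for col in sorted(counts, key=counts.get):
    if counts[col] > 0:
        cleaned = mean_impute(cleaned, col)
```

With mean filling, each column is filled independently, so the order does not change the result. The order starts to matter once one feature is predicted from the others.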

For your last question: yes, you can use ML methods.
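
As a rough illustration of the ML route, here is a minimal sketch that imputes one feature from another with a hand-written simple linear regression. The names `fit_line` and `regress_impute` and the toy data are hypothetical. It also makes one possible choice about predictors that contain NAs: the model is fitted only on rows where both features are observed, and a row whose predictor is missing is left unfilled.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regress_impute(rows, target, predictor):
    """Fill missing `target` values using a line fitted on complete pairs."""
    pairs = [
        (r[predictor], r[target])
        for r in rows
        if r[predictor] is not None and r[target] is not None
    ]
    slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    filled = []
    for r in rows:
        # Only predict where the predictor itself is observed.
        if r[target] is None and r[predictor] is not None:
            r = dict(r, **{target: slope * r[predictor] + intercept})
        filled.append(r)
    return filled


data = [
    {"x": 1.0, "y": 2.0},
    {"x": 2.0, "y": 4.0},
    {"x": 3.0, "y": None},   # y can be predicted from x
    {"x": None, "y": None},  # x is missing too, so y stays missing
    {"x": 5.0, "y": 10.0},
]
filled = regress_impute(data, target="y", predictor="x")
```

Rows that stay unfilled here would need another pass, for example after the predictor has been imputed by some other means.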