Missing value imputation



While using the mice/missForest packages to impute missing values in the Big Mart Sales data set, imputation takes a lot of time to run (maybe because the categorical variables have many levels). How can I do it in less time, or is there a better method?


Talking of faster imputation methods:

  1. For numerical values, simply impute the mean or median: `df$m[is.na(df$m)] <- median(df$m, na.rm = TRUE)`.
  2. For categorical values, impute the most frequently occurring level.
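Both rules can be applied to a whole data frame in one pass. This is a minimal sketch in base R; the function name `impute_simple()` is hypothetical, and it assumes missing values are coded as `NA` (blank strings would need converting to `NA` first).

```r
# Fast single imputation (hypothetical helper, base R only):
# numeric columns get their median, factor/character columns their most frequent value.
impute_simple <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next                      # nothing to impute in this column
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else if (is.factor(x) || is.character(x)) {
      counts <- table(x)                     # table() ignores NA by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}

df <- data.frame(w = c(1, NA, 3, 10),
                 s = factor(c("Small", "Medium", NA, "Medium")))
impute_simple(df)
```

Every missing value in a column gets the same constant, which is why it is fast, but it also shrinks the column's variance and ignores relationships between variables.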

I can’t say whether this will help the model much, but it’s certainly fast.
Hope it helps. :slight_smile:


Thanks for answering :slight_smile: @sauravkaushik8
I think mice/Hmisc/missForest can impute more appropriate values,
so I would like to know more about them.


It’s hard to help without knowing what your data set looks like and which mice functions you actually used.

I can personally recommend the following functions from the VIM package: irmi() and kNN().
In general, the best packages for imputation are VIM, mice, Amelia and imputeTS (for time series).
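To illustrate the idea behind k-nearest-neighbour imputation, here is a simplified base R sketch for numeric columns only. It is not VIM's implementation: `knn_impute_numeric()` is a hypothetical helper that uses scaled Euclidean distance on the columns a row does observe and fills each gap with the median of the `k` nearest donors. VIM::kNN() itself also handles categorical variables and mixed data.

```r
# Simplified kNN imputation for numeric columns (hypothetical helper, base R only).
knn_impute_numeric <- function(df, k = 5) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  x <- as.matrix(df[num_cols])
  # Standardise columns so no single variable dominates the distance
  ctr <- colMeans(x, na.rm = TRUE)
  sds <- apply(x, 2, sd, na.rm = TRUE)
  sds[is.na(sds) | sds == 0] <- 1
  z <- sweep(sweep(x, 2, ctr), 2, sds, "/")
  out <- x
  for (i in which(rowSums(is.na(x)) > 0)) {
    miss <- is.na(x[i, ])
    obs <- !miss
    for (j in which(miss)) {
      # Donors: rows observing column j and every column observed in row i
      ok <- !is.na(x[, j]) & rowSums(is.na(x[, obs, drop = FALSE])) == 0
      donors <- setdiff(which(ok), i)
      if (length(donors) == 0) next          # leave NA if no donor exists
      if (any(obs)) {
        diffs <- sweep(z[donors, obs, drop = FALSE], 2, z[i, obs])
        d <- sqrt(rowSums(diffs^2))
      } else {
        d <- rep(0, length(donors))          # no information: all donors equal
      }
      nn <- donors[order(d)[seq_len(min(k, length(donors)))]]
      out[i, j] <- median(x[nn, j])          # aggregate the k nearest donors
    }
  }
  df[num_cols] <- as.data.frame(out)
  df
}

df <- data.frame(a = c(1, 2, NA, 4, 5), b = c(10, 20, 30, 40, 50))
knn_impute_numeric(df, k = 2)
```

The double loop keeps the logic readable but is slow on large data; a packaged implementation is the better choice in practice.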

If a function doesn’t finish at all, that could be because categorical variables (factors) are not supported.
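Before running mice or missForest, it may help to check which categorical columns have many levels. mice's default method for unordered factors with more than two levels is `polyreg` (multinomial regression), which gets expensive as levels grow, and randomForest, which missForest builds on, rejects categorical predictors with more than 53 categories. This base R sketch (the helper name `factor_levels()` is hypothetical) counts distinct non-missing values per categorical column:

```r
# Count distinct non-missing values in each factor/character column,
# largest first, to spot high-cardinality variables (hypothetical helper).
factor_levels <- function(df) {
  is_cat <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  n <- vapply(df[is_cat], function(x) length(unique(x[!is.na(x)])), integer(1))
  sort(n, decreasing = TRUE)
}

df <- data.frame(id = c("a", "b", "c", "a"),
                 g = factor(c("x", "x", "y", NA)),
                 v = 1:4)
factor_levels(df)
```

Identifier-like columns with hundreds of levels usually carry little information for imputation and can be excluded from the predictors before imputing.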