Do you mind sharing your code?
I also used the Amelia package for missing value imputation in Loan Prediction, since an ad-hoc method such as mean imputation can seriously bias the variances and covariances. Instead of building a separate model for each imputed data set, I chose to fill each missing value in my train data set with the average of its imputed values.
For example, when Amelia imputes Credit_History, it gives you 5 different data sets. Just average the imputed values and use the result in your train data set. This will save you time, because you no longer need to run your algorithms on each new data set one by one.
Alternatively, you can use the missForest package for missing value imputation. I found it quite robust, and it worked better than Amelia for me. missForest uses a random forest trained on the observed values of a data matrix to predict the missing values. Like Amelia, it can be run in parallel to save computation time.
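The averaging step can be sketched in base R as follows. This is only an illustration: the tiny `train` data frame and the `draws` values are made up, and the `imputations` list stands in for the list of completed data sets that Amelia returns in `a.out$imputations`.

```r
# Toy train set with two missing Credit_History values (made-up data).
train <- data.frame(
  LoanAmount     = c(120, 66, 141, 95),
  Credit_History = c(1, NA, 0, NA)
)
missing_rows <- is.na(train$Credit_History)

# Hypothetical imputed values for the two missing cells, one pair per data set.
# With Amelia you would use a.out$imputations instead of building this list.
draws <- list(c(0.9, 0.2), c(1.1, 0.4), c(0.8, 0.1), c(1.0, 0.3), c(0.7, 0.5))
imputations <- lapply(draws, function(d) {
  completed <- train
  completed$Credit_History[missing_rows] <- d
  completed
})

# One column per imputed data set, then average row by row.
credit_matrix <- sapply(imputations, function(d) d$Credit_History)
avg_credit <- rowMeans(credit_matrix)

# Fill only the originally missing cells; observed values stay untouched.
train$Credit_History[missing_rows] <- avg_credit[missing_rows]
print(train)
```

Since Credit_History is a 0/1 flag, the averaged values come out fractional. Whether to round them or keep them as they are is a judgment call.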
Here’s the code you can use:
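library(missForest)
# parallelize needs a registered foreach backend (e.g. doParallel); otherwise use parallelize = 'no'
missForest(data, maxiter = 10, ntree = 100, decreasing = FALSE,
           mtry = floor(sqrt(ncol(data))), replace = TRUE,
           parallelize = 'forests')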
data - the data matrix (or data frame) with missing values
maxiter - the maximum number of iterations to be performed. Default is 10
ntree - the number of trees to grow in each forest
mtry - the number of variables randomly sampled at each split
replace - if ‘TRUE’, uses bootstrap sampling (with replacement). If ‘FALSE’, uses sub-sampling (without replacement). I suggest keeping it TRUE, which is also the default
parallelize - activates parallel processing. You should use ‘forests’ because this data set does not have many variables. Had there been many variables, you should have used ‘variables’ instead of ‘forests’.
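Here is a small end-to-end run of the call above, as a sketch only. It assumes the missForest and doParallel packages are installed, and it uses the built-in iris data with artificially removed values (via missForest's `prodNA`) as a stand-in for the loan data. The seed and the 2 cores are arbitrary choices.

```r
library(missForest)
library(doParallel)

# parallelize = 'forests' needs a registered foreach backend.
registerDoParallel(cores = 2)

# Stand-in data: knock out 10% of the iris values at random.
set.seed(42)
iris_mis <- prodNA(iris, noNA = 0.1)

imp <- missForest(iris_mis, maxiter = 10, ntree = 100,
                  mtry = floor(sqrt(ncol(iris_mis))),
                  replace = TRUE, parallelize = "forests")

completed <- imp$ximp   # the imputed data set
print(imp$OOBerror)     # out-of-bag estimate of the imputation error

stopImplicitCluster()
```

The imputed data is in `imp$ximp`, so that is what you would pass on to your model.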