How to replace missing values in a particular column using knnImpute

knn
r
missing_values

#1

Hello,

I am trying to use the kNNImpute function to replace the missing values in the Age column for the Titanic problem in R.

library(imputation)
kNNImpute(combi$Age, 3)

However, this gives me an error:

From the examples I saw online, it seems that values are imputed for every column that has missing values. How do I use this to replace the missing values in only one particular column?

Can someone please help me with this?


#2

The error you are getting is due to the fact that the number of centers you have defined here (3) is higher than the number of distinct values of Age present in your dataset. Please check that. This is a common error in clustering.


#3

Moreover, I’m not sure, but shouldn’t you be giving the kNN function something to predict Age from? Here you are passing only the target variable; I think it needs some predictors too.
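
To illustrate why predictors matter, here is a minimal base R sketch of k-nearest-neighbour imputation for a single column. It is not the imputation package's implementation: the helper impute_column_knn and the toy values are made up for illustration, and it assumes the predictor columns are numeric, have no missing values of their own, and are not constant.

```r
# Toy data standing in for the Titanic table (values are made up).
combi <- data.frame(
  Age    = c(22, 38, NA, 35, NA, 54, 2, 27),
  Fare   = c(7.25, 71.28, 7.92, 53.10, 8.05, 51.86, 21.08, 11.13),
  Pclass = c(3, 1, 3, 1, 3, 1, 3, 3),
  SibSp  = c(1, 1, 0, 1, 0, 0, 3, 0)
)

# Hypothetical helper: fill NAs in `target` with the mean of `target`
# over the k rows that are closest in the (scaled) predictor columns.
impute_column_knn <- function(df, target, predictors, k = 3) {
  X <- scale(as.matrix(df[, predictors, drop = FALSE]))  # common scale
  missing <- which(is.na(df[[target]]))
  known   <- which(!is.na(df[[target]]))
  k <- min(k, length(known))
  for (i in missing) {
    # Euclidean distance from row i to every row with a known target
    d <- sqrt(rowSums(sweep(X[known, , drop = FALSE], 2, X[i, ])^2))
    nearest <- known[order(d)[seq_len(k)]]
    df[[target]][i] <- mean(df[[target]][nearest])
  }
  df
}

combi_imputed <- impute_column_knn(combi, "Age",
                                   c("Fare", "Pclass", "SibSp"), k = 3)
```

With only the Age vector there is nothing to measure distance on, which is why a call on a single column has no neighbours to work with.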


#4

You are doing this the wrong way, which is why you get an error.
You need to pass the complete dataset to the function, like this:

kNNImpute(combi, 3)

This is because the function needs the other columns in order to estimate the missing values.

Yes, it will fill in all the missing values, but that is not a problem.

First save the complete dataset, missing values included, in a variable, say x.
Then run the imputation and save the result in y.
Finally, replace x$Age with y$Age.

Now you have what you wanted.
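
The copy-back step can be sketched as follows. The function impute_all is a hypothetical stand-in (it fills each column with its median) so that the example runs without any package; in practice the kNN imputation call would take its place. If that call returns a list rather than a data frame, the imputed table would have to be pulled out of the list first.

```r
# x keeps the original data, missing values included (toy values).
x <- data.frame(
  Age  = c(22, NA, 35, NA),
  Fare = c(7.25, NA, 53.10, 8.05)
)

# Hypothetical stand-in for the imputation call: fills every NA
# with the median of its column.
impute_all <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    col
  })
  df
}

y <- impute_all(x)   # every column is filled in y
x$Age <- y$Age       # copy back only the column you care about
```

After this, x$Age has no missing values, while the other columns of x, such as Fare, still contain their original NAs.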