How to impute categorical missing values in python?

python

#1

Referring to the thread "How to impute categorical missing values?", how do you do the imputation in Python?


#2

@JalFaizy,

Do you want to know the methods to impute categorical missing values?

Hope this helps!

Regards,
Sunil


#3

Hello @jalFaizy,

Missing values in categorical variables can be treated in the following ways:
1. Assign them a separate category. All missing values are then treated as one additional category.
2. Find their distribution by grouping on other variables. For example, to impute missing values in Gender, you can group by Age_Bucket, Income_bucket, etc., look at the distribution of Gender within each group, and assign the group's mode to the missing values.
3. Use classification algorithms such as random forest or KNN to predict the missing values.
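
Here is a rough sketch of the three options above using pandas and scikit-learn. The toy data, the column names (Age_Bucket, Income, Gender) and the helper fill_with_group_mode are made up for illustration, and KNN with a single neighbour is just one possible choice of classifier, so treat this as a starting point rather than a recipe.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data; all column names are assumptions for illustration.
df = pd.DataFrame({
    "Age_Bucket": ["18-30", "18-30", "18-30", "31-50", "31-50", "31-50"],
    "Income": [20, 22, 21, 60, 65, 62],
    "Gender": ["M", "M", np.nan, "F", "F", np.nan],
})

# 1. Treat missing values as their own category.
df["Gender_cat"] = df["Gender"].fillna("Missing")

# 2. Fill with the most frequent Gender within each Age_Bucket group.
def fill_with_group_mode(s):
    mode = s.mode()  # mode() ignores NaN by default
    return s.fillna(mode.iloc[0]) if not mode.empty else s

df["Gender_mode"] = df.groupby("Age_Bucket")["Gender"].transform(fill_with_group_mode)

# 3. Train a classifier on rows where Gender is known, then predict the rest.
known = df["Gender"].notna()
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(df.loc[known, ["Income"]], df.loc[known, "Gender"])
df["Gender_knn"] = df["Gender"]
df.loc[~known, "Gender_knn"] = clf.predict(df.loc[~known, ["Income"]])

print(df)
```

On real data you would normally use more than one predictor for option 3 and check the classifier's accuracy on the rows where the value is known before trusting its imputations.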

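You can use https://pypi.python.org/pypi/fancyimpute/0.0.4 and http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Imputer.html in Python for imputing missing values. Note that recent scikit-learn releases replace sklearn.preprocessing.Imputer with sklearn.impute.SimpleImputer.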
Also, identifier fields like LOAN_ID should not be imputed, because they only identify each record individually. Keeping them in your model doesn't really mean anything; you cannot meaningfully differentiate between two people by their loan IDs.
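One more approach is to replace all missing values in your data with a sentinel value such as -1 or -999.
This makes the algorithm treat missing values as a separate category, which often works well with tree-based models, though it can mislead models that read the number as a real magnitude (for example, linear models).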
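
As a small illustration of the library route, the sketch below uses scikit-learn's SimpleImputer to fill categorical columns either with the most frequent value or with a constant placeholder. The toy columns (Gender, City) and the "Missing" label are assumptions; for numeric-coded categories you could pass something like fill_value=-999 instead.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; dtype=object keeps np.nan as the missing marker.
df = pd.DataFrame(
    {"Gender": ["M", "F", np.nan, "F"], "City": ["A", np.nan, "B", "B"]},
    dtype=object,
)

# Replace each missing value with the most frequent value of its column.
mode_imputer = SimpleImputer(strategy="most_frequent")
filled_mode = pd.DataFrame(mode_imputer.fit_transform(df), columns=df.columns)

# Replace each missing value with a constant placeholder category.
const_imputer = SimpleImputer(strategy="constant", fill_value="Missing")
filled_const = pd.DataFrame(const_imputer.fit_transform(df), columns=df.columns)

print(filled_mode)
print(filled_const)
```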
Hope this helps!!


#4

Yes, I wanted to know the methods for imputing values.

I read your article which was a good intro to methods to treat missing values. Can you use these methods to impute categorical values?


#5

Thanks for the answer. It was really helpful.