How to handle missing values of categorical variables?

categorical
missing_values
data_wrangling

#1

Hi,

For missing values in continuous variables, we usually take one of the following approaches:

  1. Ignore these observations
  2. Replace with the overall average
  3. Replace with the average of similar observations
  4. Build model to predict missing values
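
For reference, options 2 and 3 could look roughly like this in pandas. This is only a sketch: the toy data and the column names `gender` and `income` are made up for illustration, and "similar" is assumed to mean "same gender".

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M"],
    "income": [50.0, 40.0, None, 60.0, 70.0],
})

# Option 2: replace the missing value with the overall average.
overall = df["income"].fillna(df["income"].mean())

# Option 3: replace it with the average of similar rows
# (assumed here to be rows with the same gender).
group_mean = df.groupby("gender")["income"].transform("mean")
by_group = df["income"].fillna(group_mean)

print(overall.tolist())
print(by_group.tolist())
```

The two results differ only in the row that was missing: the overall average uses every known income, while the group average uses only the known incomes of the same gender.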

Can you suggest methods for handling missing values when the variable is binary (1/0 or M/F) or categorical?

Regards,
Imran


#2

@Imran

There are various ways to handle missing values of categorical variables.

  1. Ignore observations with missing values, if the data set is large and only a small number of records have missing values
  2. Ignore the variable, if it is not significant
  3. Develop a model to predict the missing values
  4. Treat missing data as just another category
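
As a rough sketch of options 1 and 4 in pandas (the `gender` column and the `"Missing"` label are assumptions chosen for illustration, not anything standard):

```python
import pandas as pd

# Hypothetical toy data with two missing values.
df = pd.DataFrame({"gender": ["M", "F", None, "F", None]})

# Option 1: drop the rows where the value is missing.
dropped = df.dropna(subset=["gender"])

# Option 4: keep every row and label missing values as their own category.
df["gender_filled"] = df["gender"].fillna("Missing")
counts = df["gender_filled"].value_counts()

print(len(dropped))
print(counts.to_dict())
```

Option 4 keeps all rows and lets a model learn whether "missing" itself carries information, which can matter when values are not missing at random.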

Regards,
Steve


#3

Imran,
The same steps apply for a categorical variable as well.

  1. Ignore the observation
  2. Replace with the most frequent value
  3. Replace with a value derived from the nearest neighbours, using an algorithm like KNN.
  4. Predict the missing value using a multiclass classifier.
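
Options 2 and 3 could be sketched like this with pandas and scikit-learn. The data, the column names and the choice of `n_neighbors=3` are all assumptions made for illustration. Since a KNN classifier is itself a multiclass predictor, option 4 would follow the same pattern with any other classifier swapped in.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data; the last two rows have a missing category.
df = pd.DataFrame({
    "height_cm": [180, 175, 160, 158, 178, 182, 161],
    "weight_kg": [80, 78, 55, 52, 76, 85, 54],
    "gender": ["M", "M", "F", "F", "M", None, None],
})

# Option 2: replace missing values with the most frequent category.
most_frequent = df["gender"].mode()[0]
mode_filled = df["gender"].fillna(most_frequent)

# Option 3: train KNN on the complete rows, then predict the missing ones.
features = ["height_cm", "weight_kg"]
known = df[df["gender"].notna()]
unknown = df[df["gender"].isna()]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(known[features], known["gender"])

knn_filled = df["gender"].copy()
knn_filled.loc[unknown.index] = knn.predict(unknown[features])

print(mode_filled.tolist())
print(knn_filled.tolist())
```

The most frequent value gives every missing row the same label, while KNN uses the other columns to pick a label per row. On real data the features would usually be scaled first, because KNN is distance based.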

Hope this helps.
Tavish


#4

You can also look at this article:


#5

Generalised Low Rank Models (GLRM) can impute missing values themselves. Have a look at:

http://learn.h2o.ai/content/tutorials/glrm/glrm-tutorial.html


#6

Hi @arpitqw,
Thanks for sharing the Stanford paper; chapter 5 is great.
Alain