What are the ways to handle missing values in a model?



I am new to the analytics domain and I frequently run into issues with missing values. For example, in Kaggle's Titanic survival problem, there are missing values for Age. Can you please suggest methods to deal with missing values?



@Imran You can assign a value to the missing entries. Some options:

  • Assign a value that flags the entry as missing, such as -1.
  • A very simple estimate is an average, such as the mean or median. The average can also be taken per category, for example by gender.
  • Use a regression or another simple model to predict the missing values: train it on the rows where the variable (age in this example) is present, using the other features available in the data set as predictors.
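
As a rough illustration of the three options above, here is a minimal sketch in plain Python. The records, the field names (`sex`, `fare`, `age`) and the helper functions are all made up for this example and are not the real Titanic data; the regression uses a single predictor with ordinary least squares, which is only one of many models you could pick.

```python
from statistics import mean, median

# Hypothetical toy records, loosely shaped like Titanic rows; None marks a missing age.
passengers = [
    {"sex": "male", "fare": 10.0, "age": 20.0},
    {"sex": "male", "fare": 20.0, "age": 30.0},
    {"sex": "male", "fare": 30.0, "age": None},
    {"sex": "female", "fare": 40.0, "age": 50.0},
    {"sex": "female", "fare": 15.0, "age": None},
    {"sex": "female", "fare": 30.0, "age": 40.0},
]


def impute_sentinel(rows, value=-1):
    """Option 1: replace each missing age with a flag value such as -1."""
    return [value if r["age"] is None else r["age"] for r in rows]


def impute_group_median(rows, key="sex"):
    """Option 2: replace each missing age with the median age of its group.

    Assumes every group has at least one known age.
    """
    groups = {}
    for r in rows:
        if r["age"] is not None:
            groups.setdefault(r[key], []).append(r["age"])
    medians = {g: median(ages) for g, ages in groups.items()}
    return [medians[r[key]] if r["age"] is None else r["age"] for r in rows]


def impute_regression(rows, feature="fare"):
    """Option 3: fit age ~ feature on the complete rows, then predict missing ages."""
    known = [r for r in rows if r["age"] is not None]
    xs = [r[feature] for r in known]
    ys = [r["age"] for r in known]
    x_bar, y_bar = mean(xs), mean(ys)
    # Ordinary least squares for a single predictor.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return [
        slope * r[feature] + intercept if r["age"] is None else r["age"]
        for r in rows
    ]


sentinel_ages = impute_sentinel(passengers)
median_ages = impute_group_median(passengers)
regression_ages = impute_regression(passengers)
```

If you use the -1 flag, keep in mind that a model treating age as a number will read -1 as a real age unless you handle it separately, so it tends to suit tree-based models better than linear ones.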

A reference that goes into detail on data imputation techniques: Missing-data imputation

Hope this helps.


In addition to what @binukeloth has already mentioned, one common approach is to take the rows where the value is present and build a decision tree on the variable you want to impute.

Once you have the segments (the leaves of the tree), you can substitute each missing value with the average of the segment it falls into.
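
To make the idea concrete, here is a minimal sketch in plain Python that assumes the simplest possible tree: a single split on one feature, giving two segments. The data, the `fare` feature and the function names are hypothetical; in practice you would more likely grow a deeper tree over several features with a library such as scikit-learn's `DecisionTreeRegressor`.

```python
from statistics import mean

# Hypothetical toy rows; None marks a missing age.
rows = [
    {"fare": 5.0, "age": 20.0},
    {"fare": 8.0, "age": 24.0},
    {"fare": 50.0, "age": 40.0},
    {"fare": 60.0, "age": 44.0},
    {"fare": 7.0, "age": None},
    {"fare": 55.0, "age": None},
]


def sse(values):
    """Sum of squared deviations from the mean of the values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Find the threshold on x that minimises the squared error of y on both sides.

    Assumes xs contains at least two distinct values.
    """
    best_err, best_t = None, None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbours
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = sse(left) + sse(right)
        if best_err is None or err < best_err:
            best_err, best_t = err, t
    return best_t


def impute_by_segment(data, feature="fare", target="age"):
    """Split the complete rows into two segments, then fill gaps with segment means."""
    known = [r for r in data if r[target] is not None]
    t = best_split([r[feature] for r in known], [r[target] for r in known])
    left_mean = mean(r[target] for r in known if r[feature] <= t)
    right_mean = mean(r[target] for r in known if r[feature] > t)
    return [
        r[target]
        if r[target] is not None
        else (left_mean if r[feature] <= t else right_mean)
        for r in data
    ]


imputed_ages = impute_by_segment(rows)
```

The tree is fitted only on rows where age is known, and each row with a missing age is then routed to a segment using its other features.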


Hi Imran,

I think you will find this blog post really helpful for missing value treatment and imputation.

Hope this helps.

Aayush Agrawal