What are the ways to handle missing values in a model?

missing_values
data_wrangling

#1

I am new to the analytics domain and I frequently face issues related to missing values. For example, in Kaggle's Titanic survival problem, there are missing values for Age. Can you please suggest methods to deal with missing values?

Thx,
Imran


#2

@Imran Assign a value to the missing entries. Some options:

  • Assign a value that indicates missingness, such as -1.
  • A very simple estimate is an average such as the mean or median. The average can also be taken within a category, such as gender.
  • Use a regression or another simple model to predict the missing values: train it on the rows where the variable (Age in this example) is present, using the other features in the data set as predictors.
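
As a rough illustration of the three options above, here is a minimal pandas sketch. The data frame is made-up toy data, not the real Titanic file, and the column names (Sex, Fare, Age) plus the choice of Fare as the only predictor are assumptions made just for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few Titanic-style columns.
df = pd.DataFrame({
    "Sex":  ["male", "female", "male", "female", "male", "female"],
    "Fare": [7.25, 71.28, 8.05, 53.10, 8.46, 30.00],
    "Age":  [22.0, 38.0, np.nan, 35.0, 54.0, np.nan],
})

# Option 1: mark missing ages with a sentinel value such as -1.
age_sentinel = df["Age"].fillna(-1)

# Option 2a: fill with the overall median of the observed ages.
age_median = df["Age"].fillna(df["Age"].median())

# Option 2b: fill with the median within each category (here, gender).
age_by_sex = df["Age"].fillna(df.groupby("Sex")["Age"].transform("median"))

# Option 3: fit a simple linear model on rows where Age is known,
# then predict Age for the rows where it is missing.
known = df["Age"].notna()
slope, intercept = np.polyfit(
    df.loc[known, "Fare"].to_numpy(),
    df.loc[known, "Age"].to_numpy(),
    deg=1,
)
age_regressed = df["Age"].copy()
age_regressed.loc[~known] = slope * df.loc[~known, "Fare"] + intercept
```

With a sentinel such as -1, the model has to be able to treat that value as a separate case (tree-based models usually cope; linear models generally do not), so which option fits best depends on the model you plan to train.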

A reference that goes into detail on data imputation techniques: Missing-data imputation

Hope this helps.


#3

In addition to what @binukeloth has already mentioned, one common approach is to take the rows where the value is present and build a decision tree on the variable you want to impute.

Once you have the segments (the leaves of the tree), you can substitute the missing values with the average for each segment.
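
A minimal sketch of this idea with scikit-learn, assuming toy data and hypothetical column names (Pclass, Fare, Age); the tree depth and minimum leaf size are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data; not the real Titanic file.
df = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 1],
    "Fare":   [71.3, 53.1, 13.0, 26.0, 7.3, 8.1, 7.9, 80.0],
    "Age":    [38.0, 35.0, 27.0, np.nan, 22.0, 20.0, np.nan, 54.0],
})
features = ["Pclass", "Fare"]
known = df["Age"].notna()

# Build the tree only on rows where Age is present.
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=2, random_state=0)
tree.fit(df.loc[known, features], df.loc[known, "Age"])

# The leaf each row falls into is its segment.
df["segment"] = tree.apply(df[features])

# Average observed Age per segment, then fill the gaps with that average.
segment_mean = df.loc[known].groupby("segment")["Age"].mean()
df["Age_imputed"] = df["Age"].fillna(df["segment"].map(segment_mean))
```

For a regression tree, the per-segment average is the same value that `tree.predict` returns, so the explicit grouping here is mainly to make the segments visible.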


#4

Hi Imran,

I think you will find this blog post really helpful for missing value treatment and imputation -

Hope this helps.

Regards,
Aayush Agrawal