Missing Value Imputation (70% data missing)


Hi Guys,

Can you help me with missing value imputation in Hackathon 3.x? Out of 87020 rows, some variables have as many as 59600 missing values.

Should I:

  1. Drop these entire rows
  2. Impute with 0
  3. Impute with median
  4. Run a model to predict the missing values
  5. Create a new flag variable to indicate missing or non-missing

This is a case where almost 70% of the data is missing; so far I have only worked with around 1% missing data. I am absolutely clueless as to what must be done in this case.

Any help, guidance or insight would be much appreciated.

Thanks in advance


As a learning exercise, try all of them and compare the results. Thankfully, the competition is open until Friday with unlimited submissions. Please note that I’m not trying to discourage you from asking questions or seeking help, but as a fellow beginner I feel that this is a wonderful opportunity to hone our skills.
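
If it helps to get started, here is a rough sketch of the five options from the question on a tiny made-up column. Everything in it is an assumption for illustration: the `income` and `age` columns and their values are hypothetical and not from the hackathon data, and option 4 uses a hand-rolled straight-line fit only as a stand-in for whatever model you would actually choose. It is plain Python so it runs without extra libraries.

```python
from statistics import median

# Hypothetical toy data: None marks a missing value (6 of 10 missing).
income = [40.0, None, 55.0, None, None, 70.0, None, 25.0, None, None]
age = [30, 41, 38, 52, 27, 45, 33, 24, 60, 36]  # fully observed helper column

observed = [v for v in income if v is not None]
missing_share = 1 - len(observed) / len(income)

# Option 1: drop every row where the value is missing.
dropped = [(a, v) for a, v in zip(age, income) if v is not None]

# Option 2: replace missing values with 0.
zero_filled = [0.0 if v is None else v for v in income]

# Option 3: replace missing values with the median of the observed values.
med = median(observed)
median_filled = [med if v is None else v for v in income]

# Option 4: predict missing values from another column.
# Here: a least-squares line income ~ age, fitted on the observed rows only.
xs = [a for a, v in zip(age, income) if v is not None]
mean_x = sum(xs) / len(xs)
mean_y = sum(observed) / len(observed)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, observed))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x
model_filled = [
    intercept + slope * a if v is None else v for a, v in zip(age, income)
]

# Option 5: add a 0/1 flag recording which values were missing.
# The flag can be combined with any of the fills above.
missing_flag = [1 if v is None else 0 for v in income]

print("share missing:", missing_share)
print("rows kept after dropping:", len(dropped))
print("median used for filling:", med)
```

Each variant can then go into the same model so the scores can be compared. Note that with this much missing data, option 1 throws away most of the rows, which is one reason to try the flag in option 5 alongside a fill.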