Hi Guys,
Can you help me with missing value imputation in Hackathon 3.x? Out of 87020 rows, some variables have as many as 59600 missing values.
Should I:
- Drop these rows entirely
- Impute with 0
- Impute with the median
- Run a model to predict the missing values
- Create a new variable as a flag to indicate missing or non-missing
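To make the options concrete, here is a minimal sketch of dropping rows, filling with 0, filling with the median, and adding a missing flag, on a toy column. The data and variable names are made up for illustration and are not from the hackathon dataset. The model-based option is left out because it depends on which other variables are available.

```python
from statistics import median

# Toy column with missing entries marked as None (made-up data).
values = [4.0, None, 10.0, None, 6.0, None]

# Values that are actually present, and the share that is missing.
observed = [v for v in values if v is not None]
missing_share = 1 - len(observed) / len(values)

# Drop rows where the value is missing.
dropped = observed

# Impute with 0.
zero_filled = [0.0 if v is None else v for v in values]

# Impute with the median of the observed values.
med = median(observed)
median_filled = [med if v is None else v for v in values]

# Flag missing (1) vs. non-missing (0); usually kept next to an imputed column.
missing_flag = [1 if v is None else 0 for v in values]

print(missing_share, zero_filled, median_filled, missing_flag)
```

With pandas, the same ideas map roughly to `dropna()`, `fillna(0)`, `fillna(col.median())` and `isna()`; the flag can be combined with any of the imputation options instead of being used on its own.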
For some variables almost 70% of the values are missing (59600 of 87020 is about 68%), and so far I have only worked with data that had around 1% missing. I am absolutely clueless as to what should be done in this case.
Any help, guidance, or insight would be much appreciated.
Thanks in advance