I have a few questions about ML algorithms and data handling. It would be great if you could give me some feedback on them.
- Which is the best package for imputing missing data (I am currently using MICE in R), and how should I handle missing categorical values? For example, the self_work variable is significant in my loan-prediction model; its values are Yes/No, and some of them are missing.
- How can we validate whether the imputed values are correct? Is there a statistical test, or any other check, that we can run on those columns to validate the values?
- If I have to predict a binary outcome, how do I choose between logistic regression, CART, and random forest? Do I need to build a model with each algorithm, or is there a test that will help me decide?
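For the first two questions: no test can prove that an individual imputed value is correct, because the true value is unknown. A practical proxy is to hide some values that *are* observed, impute them, and measure how many are recovered, and to compare the distribution of imputed values with the observed ones. The sketch below is a minimal standard-library Python illustration under stated assumptions: the data are synthetic, `employment_type` is a hypothetical predictor, and group-wise mode imputation is a simple stand-in, not MICE. (For a two-level factor, R's mice uses `method = "logreg"` by default; the same masking check can be applied to its output.)

```python
import random
from collections import Counter, defaultdict

# Each row is (employment_type, self_work); None marks a missing self_work value.
# employment_type is a hypothetical predictor used only for illustration.

def fit_group_mode(rows):
    """Learn the most frequent observed self_work value within each employment_type."""
    by_group = defaultdict(Counter)
    overall = Counter()
    for group, value in rows:
        if value is not None:
            by_group[group][value] += 1
            overall[value] += 1
    modes = {g: counts.most_common(1)[0][0] for g, counts in by_group.items()}
    fallback = overall.most_common(1)[0][0]  # used for groups with no observed values
    return modes, fallback

def impute(rows, modes, fallback):
    """Replace missing self_work values with the group mode (or the overall mode)."""
    return [(g, v if v is not None else modes.get(g, fallback)) for g, v in rows]

def mask_and_score(rows, frac=0.2, seed=0):
    """Hide a fraction of the observed values, re-impute, and return the share recovered."""
    rng = random.Random(seed)
    observed = [i for i, (_, v) in enumerate(rows) if v is not None]
    hidden = set(rng.sample(observed, max(1, int(frac * len(observed)))))
    masked = [(g, None) if i in hidden else (g, v) for i, (g, v) in enumerate(rows)]
    modes, fallback = fit_group_mode(masked)
    filled = impute(masked, modes, fallback)
    return sum(filled[i][1] == rows[i][1] for i in hidden) / len(hidden)

def share_yes(values):
    """Proportion of 'Yes' among the given values (a simple distribution check)."""
    values = list(values)
    return sum(v == "Yes" for v in values) / len(values)

# Synthetic data: the chance of self_work == "Yes" depends on employment_type.
rng = random.Random(42)
rows = []
for _ in range(500):
    group = rng.choice(["salaried", "business", "freelance"])
    p_yes = {"salaried": 0.1, "business": 0.8, "freelance": 0.9}[group]
    value = "Yes" if rng.random() < p_yes else "No"
    rows.append((group, None if rng.random() < 0.1 else value))

modes, fallback = fit_group_mode(rows)
filled = impute(rows, modes, fallback)
missing = [i for i, (_, v) in enumerate(rows) if v is None]
print("recovered on masked values:", round(mask_and_score(rows), 3))
print("share Yes, observed:", round(share_yes(v for _, v in rows if v is not None), 3))
print("share Yes, imputed: ", round(share_yes(filled[i][1] for i in missing), 3))
```

Mode imputation always picks the single most likely value, so every missing `business` row becomes "Yes"; the imputed values are therefore less varied than the observed ones, which the distribution comparison can surface. mice's `logreg` method instead draws values at random from fitted probabilities, which is one reason it tends to preserve the observed distribution better.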
Note on the model-choice question: logistic regression provides more detail, such as variable significance, and CART is easy to interpret, while random forest is often chosen for accuracy. But what if my only aim is accuracy? Sometimes logistic regression may give better accuracy than CART or random forest.
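For the model-choice question, the usual approach is not a single test but k-fold cross-validation: fit every candidate on the same training folds and compare held-out accuracy. Below is a minimal standard-library Python sketch under stated assumptions: numeric features, 0/1 labels, and synthetic data. `fit_logistic` is a bare gradient-descent logistic regression, `fit_stump` is a one-split tree standing in for CART, and random forest is left out for length. A majority-class baseline is included because accuracy alone can look good on imbalanced data. In practice the same comparison is usually run with scikit-learn's `cross_val_score` on `LogisticRegression`, `DecisionTreeClassifier` and `RandomForestClassifier`, or with `caret::train` and `trainControl(method = "cv")` in R.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds; fit(X, y) must return a predict(x) function."""
    scores = []
    for test in k_fold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(sum(predict(X[i]) == y[i] for i in test) / len(test))
    return sum(scores) / len(scores)

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Unregularised logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(y)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            gw = [g + err * xj for g, xj in zip(gw, xi)]
        b -= lr * gb / n
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
    return lambda x: int(b + sum(wj * xj for wj, xj in zip(w, x)) > 0)

def fit_stump(X, y):
    """Depth-1 tree: the single feature/threshold split with the best training accuracy."""
    def majority(labels):
        return int(2 * sum(labels) >= len(labels)) if labels else 0
    best = None
    for j in range(len(X[0])):
        for t in sorted({xi[j] for xi in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            ll, rl = majority(left), majority(right)
            hits = sum(v == ll for v in left) + sum(v == rl for v in right)
            if best is None or hits > best[0]:
                best = (hits, j, t, ll, rl)
    _, j, t, ll, rl = best
    return lambda x: ll if x[j] <= t else rl

def fit_majority(X, y):
    """Baseline that always predicts the most common training label."""
    label = int(2 * sum(y) >= len(y))
    return lambda x: label

# Synthetic data: the label depends linearly on two features plus noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(300)]
y = [int(x0 + 0.5 * x1 + rng.gauss(0, 0.5) > 0) for x0, x1 in X]

candidates = {"majority": fit_majority, "logistic": fit_logistic, "stump": fit_stump}
results = {name: cross_val_accuracy(fit, X, y, k=5) for name, fit in candidates.items()}
for name, acc in results.items():
    print(f"{name:>8}: {acc:.3f}")
```

Using the same seed gives every model identical folds, so the comparison is paired. For loan data where one class is rare, a metric such as AUC may be more informative than raw accuracy.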
Thanks in advance!