A few queries regarding CART, logistic regression, and missing data



Hi Team,

I have a few queries regarding ML algorithms and data. It would be great if you could provide some feedback on them.

  1. Which is the best package for imputing missing data (I am currently using MICE in R), and how should I deal with missing categorical values? For example, the self_work variable is significant in my loan prediction model; its values are Yes/No, and some of them are missing.

  2. How can we validate whether the imputed values are correct? Is there a statistical test, or any other test, that we can run on those columns to validate the values?

  3. If I have to predict a binary output, how can I choose between logistic regression, CART, and random forest? Do I need to build a model with each algorithm, or is there a test that will help me decide?
    Note: Logistic regression provides more detail, such as variable significance, and random forest gives accuracy and interpretability, but what if my aim is accuracy? Sometimes logistic regression may give better accuracy than CART/random forest.
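
To make question 2 more concrete: one possible sanity check is to hide some values that are actually known, impute them, and see how many are recovered. Below is a minimal sketch of that idea in plain Python (not R, and not MICE). The function names `mode_impute` and `masked_accuracy`, the 20% masking fraction, and the `self_work` data are all made up for illustration, and the mode imputer is only a stand-in for a real imputation method.

```python
import random
from collections import Counter


def mode_impute(values):
    """Stand-in imputer (assumption): fill None with the most common observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def masked_accuracy(values, impute, frac=0.2, seed=0):
    """Hide a fraction of the known values, impute, and score only the hidden ones."""
    rng = random.Random(seed)
    known = [i for i, v in enumerate(values) if v is not None]
    hidden = rng.sample(known, max(1, int(len(known) * frac)))

    # Copy the column and blank out the entries we pretend are missing.
    masked = list(values)
    for i in hidden:
        masked[i] = None

    filled = impute(masked)
    hits = sum(filled[i] == values[i] for i in hidden)
    return hits / len(hidden)


# Hypothetical self_work column: Yes/No with some entries missing (None).
self_work = ["No"] * 70 + ["Yes"] * 20 + [None] * 10
score = masked_accuracy(self_work, mode_impute)
print(f"share of hidden values recovered: {score:.2f}")
```

Any other imputer could be passed in place of `mode_impute`. Note that this only measures how well known values are recovered; it does not prove that the values imputed for the truly missing entries are correct.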

Thanks in advance!