Feature Selection



How does correlation between features affect our approach to feature selection?
I have seen many problems where the features are highly correlated with one another. How should such features be handled?



In my opinion, you can go either for regularization techniques or for boosting methods such as XGBoost or LightGBM, since they greatly reduce the impact of correlated features.
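As a rough illustration of the regularization route, here is a minimal sketch using ridge (L2) regression, written in plain NumPy rather than any particular library. The data, the helper name `ridge_fit` and the choice `alpha=1.0` are all hypothetical. Two features that are near-copies of each other make the unregularized least-squares weights poorly determined individually (only their sum is pinned down), while the ridge penalty spreads the weight roughly evenly across the pair. XGBoost and LightGBM are not shown here.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Hypothetical helper: closed-form ridge regression, solving
    (X^T X + alpha * I) w = X^T y. No intercept is fitted."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)      # x2 is almost a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=n)  # true signal flows only through x1

# Plain least squares: the sum w1 + w2 is well determined, the split is not.
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
# Ridge: the penalty shrinks the poorly determined "x1 minus x2" direction,
# so the weight ends up shared roughly equally between the two copies.
w_ridge = ridge_fit(X, y, alpha=1.0)

print("corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])
print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
```

An L1 penalty (lasso) behaves differently: it tends to keep one feature of a correlated group and push the others to zero, which is closer to explicit feature selection.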



If two variables are highly correlated, you can keep either one of them, but not both.
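One simple way to apply this is a greedy correlation filter: walk through the features and keep each one only if its absolute Pearson correlation with every feature already kept stays at or below a threshold. The sketch below assumes a NumPy feature matrix. The function name `drop_correlated`, the threshold of 0.9 and the toy columns are hypothetical.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Hypothetical helper: keep a feature only if its absolute Pearson
    correlation with every already-kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # columns are features
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(42)
n = 500
height_cm = rng.normal(170, 10, size=n)
height_in = height_cm / 2.54 + rng.normal(0, 0.1, size=n)  # near-duplicate of height_cm
age = rng.uniform(18, 80, size=n)                            # unrelated feature
X = np.column_stack([height_cm, height_in, age])

selected = drop_correlated(X, ["height_cm", "height_in", "age"])
print(selected)  # → ['height_cm', 'age']
```

Which member of a correlated pair survives depends only on column order here. In practice you might prefer to keep the one that is easier to interpret or more strongly related to the target.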

You can also use a tree-based model, as these generally handle correlated features well.
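A small way to see why trees are not much affected: a split only asks which side of a threshold a value falls on, so a feature that is a monotone copy of another offers no new ways to partition the data. The sketch below fits a hand-written depth-1 regression tree (a "stump") in NumPy. It is not XGBoost, LightGBM or scikit-learn, and the name `best_stump` and the toy data are hypothetical. Adding the copy `x2 = 2 * x1` leaves the best achievable error unchanged.

```python
import numpy as np

def best_stump(X, y):
    """Hypothetical helper: fit a depth-1 regression tree by trying every
    feature and midpoint threshold; return (feature, threshold, sse)."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:  # strict "<": ties keep the earlier feature
                best = (j, t, sse)
    return best

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 1, size=200)
x2 = 2.0 * x1  # perfectly correlated copy of x1
y = np.where(x1 > 0.5, 5.0, 0.0) + rng.normal(scale=0.3, size=200)

feat_a, thr_a, sse_a = best_stump(x1[:, None], y)
feat_b, thr_b, sse_b = best_stump(np.column_stack([x1, x2]), y)
print(sse_a, sse_b)  # identical: the copy adds no new split options
```

Prediction quality is unaffected in this toy case, but in tree ensembles the credit (feature importance) can be spread across correlated copies, so importance scores still need care.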


@gurchetan1000 The scope of the question is very wide. The answer depends heavily on what you want to do, what types of variables (continuous, categorical, etc.) you are dealing with, and what kind of dataset you have. Most importantly, it depends on what compromises you are willing to make.

Any broad-based response would be guesswork.