I have a dataset with around 120 features, of which around 70 are categorical and the rest are numerical. I'm looking to perform EDA and select the variables that seem to have enough predictive power. Each categorical variable has around 10 levels on average. The dataset contains a binary target variable, which I have to predict.
How should I proceed? Should I select variables first and then look at their characteristics? Wouldn’t that bias the entire process?
Would it be a good approach to separate the numerical and categorical variables and then run a separate variable selection algorithm on each group?
In general, when there are a lot of variables, how is the data typically explored to gain insight into it?
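To make the question about separate selection concrete, one possible first pass is sketched below. It is only an illustration under assumptions, not a recommended method: the function names (`cramers_v`, `auc_separation`, `screen_features`), the column names, and the synthetic data are all hypothetical, and the choice of Cramér's V for categorical variables and a rank-based AUC for numerical variables is just one of many possible univariate scores. The two scores are not on the same scale, so they are ranked within each group rather than against each other.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between a categorical feature and the target (0 to 1)."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else 0.0


def auc_separation(x: pd.Series, y: pd.Series) -> float:
    """|2*AUC - 1| for a numerical feature: 0 = no separation, 1 = perfect."""
    mask = x.notna()
    x, y = x[mask], y[mask]
    n1, n0 = int((y == 1).sum()), int((y == 0).sum())
    if n1 == 0 or n0 == 0:
        return 0.0
    ranks = x.rank()  # average ranks, so ties are handled
    auc = (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)
    return float(abs(2 * auc - 1))


def screen_features(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Score numerical and categorical columns separately against a binary target."""
    y = df[target]
    features = df.drop(columns=[target])
    num_cols = features.select_dtypes(include="number").columns
    cat_cols = features.columns.difference(num_cols)
    rows = [(c, "numerical", auc_separation(features[c], y)) for c in num_cols]
    rows += [(c, "categorical", cramers_v(features[c], y)) for c in cat_cols]
    out = pd.DataFrame(rows, columns=["feature", "kind", "score"])
    return out.sort_values(["kind", "score"], ascending=[True, False]).reset_index(drop=True)


# Synthetic stand-in for the real data (hypothetical columns).
rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)
demo = pd.DataFrame({
    "num_signal": y + rng.normal(0, 1, n),
    "num_noise": rng.normal(0, 1, n),
    "cat_signal": np.where(rng.random(n) < 0.8, np.where(y == 1, "a", "b"), "c"),
    "cat_noise": rng.choice(list("abcdefghij"), n),
    "target": y,
})
scores = screen_features(demo, "target")
print(scores)
```

If scores like these were used to choose variables, computing them on a training split only (rather than the full dataset) is one way to limit the selection bias raised above; that is an assumption of this sketch, not something established here.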