How do I select important variables for my model using Python sklearn?

dimensionreduction
python
sklearn
machine-learning

#1

I have a data set with quite a lot of variables (attributes), a mixture of categorical and numerical data. How can I select which attributes are significant for building the model?


#2

For numerical data, you can run a multicollinearity check and drop variables that are highly correlated with each other. You can visualise this with seaborn in Python:

code:
import matplotlib.pyplot as plt
import seaborn as sb

# Heatmap of the pairwise correlations between the columns of X
plt.subplots(figsize=(20, 15))
sb.heatmap(X.corr(), annot=True, linewidths=6, fmt='.2g')
plt.show()

X is the data set here: a pandas DataFrame holding the numerical columns.
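
To go from the heatmap to actually dropping variables, something like the following may help. It is only a minimal sketch: the helper name `drop_highly_correlated` and the 0.9 threshold are my own assumptions, not anything built into pandas or sklearn, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(X, threshold=0.9):
    """Drop one column from each pair whose absolute correlation exceeds threshold.

    Hypothetical helper for illustration; the threshold is an assumption to tune.
    """
    # Only numeric columns take part in the correlation check
    corr = X.select_dtypes(include="number").corr().abs()
    # Keep the strict upper triangle so each pair is looked at once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if it is highly correlated with any earlier column
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop), to_drop


# Synthetic demo: "b" is almost a copy of "a", "c" is unrelated
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
reduced, dropped = drop_highly_correlated(demo, threshold=0.9)
print(dropped)
```

Which column of a correlated pair gets dropped depends on column order here (the later one goes), so you may prefer to decide that by domain knowledge instead.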

Alternatively, you can use pandas to compute the correlation between variables with the method set to spearman, and plot that instead.
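
As a rough illustration of the Spearman option, the sketch below compares it with the default Pearson method on made-up data (the column names `x` and `x_cubed` are just for the example). Spearman works on ranks, so it picks up monotonic relationships that are not linear.

```python
import numpy as np
import pandas as pd

# Made-up data: x_cubed increases with x, but not linearly
x = np.arange(1, 21, dtype=float)
df = pd.DataFrame({"x": x, "x_cubed": x ** 3})

# Pearson measures linear association, Spearman measures rank (monotonic) association
pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

print(pearson.loc["x", "x_cubed"])
print(spearman.loc["x", "x_cubed"])
```

Either matrix can be passed to sb.heatmap in place of X.corr().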

There is also a measure called Information Value (IV), which quantifies how strongly each independent variable predicts your dependent (target) variable.
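
As far as I know there is no Information Value function in sklearn itself, so here is a minimal sketch of how it could be computed by hand with pandas. The assumptions are mine: the target is binary and encoded as 0/1, numeric features are cut into quantile bins, and 0.5 is added to every count so that empty cells do not produce log(0). The function name `information_value` is hypothetical.

```python
import numpy as np
import pandas as pd


def information_value(feature, target, bins=5):
    """Information Value of one feature against a binary 0/1 target (illustrative sketch)."""
    feature = pd.Series(np.asarray(feature))
    target = pd.Series(np.asarray(target))
    # Numeric features are cut into quantile bins; anything else is used as categories
    if pd.api.types.is_numeric_dtype(feature) and feature.nunique() > bins:
        groups = pd.qcut(feature, q=bins, duplicates="drop")
    else:
        groups = feature
    # Count events (1) and non-events (0) per bin or category
    counts = pd.crosstab(groups, target).reindex(columns=[0, 1], fill_value=0)
    n_groups = len(counts)
    # Smoothed share of all events / non-events that falls into each group
    events = (counts[1] + 0.5) / (counts[1].sum() + 0.5 * n_groups)
    non_events = (counts[0] + 0.5) / (counts[0].sum() + 0.5 * n_groups)
    # Weight of evidence per group, then IV as the weighted sum
    woe = np.log(events / non_events)
    return float(((events - non_events) * woe).sum())


# Synthetic demo: "signal" drives the target, "noise" does not
rng = np.random.default_rng(0)
signal = rng.normal(size=1000)
noise = rng.normal(size=1000)
y = (signal + 0.5 * rng.normal(size=1000) > 0).astype(int)

iv_signal = information_value(signal, y)
iv_noise = information_value(noise, y)
print(iv_signal, iv_noise)
```

A higher value means the variable separates the two classes better, so you can rank variables by it. Since categorical columns are used as-is, the same function covers both kinds of attributes.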

Hope this helps.

Thanks