I have a data set with quite a lot of variables (attributes), a mixture of categorical and numerical data. How can I select which attributes are significant for building the model?
For numerical data you can run a multicollinearity check and drop one variable from each pair that is highly correlated. You can visualise the correlations using seaborn in Python:
import seaborn as sb
# correlation is computed on the numeric columns only
sb.heatmap(X.select_dtypes("number").corr(), annot=True, linewidths=6, fmt=".2g")
Here X is the data set, loaded as a pandas DataFrame.
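To go from the heatmap to actually dropping variables, something along these lines could work. It is only a rough sketch: the helper name drop_highly_correlated, the 0.9 threshold and the toy data are my own assumptions, not a standard recipe, so tune the cutoff for your data.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(X, threshold=0.9, method="pearson"):
    """Drop one column from every pair whose |correlation| exceeds threshold.

    Hypothetical helper for illustration; the 0.9 default is an assumption.
    """
    # Only numeric columns take part in the correlation check.
    corr = X.select_dtypes("number").corr(method=method).abs()
    # Keep the strict upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop), to_drop


# Toy data (made up for illustration): "b" is an exact multiple of "a".
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
    "colour": ["red", "blue", "red", "green", "blue"],
})
reduced, dropped = drop_highly_correlated(demo, threshold=0.9)
print(dropped)
print(list(reduced.columns))
```

Note that this keeps the first column of each correlated pair and drops the later one; which of the two you keep is really a judgment call based on what the variables mean.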
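Alternatively, you can use pandas to compute the correlation between variables with method="spearman" (rank correlation), which also picks up monotonic relationships that are not linear, and plot the result in the same way.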
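A small illustration of the Spearman option, on made-up data where y is a monotonic but non-linear function of x (the column names and values are just for the example):

```python
import pandas as pd

# Toy data (made up): "y" is x cubed, so it rises with "x" but not linearly.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [1, 8, 27, 64, 125],
    "z": [3, 1, 5, 2, 4],
})

pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")

# Spearman works on ranks, so it scores the x/y pair higher than Pearson does.
print(pearson.round(2))
print(spearman.round(2))
```

The resulting matrix can be passed to a heatmap exactly like the Pearson one.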
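There is also a measure called information value (IV), which quantifies how much predictive power each independent variable has for your dependent (target) variable. It is defined for a binary target and can be used for both categorical and binned numerical variables.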
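As far as I know pandas has no built-in function for information value, so here is a minimal sketch of how it could be computed by hand. It assumes a binary 0/1 target; the function name information_value, the quantile binning, the 0.5 smoothing constant and the toy data are all my own choices for illustration.

```python
import numpy as np
import pandas as pd


def information_value(feature, target, bins=5, eps=0.5):
    """Information value of one feature against a binary 0/1 target.

    Hypothetical helper; the binning and eps smoothing are assumptions.
    """
    # Numeric features are cut into quantile bins; categorical ones use their levels.
    if pd.api.types.is_numeric_dtype(feature):
        groups = pd.qcut(feature, q=bins, duplicates="drop")
    else:
        groups = feature
    counts = pd.crosstab(groups, target)
    # eps is added to every cell so that empty cells do not produce log(0).
    events = (counts[1] + eps) / (counts[1] + eps).sum()
    non_events = (counts[0] + eps) / (counts[0] + eps).sum()
    # Weight of evidence per group, then IV as the weighted sum.
    woe = np.log(events / non_events)
    return float(((events - non_events) * woe).sum())


# Toy data (made up): "usage" separates the target perfectly, "noise" not at all.
demo = pd.DataFrame({
    "plan": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "noise": ["x", "y", "x", "y", "x", "y", "x", "y"],
    "usage": [9, 8, 7, 3, 2, 1, 4, 6],
    "target": [1, 1, 1, 0, 0, 0, 0, 1],
})
ivs = {
    col: information_value(demo[col], demo["target"], bins=2)
    for col in ["plan", "noise", "usage"]
}
print(ivs)
```

A higher value means the variable separates the two classes better, so you can rank your variables by IV and keep the top ones.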
Hope this helps.