I’m new to analytics. I’m currently working on a dataset with more than 100 variables, and I have been checking out the different possible methods to reduce them before feeding them into a regression model. While searching, I have come across decision trees as one of the best methods for variable reduction. My predictor (independent) variables are categorical and my dependent variable is continuous. I would like to know whether decision trees work in this scenario. If so, what is the logic behind it? If not, is there any other variable reduction technique that gives the best possible prediction? And in decision trees, how do we decide the cutoff and the variables?
Thanks in advance.
For variable reduction (or feature reduction), PCA is probably the best method. Refer to this article.
As your independent variables are categorical and your dependent variable is continuous, you can use a random forest to reduce the variables: fit it and keep only the variables it ranks as important for predicting the dependent variable. You can also use association rules to find relationships between the predictor variables, i.e. if var a and var b imply var c most of the time, then you can omit var c. Keep in mind that predicting a continuous variable from only categorical variables generally gives a high error rate.
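To give a feel for the logic behind tree-based variable selection, here is a minimal sketch in plain Python. It is not a random forest; it only scores each categorical variable by how much the variance of the continuous target drops when the rows are grouped by that variable's levels, which is the kind of quantity a regression tree looks at when choosing a split. The function names and the toy data are made up for illustration, and a real tree usually makes binary splits rather than one branch per level.

```python
from collections import defaultdict
from statistics import pvariance


def variance_reduction(categories, target):
    """Drop in target variance after grouping rows by category level."""
    groups = defaultdict(list)
    for level, y in zip(categories, target):
        groups[level].append(y)
    n = len(target)
    # Weighted average of the variance left inside each group.
    within = sum(len(ys) / n * pvariance(ys) for ys in groups.values())
    return pvariance(target) - within


def rank_variables(columns, target):
    """Sort variables from most to least variance reduction."""
    scores = {name: variance_reduction(values, target)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: two categorical predictors, one continuous target.
columns = {
    "region": ["N", "N", "S", "S", "N", "S"],
    "colour": ["r", "b", "r", "b", "b", "r"],
}
target = [10.0, 12.0, 30.0, 32.0, 11.0, 31.0]

ranking = rank_variables(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

On this toy data "region" scores far higher than "colour", so it would be the one to keep. Note that this simple score favours variables with many levels, so treat it as an illustration only. In practice you would probably fit something like scikit-learn's `RandomForestRegressor` on encoded predictors and read its `feature_importances_` attribute.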