I am trying to solve a classification problem using a decision tree or a random forest.
I have a reasonable amount of data. The attributes are mostly categorical, with one continuous attribute, and most of the categorical attributes have a large number of categories. I am unable to get more than 60% accuracy, which seems low. I am stuck on how to increase the accuracy without overfitting.
If anybody has any suggestions, kindly help.
Also, I would like to know how creating dummy variables for each category of every attribute would increase the accuracy (I read this on the internet).
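
To make clear what I mean by dummy variables, here is a minimal sketch as I understand the idea, written in plain Python. The helper name `make_dummies`, the column names `color` and `size`, and the sample values are all hypothetical and only for illustration. This is not my real data or my actual pipeline.

```python
# Minimal sketch of dummy-variable (one-hot) creation in plain Python.
# The helper name, column names, and values are hypothetical illustration data.

def make_dummies(rows, column):
    """Replace one categorical column with one 0/1 column per category."""
    # Collect the distinct categories seen in this column.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one indicator column per category.
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded

rows = [
    {"color": "red", "size": 1.5},
    {"color": "blue", "size": 2.0},
    {"color": "red", "size": 0.7},
]
dummies = make_dummies(rows, "color")
for row in dummies:
    print(row)
```

An attribute with k categories becomes k columns this way, so with many categories per attribute the number of columns grows quickly. That is part of why I am unsure whether this would help in my case.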