I am trying to solve a classification problem. The dependent variable is Claim Rejected (Yes/No).
There are various independent variables, such as age at claim, claim amount, education level, number of days since the policy commenced, premium, sum assured, etc. I tried various algorithms, but a decision tree gave me the best result.
The data set has 72,000 records in total (Yes: 5,000; No: 67,000). The decision tree's overall accuracy is 95%. However, the number of false negatives (FN) is too high, which means most of the Yes records are not predicted as Yes.
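For context, with classes this imbalanced a model that always predicts No would already score high on overall accuracy, so the 95% figure says little about the Yes class. A minimal sketch of that arithmetic, using only the class counts given here:

```python
# Class counts taken from the question.
n_yes = 5000
n_no = 67000
n_total = n_yes + n_no

# Accuracy of a trivial model that always predicts "No":
# it gets every No right and every Yes wrong.
baseline_accuracy = n_no / n_total

# Share of the data that belongs to the Yes class.
yes_share = n_yes / n_total

print(f"total records: {n_total}")
print(f"share of Yes records: {yes_share:.1%}")
print(f"always-No baseline accuracy: {baseline_accuracy:.1%}")
```

The baseline comes out at roughly 93%, so the tree's 95% is only a small improvement over predicting No for everything.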
The accuracy for the Yes class alone (its recall) is 43% (67% of Yes records are predicted as No), which I think is not good. The other algorithms I tried, such as logistic regression, gave lower accuracy than the decision tree.
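To make the numbers concrete, here is a rough back-of-the-envelope reconstruction of the confusion matrix. It assumes that the 95% overall accuracy and the 43% figure for Yes both hold exactly on the full 72,000 records; I have not stated which split they were measured on, so the counts are illustrative only, not actual model output.

```python
# Figures from the question. Taking the 43% Yes accuracy at face value
# and applying both rates to the full data set is an assumption.
n_yes, n_no = 5000, 67000
overall_accuracy = 0.95
recall_yes = 0.43

n_total = n_yes + n_no
tp = round(recall_yes * n_yes)               # Yes predicted as Yes
fn = n_yes - tp                              # Yes predicted as No
correct = round(overall_accuracy * n_total)  # all correct predictions
tn = correct - tp                            # No predicted as No
fp = n_no - tn                               # No predicted as Yes

# Per-class metrics implied by those counts.
precision_yes = tp / (tp + fp)
recall_no = tn / n_no

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"precision (Yes): {precision_yes:.1%}")
print(f"recall (No):     {recall_no:.1%}")
```

Under these assumptions the errors are dominated by missed Yes records rather than by No records flagged as Yes, which is why overall accuracy stays high while the Yes class is predicted poorly.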
Can you guide me on what I should do next? Or is this an indication that I am missing some independent variable without which the accuracy for Yes cannot be improved?