I have been struggling to predict two classes in a particular data set that I have. I am using Python for this.
So far I have tried linear and non-linear models.
At the moment the best-performing models are ensembles and neural networks. However, I am unable to get a test score above 0.60 AUC, and accuracy is stuck at about 60% as well.
Scaling the data doesn’t seem to help. Limiting the number of features doesn’t seem to help either. The data set has about 80k rows, 265 predictors and 1 outcome.
I also tried PCA, but it is a linear method and the problem almost certainly is not linear.
Then I used a random forest to get feature importances. The highest importance of any single feature is around 0.04.
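To put the 0.04 figure in context, here is a small arithmetic sketch. It assumes the importances are normalized to sum to 1 across all features, as scikit-learn's impurity-based `feature_importances_` are; the library actually used is not stated above, so treat that as an assumption.

```python
# Assumption: importances sum to 1 over all features
# (true for scikit-learn's impurity-based feature_importances_).
n_features = 265
top_importance = 0.04  # approximate value reported above

# Share each feature would get if importance were spread perfectly evenly.
uniform_share = 1.0 / n_features

# How far the top feature stands out from that even split.
ratio = top_importance / uniform_share

print(f"uniform share per feature: {uniform_share:.4f}")
print(f"top feature vs uniform share: {ratio:.1f}x")
```

Under that assumption, the top feature carries roughly ten times the even share, so 0.04 is small in absolute terms but not negligible relative to 265 predictors.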
I need some tips on how to approach this, for example with a neural network (I am using Keras with the TensorFlow backend).
I’ve read a few guides on tuning models like XGBoost and GBM, but they are really misleading: the authors predict and report a score on the training set, which makes no sense.
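A minimal sketch of the distinction, on synthetic data rather than the data set described above. The choice of scikit-learn's `GradientBoostingClassifier`, the generated data, and every parameter value are assumptions for illustration only; the point is just that the training score and the held-out score are computed separately.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic two-class data with noisy labels (illustrative, not the real data).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    flip_y=0.2, random_state=0,
)

# Hold out a stratified test split that the model never sees during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

model = GradientBoostingClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# AUC needs scores or probabilities for the positive class, not hard labels.
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"train AUC: {train_auc:.3f}")
print(f"test AUC:  {test_auc:.3f}")
```

The training AUC only shows how well the model fits data it has already seen; the test AUC is the number comparable to the 0.60 mentioned above.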
So… I would be grateful if you could give me some pointers regarding Keras, XGBoost and any other ensemble algorithm. Maybe I am not making the models large enough (estimators, neurons, layers). I really don’t know what I am missing.