Huge dataset with more than 30k obs


#1

There are 292 variables and 30,471 observations. I don't understand how to select the features and how to process them. Is XGBoost a good choice?
Any help on this would be appreciated.


#2

Hi @prasadik
First, 292 variables is not a lot. Remember that XGBoost can discard some variables through regularisation: the alpha parameter sets the L1 penalty, and L1 is what removes variables. Use cross-validation to find the best alpha.
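The idea of letting an L1 penalty drop variables and choosing its strength by cross-validation can be shown without any library. The sketch below is a plain-Python lasso (a linear model with an L1 penalty, the same kind of model glmnet fits), not XGBoost itself. Soft-thresholding sets the weights of uninformative variables to exactly zero, and k-fold cross-validation picks alpha from a grid. The synthetic data, the alpha grid and all function names are made up for illustration. With XGBoost's linear booster, alpha acts on per-variable coefficients in a similar way; with the default tree booster it penalises leaf weights instead. To tune it in XGBoost itself you would loop over candidate alpha values with xgboost.cv.

```python
import random

def soft_threshold(z, t):
    """L1 shrinkage: move z toward 0 by t, returning exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_fit(X, y, alpha, max_iter=500, tol=1e-8):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1.

    No intercept is fitted, so X and y are assumed to be roughly centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = list(y)  # y - Xw, with w starting at zero
    for _ in range(max_iter):
        max_delta = 0.0
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
            max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return w

def mse(X, y, w):
    """Mean squared error of the linear predictor Xw."""
    return sum((yi - sum(a * b for a, b in zip(row, w))) ** 2
               for row, yi in zip(X, y)) / len(y)

def cv_best_alpha(X, y, alphas, k=5):
    """Return the alpha with the lowest mean k-fold validation MSE."""
    n = len(y)
    folds = [list(range(f, n, k)) for f in range(k)]  # interleaved folds
    scores = {}
    for alpha in alphas:
        total = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            w = lasso_fit([X[i] for i in train], [y[i] for i in train], alpha)
            total += mse([X[i] for i in fold], [y[i] for i in fold], w)
        scores[alpha] = total / k
    return min(scores, key=scores.get)

# Hypothetical data: 10 candidate variables, only the first two matter.
rng = random.Random(0)
n, p = 120, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]

best_alpha = cv_best_alpha(X, y, alphas=[0.01, 0.05, 0.1, 0.3, 1.0, 5.0])
w = lasso_fit(X, y, best_alpha)
kept = [j for j, wj in enumerate(w) if wj != 0.0]
print("best alpha:", best_alpha, "kept variables:", kept)
```

A larger alpha removes more variables. The alpha with the best cross-validated error may still keep a few weak variables with small weights, so it is worth inspecting the kept list rather than treating it as final.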
If you know the relationships are nonlinear, boosting is a good choice; if they are linear, you can always test a penalised linear model with glmnet.
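To make the nonlinear-versus-linear rule of thumb concrete, here is a small plain-Python experiment on hypothetical data: a straight-line least-squares fit and a boosted ensemble of depth-1 trees (stumps) are both trained on y = x² plus noise. Boosting stumps on residuals is a much-simplified stand-in for what XGBoost does, not its actual algorithm, and the function names are made up for illustration.

```python
import random

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return lambda x: (my - b * mx) + b * x

def fit_stump(xs, residuals):
    """Best single split for squared loss. xs must be sorted ascending.

    Returns (threshold, left_value, right_value).
    """
    n = len(xs)
    total = sum(residuals)
    left_sum = 0.0
    best, best_score = None, float("-inf")
    for i in range(1, n):
        left_sum += residuals[i - 1]
        if xs[i] == xs[i - 1]:
            continue
        right_sum = total - left_sum
        # Maximising this score is equivalent to minimising the split's squared error.
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if score > best_score:
            best_score = score
            best = ((xs[i - 1] + xs[i]) / 2, left_sum / i, right_sum / (n - i))
    return best

def fit_boosted_stumps(xs, ys, n_rounds=200, lr=0.1):
    """Gradient boosting with stumps under squared loss (each round fits the residuals)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lv, rv = fit_stump(xs, residuals)
        stumps.append((thr, lv, rv))
        preds = [p + lr * (lv if x <= thr else rv) for x, p in zip(xs, preds)]
    return lambda x: base + lr * sum(lv if x <= thr else rv for thr, lv, rv in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical nonlinear data (sorted so the stump search can scan it once).
rng = random.Random(1)
xs = sorted(rng.uniform(-2, 2) for _ in range(200))
ys = [x * x + rng.gauss(0, 0.1) for x in xs]

linear = fit_linear(xs, ys)
boosted = fit_boosted_stumps(xs, ys)
print(f"linear MSE:  {mse(linear, xs, ys):.3f}")
print(f"boosted MSE: {mse(boosted, xs, ys):.3f}")
```

On this curved target the boosted stumps should reach a much lower error than the straight line. On data that really is linear, the linear fit is the simpler and equally good option, which is where a penalised linear model such as glmnet fits in.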
Hope this helps.
Best regards
Alain