How to decide which algorithm to use for a given dataset?

machine_learning

#1

Suppose I have a classification problem. How do I know which algorithm to use, e.g. CatBoost, LightGBM, XGBoost, random forest, SVM, etc.? Say my dataset is 50% categorical and 50% continuous features. I am not concerned about training speed. Is there a rule of thumb to follow?


#2

Hi @Shrikantai,

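Which algorithm works best depends entirely on your dataset. For instance, if it has a large number of categorical variables, you might want to go for CatBoost, which handles categorical features natively. When the dataset is very large, LightGBM is expected to perform well, since it is designed for efficient training on large data. For an imbalanced dataset, you can choose XGBoost, which lets you reweight the positive class through its scale_pos_weight parameter.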

You can read about the algorithms and find out which algorithm fits well with the distribution of your data.
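
In practice, one way to check which algorithm suits your data is to cross-validate a few candidates and compare their scores. Below is a minimal sketch, assuming scikit-learn is installed. The synthetic data from make_classification is only a placeholder for your own dataset, and GradientBoostingClassifier is used as a stand-in for CatBoost, LightGBM, or XGBoost, which could be swapped in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data (assumption): replace X, y with your own features and labels.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Candidate models to compare. SVM is sensitive to feature scale,
# so it is wrapped in a pipeline with a scaler.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

# Print candidates from best to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

For an imbalanced dataset, accuracy can be misleading, so you may prefer a different scoring value such as "f1" or "roc_auc".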


#3

Thank you very much @AishwaryaSingh