How to decide when to use Logistic Regression & when Decision Trees?

# Logistic Regression vs Decision Trees

**shuvayan**#2

Hello @vajravi,

It usually varies from case to case, but there are some guidelines you can follow:

1. If you want to find out how much a change in an explanatory variable affects your dependent variable, you should use logistic regression. Its coefficients can be read as changes in the log-odds of the outcome.

2. Decision trees do not handle imbalanced classes (in classification problems) well, so you might consider logistic regression in that case.

3. Decision trees do not perform well when there is a lot of noise in your data, and they are susceptible to overfitting.
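To make the first guideline concrete, here is a minimal sketch using scikit-learn. The data is synthetic and the true coefficient (1.2) is an assumption chosen for illustration. The large `C` value is also my choice; it just weakens the default regularization so the estimate is close to the unpenalized fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 1))

# Synthetic (hypothetical) data: true log-odds = -0.5 + 1.2 * x
true_logit = -0.5 + 1.2 * x[:, 0]
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

# A large C makes the L2 penalty negligible
model = LogisticRegression(C=1e6, max_iter=1000).fit(x, y)

coef = model.coef_[0, 0]
odds_ratio = np.exp(coef)  # multiplicative change in the odds per unit increase in x
print(f"coefficient: {coef:.2f}, odds ratio: {odds_ratio:.2f}")
```

The fitted coefficient should land near the value used to generate the data, and exponentiating it gives the factor by which the odds change when `x` goes up by one unit. A single decision tree does not give you a number like this.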

Having said that, decision trees are a powerful tool that works well on small, simple datasets. You can also improve their performance considerably by using ensemble techniques such as bagging, random forests, and boosting.
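As a rough illustration of the ensemble point, the sketch below compares a single tree with a random forest on a noisy synthetic dataset. The dataset parameters (`flip_y=0.1` for label noise, 20 features) and the number of trees are assumptions picked for the demo, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% of the labels flipped at random to simulate noise
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)

# 5-fold cross-validated accuracy for a single unpruned tree and for a forest
tree_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_acc = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5
).mean()

print(f"single tree:   {tree_acc:.3f}")
print(f"random forest: {forest_acc:.3f}")
```

On noisy data like this the forest usually comes out ahead, because averaging many trees reduces the variance of a single overfitted tree.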

They are also helpful for understanding the underlying structure in your data. For example, in the diagram below:

we can see that if a passenger is male and older than 9.5 years, there is a 61% chance that he died.

So the gender and the age of the person are taken together to make this decision. Such interactions are useful, and decision trees expose them directly.
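If you want to read such rules off a tree yourself, scikit-learn's `export_text` prints them as text. The sketch below uses made-up data (not the actual passenger data behind the diagram): the columns `is_male` and `age` and the death probabilities are hypothetical, built so that the outcome depends on both variables together.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: the risk is high only for males older than 10
is_male = rng.integers(0, 2, size=n)
age = rng.uniform(0, 80, size=n)
p_died = np.where((is_male == 1) & (age > 10), 0.8, 0.3)
died = (rng.random(n) < p_died).astype(int)

X = np.column_stack([is_male, age])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, died)

# Print the learned splits as nested if/else rules
rules = export_text(tree, feature_names=["is_male", "age"])
print(rules)
```

Each path from the root to a leaf is one rule, so a branch that splits on `is_male` and then on `age` is exactly the kind of interaction described above.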

One more point I would like to make: decision trees work well when the decision boundary is not linear, whereas logistic regression works well when the decision boundary is linear.
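A quick way to check this on your own machine is to compare both models on one dataset with a linear boundary and one with a circular boundary. Both datasets below are synthetic, and the helper `cv_accuracy` is just a name I made up for this sketch.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Case 1: linear boundary (class is 1 when x0 + x1 > 0)
X_lin = rng.normal(size=(1000, 2))
y_lin = (X_lin[:, 0] + X_lin[:, 1] > 0).astype(int)

# Case 2: non-linear boundary (one class forms a ring around the other)
X_circ, y_circ = make_circles(n_samples=1000, noise=0.1, factor=0.5, random_state=0)


def cv_accuracy(model, X, y):
    """Mean 5-fold cross-validated accuracy."""
    return cross_val_score(model, X, y, cv=5).mean()


results = {}
for name, (X, y) in {"linear": (X_lin, y_lin), "circles": (X_circ, y_circ)}.items():
    results[name] = {
        "logistic": cv_accuracy(LogisticRegression(), X, y),
        "tree": cv_accuracy(DecisionTreeClassifier(random_state=0), X, y),
    }
    print(name, results[name])
```

On the circular data, logistic regression on the raw features can do little better than guessing, while the tree separates the classes well. On the linear data, logistic regression recovers the boundary almost exactly, and the tree has to approximate the diagonal line with many axis-aligned splits.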

So, in short: if the dataset is small and you have reason to believe that the separation of the classes is non-linear, use decision trees. But beware of overfitting, and use cross-validation and pruning to get the optimal tree.
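One possible way to combine cross-validation and pruning in scikit-learn is cost-complexity pruning: compute the candidate `ccp_alpha` values from the full tree and let `GridSearchCV` pick one. This is only a sketch on synthetic noisy data; the dataset settings are assumptions, and other pruning strategies (for example limiting `max_depth`) would work too.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small, noisy synthetic dataset (15% of labels flipped)
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, flip_y=0.15, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the full tree, then get the alphas at which subtrees would be pruned away
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
# Clip guards against tiny negative values caused by floating point error
alphas = np.unique(np.clip(path.ccp_alphas, 0, None))

# Choose the pruning strength by 5-fold cross-validation on the training set
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), {"ccp_alpha": alphas}, cv=5
).fit(X_train, y_train)
pruned_tree = search.best_estimator_

print("leaves (full / pruned):", full_tree.get_n_leaves(), "/", pruned_tree.get_n_leaves())
print("test accuracy (full):  ", full_tree.score(X_test, y_test))
print("test accuracy (pruned):", pruned_tree.score(X_test, y_test))
```

The pruned tree is never larger than the full one, and on noisy data it is often much smaller and easier to read.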

Hope this helps!!


**hinduja1234**#3

@vajravi - Adding to @shuvayan.

I would suggest starting with logistic regression, because it is the simplest model and will help you understand the data better. But the accuracy of the model depends on the data you are applying it to.

Hope this helps!

Regards,

Hinduja