Why are decision trees not sensitive to skewed distributions?




I don’t understand why decision trees are said to be insensitive to skewed distributions.
I read that this is because they are a non-parametric method. What does non-parametric mean, and how does it explain this property of decision trees?
Can someone explain this with an example?




Parametric methods rest on assumptions about the distribution of the data. They estimate parameters (usually the mean and standard deviation) from the sample and use those estimates in the modelling framework.
Point to ponder: in a right-skewed distribution the long tail pulls the mean above the median, so the mean no longer describes a typical value the way it does for a normal distribution. That affects how a model built on such estimates performs.
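
To make the point about the mean concrete, here is a small sketch using only the Python standard library. The sample sizes, the seed and the two distributions (a normal and a log-normal) are assumptions chosen for illustration, not something taken from a real dataset.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Symmetric sample: mean and median should sit close together.
symmetric = [random.gauss(10.0, 2.0) for _ in range(5000)]
# Right-skewed sample: a long right tail drags the mean upward.
skewed = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]

for name, sample in (("symmetric", symmetric), ("right-skewed", skewed)):
    print(name,
          "mean:", round(statistics.mean(sample), 2),
          "median:", round(statistics.median(sample), 2))

# Distance between mean and median in each sample.
gap_symmetric = abs(statistics.mean(symmetric) - statistics.median(symmetric))
gap_skewed = statistics.mean(skewed) - statistics.median(skewed)
```

In the skewed sample the mean lands well above the median, so any method that summarises the data through its mean and standard deviation is working with a distorted picture.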

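Non-parametric methods use no such feature of the distribution for modelling. A decision tree (say, CART) looks at each variable and candidate split and picks the one that brings the maximum difference between the two branches, measured by a criterion such as Gini. A split on a numeric variable depends only on the ordering of its values, not on how far apart they are, so the shape of the distribution does not really matter.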
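
Here is a minimal sketch of that idea, assuming a single numeric feature and a binary label. The helpers `gini` and `best_split` are hypothetical names written for this example and are not the API of any library. The sketch searches for the best Gini split on a right-skewed feature and again on its log transform, then compares the two results.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(xs, ys):
    """Try every threshold on one feature; return (weighted Gini, rows sent left)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_score, best_left = float("inf"), frozenset()
    for k in range(1, len(order)):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # no threshold fits between two equal values
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if score < best_score:
            best_score, best_left = score, frozenset(order[:k])
    return best_score, best_left


random.seed(0)
# Right-skewed feature (log-normal) with a noisy binary label.
xs = [random.lognormvariate(0.0, 1.5) for _ in range(200)]
ys = [int(x > 2.0) if random.random() > 0.1 else int(x <= 2.0) for x in xs]

# The same feature after a log transform, which makes it symmetric.
log_xs = [math.log(x) for x in xs]

score_raw, left_raw = best_split(xs, ys)
score_log, left_log = best_split(log_xs, ys)

# Compare the impurity reached and the set of rows sent to the left branch.
print(score_raw == score_log, left_raw == left_log)
```

The log transform changes the spacing of the values but not their order, so the split search sends the same rows to each branch in both cases. A model that relies on the mean or variance of the feature would not be unaffected in the same way.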

Hope the above explanation helps.



To add to what @nayan1247 has already said, the basic principle of a decision tree is to split each parent node into child nodes that are as distinct from each other as possible. In doing so, it makes no assumption about the distribution of the original or the resulting populations. Hence the nature of the distribution does not matter when building a decision tree.

In fact, the ideal population for a binary decision tree is a bimodal distribution, so that it can easily be split into two very different populations.
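
A rough sketch of why a two-peaked population suits a binary split, assuming each peak belongs to one class. The helper `best_gini`, the seed and the chosen means and spreads are all made up for illustration.

```python
import random


def best_gini(xs, ys):
    """Lowest weighted Gini impurity over all thresholds on one feature."""
    pairs = sorted(zip(xs, ys))
    n, total_pos = len(pairs), sum(ys)
    best, left_pos = float("inf"), 0
    for k in range(1, n):
        left_pos += pairs[k - 1][1]  # positives among the first k sorted rows
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold fits between two equal values
        p_left = left_pos / k
        p_right = (total_pos - left_pos) / (n - k)
        score = (k * 2 * p_left * (1 - p_left)
                 + (n - k) * 2 * p_right * (1 - p_right)) / n
        best = min(best, score)
    return best


random.seed(0)

# Two well-separated peaks, one per class.
bimodal_x = ([random.gauss(0.0, 1.0) for _ in range(100)]
             + [random.gauss(10.0, 1.0) for _ in range(100)])
bimodal_y = [0] * 100 + [1] * 100

# A single peak with labels unrelated to the feature.
unimodal_x = [random.gauss(5.0, 3.0) for _ in range(200)]
unimodal_y = [random.randint(0, 1) for _ in range(200)]

bimodal_score = best_gini(bimodal_x, bimodal_y)
unimodal_score = best_gini(unimodal_x, unimodal_y)
print("bimodal:", bimodal_score, "unimodal:", round(unimodal_score, 3))
```

With the two peaks far apart, a single threshold between them produces two pure branches, while no threshold does much for the single-peaked feature whose labels are random.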

Hope this makes it a bit clearer.