Split Criteria in Random Forest


Hi,

Could anyone let me know the split criteria used for splitting the nodes in a Random Forest?
I have heard about information gain and the Gini coefficient.
It would be great if someone could explain them.


Random forest: a random forest is a combination of tree predictors in which each tree depends on the values of a random vector, sampled independently and with the same distribution for all trees in the forest.
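One common way this "random vector" is realised is a bootstrap sample of the training rows drawn separately for each tree. The full algorithm also picks a random subset of features at each node, which is left out here. Below is a minimal sketch, assuming plain Python and a hypothetical helper name `draw_tree_samples`. Each tree gets its own generator seeded from a master generator, so the samples are independent and identically distributed across trees.

```python
import random

def draw_tree_samples(n_samples, n_trees, seed=0):
    """Hypothetical helper: draw one bootstrap sample (row indices) per tree."""
    master = random.Random(seed)
    samples = []
    for _ in range(n_trees):
        # Separate generator per tree, seeded from the master generator.
        rng = random.Random(master.getrandbits(32))
        # Sampling with replacement: some rows repeat, others are left out.
        samples.append([rng.randrange(n_samples) for _ in range(n_samples)])
    return samples

samples = draw_tree_samples(n_samples=6, n_trees=3, seed=42)
for tree_id, rows in enumerate(samples):
    print(tree_id, rows)
```

Each tree would then be grown only on its own rows, which is one reason the trees in the forest differ from each other.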

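Within each tree of the random forest, a split criterion decides how the data at a node is divided into child nodes. Information gain (based on entropy) is one such criterion, and Gini impurity is another common one. For example, scikit-learn's RandomForestClassifier uses Gini impurity by default (criterion="gini") and also accepts criterion="entropy".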
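To make the two criteria concrete, here is a small sketch in plain Python. The function names are my own, not from any library. Information gain is computed as the impurity of the parent node minus the size-weighted impurity of the child nodes, and the same function works with either entropy or Gini impurity plugged in.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children, impurity=entropy):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

parent = ["A", "A", "B", "B"]
print(information_gain(parent, [["A", "A"], ["B", "B"]]))  # → 1.0 (perfect split)
print(information_gain(parent, [["A", "B"], ["A", "B"]]))  # → 0.0 (useless split)
print(information_gain(parent, [["A", "A"], ["B", "B"]], impurity=gini))  # → 0.5
```

A split that produces pure child nodes gets the maximum gain, while a split that leaves the class mix unchanged gets zero.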

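As the picture shows, we have data in which each class is equally likely, and we can split it in two different ways: horizontally or vertically. The vertical split separates the classes better, so it gives us more information about the split data, i.e. a higher information gain. In a similar manner, we can split any data in many different ways, compute the information gain of each candidate split, and choose the best one. The mathematical formula for information gain is:

IG = H(S) - Σ_k (|S_k| / |S|) · H(S_k),   where H(S) = -Σ_c p_c · log2(p_c)

Here S is the data at the node being split, S_k are the child nodes produced by the split, and p_c is the fraction of samples in S that belong to class c.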
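The "try many splits and keep the best one" idea can be sketched as an exhaustive threshold search over each feature, as in CART-style decision trees. The points below are made-up data meant to mimic the picture (which isn't reproduced here), and `best_split` is a hypothetical name. Thresholds on feature 0 (x) give vertical splits and thresholds on feature 1 (y) give horizontal ones. In a real random forest, only a random subset of features would be tried at each node.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(points, labels):
    """Try every threshold on every feature; return (gain, feature, threshold)."""
    parent_h = entropy(labels)
    n = len(labels)
    best = (0.0, None, None)
    for feature in range(len(points[0])):
        # Skip the largest distinct value so both children are non-empty.
        values = sorted({p[feature] for p in points})
        for threshold in values[:-1]:
            left = [lab for p, lab in zip(points, labels) if p[feature] <= threshold]
            right = [lab for p, lab in zip(points, labels) if p[feature] > threshold]
            weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
            gain = parent_h - weighted
            if gain > best[0]:
                best = (gain, feature, threshold)
    return best

# Hypothetical data: two equally likely classes, separable by x.
points = [(1, 1), (1, 3), (2, 2), (4, 1), (4, 3), (5, 2)]
labels = ["A", "A", "A", "B", "B", "B"]
print(best_split(points, labels))  # → (1.0, 0, 2): vertical split at x <= 2
```

On this data every horizontal threshold leaves both children evenly mixed (gain 0), while the vertical split at x <= 2 produces two pure nodes, so it is chosen.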

Hope this helps!