Suppose we do this by calculating the entropy as -P(+)*log(P(+)) - P(-)*log(P(-)) and comparing the uncertainty. What happens if we encounter a pure set like 4 Yes/0 No? In that case the second term in the expression would be 0*(-Infinity). Do we just assume it to be 0 in that case?

# Calculating Information gain in Decision Trees while choosing which attribute to split on

**NSS**#2

Hi there,

There are two things to focus on here: **Entropy** and **Information Gain**.

**Entropy** is the measure of the *impurity* in a dataset. And yes, by convention the term 0*log(0) is taken to be 0 (it is the limit of p*log(p) as p approaches 0), so for a pure dataset with 4 Yes and 0 No, the entropy (impurity) is 0.

What we actually consider while making splits in models like decision trees is the **Information Gain**, i.e. the change in entropy after a split. The larger the drop in entropy (the closer the sub-nodes get to purity), the better the split. When the entropy reaches zero, as in the example you mentioned, there are no further *useful* splits to make.
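To make the 0*log(0) convention concrete, here is a minimal Python sketch (the function name `entropy` and the counts-based interface are my own choices for illustration, not from any particular library):

```python
from math import log2

def entropy(counts):
    """Entropy of a node, given per-class counts.

    Zero counts are skipped, which implements the convention
    0 * log2(0) = 0 (the limit of p*log2(p) as p -> 0).
    A pure node therefore gets entropy exactly 0.
    """
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # skip the 0 * (-Infinity) term entirely
            p = c / total
            result -= p * log2(p)
    return result

print(entropy([4, 0]))  # pure set: 0.0
print(entropy([4, 7]))  # mixed set: ~0.946
```

Skipping the zero-count terms is exactly the "assume it is 0" answer to the original question, just expressed in code.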

Hope I made things clear.

Regards

Neeraj

**crisis1.08**#3

Suppose I have to choose between two attributes, where one attribute yields a subset of 4 Yes/0 No and the other yields a subset of 0 Yes/7 No. Both are pure, so how do we decide which attribute to use for splitting?

**NSS**#4

What you are trying to do is compare single sub-nodes from two independent splits, which is not meaningful. In a binary problem, the Yes and No counts of the sub-nodes always complement each other across a split. It is easier to understand with the help of an example.

Suppose initially you had 11 targets with 4 Yes and 7 No. You have two features to split on, namely *feature 1* and *feature 2*. Let's work out the cases.

Case 1: Split on feature 1: 3 Yes and 4 No in the *left node* and 1 Yes and 3 No in the *right node*.

Inference: feature 1 was not able to split our dataset homogeneously.

Case 2: Split on feature 2: 4 Yes and 0 No in the *left node* and 0 Yes and 7 No in the *right node*.

Inference: feature 2 separated the two classes cleanly.

The thing to focus on is that you do not look at the entropy of a single sub-node, but at the *weighted sum* of the entropies of all the sub-nodes produced by a split. And comparing sub-nodes across two different splits in isolation is pointless anyhow.
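The weighted-sum idea can be sketched in Python as follows (the function names `entropy` and `information_gain` are mine for illustration; each child's weight is its share of the parent's samples):

```python
from math import log2

def entropy(counts):
    """Entropy of a node from per-class counts; 0*log2(0) is taken as 0."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Parent entropy minus the weighted average entropy of the children."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child)
                   for child in child_counts_list)
    return entropy(parent_counts) - weighted

# The two cases above, starting from a 4 Yes / 7 No parent:
print(information_gain([4, 7], [[3, 4], [1, 3]]))  # feature 1: small gain
print(information_gain([4, 7], [[4, 0], [0, 7]]))  # feature 2: perfect split
```

For the perfect split, both children are pure, so the gain equals the full parent entropy; feature 1's gain is only a small fraction of that.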

Hope this made things clear.

Regards

Neeraj

**crisis1.08**#5

Change your example to:

Case 1: Split on feature 1: 3 Yes and 0 No in the left node and 1 Yes and 7 No in the right node.

Case 2: Split on feature 2: 4 Yes and 1 No in the left node and 0 Yes and 6 No in the right node.

and it captures exactly what my question is.

Both feature 1 and feature 2 produce a pure sub-node. So on what basis do we decide which feature to choose for splitting?

**NSS**#6

OK, then the question was not clear at first. Now that it is, let's answer it.

Intuitively, the homogeneity created by feature 2 is greater than that created by feature 1, so feature 2 should provide the better split. But let's verify it mathematically.

**Entropy before splitting**

Yes: 4 No: 7

`Entropy_before = -[4/11*log2(4/11) + 7/11*log2(7/11)] = 0.946`

**Entropy after splitting**

*On feature 1*

*Left Node*

**Yes: 3 No: 0**

`Entropy_Left = -[3/3*log2(3/3) + 0/3*log2(0/3)] = 0`

**Right Node**

**Yes: 1 No: 7**

```
Entropy_Right = -[1/8*log2(1/8) + 7/8*log2(7/8)] = 0.544
Total_Entropy = Left/(Left+Right)*Entropy_Left + Right/(Left+Right)*Entropy_Right
              = 3/11*0 + 8/11*0.544 = 0.395
```

*On feature 2*

**Left Node**

**Yes: 4 No: 1**

`Entropy_Left = -[4/5*log2(4/5) + 1/5*log2(1/5)] = 0.722`

**Right Node**

**Yes: 0 No: 6**

`Entropy_Right = -[0/6*log2(0/6) + 6/6*log2(6/6)] = 0`

`Total_Entropy = 5/11*0.722 + 6/11*0 = 0.328`

```
Information_gain_feature1 = 0.946 - 0.395 = 0.551
Information_gain_feature2 = 0.946 - 0.328 = 0.618
```
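As a sanity check, the whole calculation above can be reproduced in a few lines of Python (helper names are mine; entropies use base-2 logs, matching the numbers above):

```python
from math import log2

def entropy(counts):
    """Entropy from per-class counts, with 0*log2(0) taken as 0."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def split_entropy(children):
    """Weighted sum of sub-node entropies, weighted by sub-node size."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * entropy(child) for child in children)

before = entropy([4, 7])                    # ~0.946
after_f1 = split_entropy([[3, 0], [1, 7]])  # ~0.395
after_f2 = split_entropy([[4, 1], [0, 6]])  # ~0.328

print(before - after_f1)  # gain for feature 1: ~0.551
print(before - after_f2)  # gain for feature 2: ~0.618
```

The pure sub-nodes (3 Yes/0 No and 0 Yes/6 No) contribute zero entropy, and feature 2 still comes out ahead because its impure sub-node is less mixed.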

So it is evident that **feature 2** provides a better split than **feature 1**.

I hope I made things clear.

Regards,

Neeraj

**crisis1.08**#7

Thank you very much. That's exactly what I was trying to ask in the first place when I asked about taking 0/6*log(0/6) to be equal to zero.