Hi,

I will split my answer into 3 parts.

- How to choose which attribute to split at a node.
- How C4.5 split categorical attribute.
- How C4.5 split continuous attribute.

Now bear with me

- How to choose which attribute to split at a node.

In order to choose an attribute to split on at a node, we need a quantitative measure that tells us which attribute would lead to the best final result (leaf nodes that are as pure as possible) in a heuristic sense.

The book mentions two such measures: Information Gain and Gain Ratio.

Basically, we pick one measure to use while building the tree, and at each node we split on whichever attribute gives the highest gain / gain ratio.
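To make this concrete, here is a minimal sketch (my own Python, not code from the book) of how information gain and gain ratio can be computed for a categorical attribute. `values` and `labels` are hypothetical parallel lists holding the attribute value and the class label of each training example.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # Entropy of the whole set minus the weighted entropy of each branch,
    # with one branch per distinct attribute value
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

def gain_ratio(values, labels):
    # Information gain normalised by the "split info"
    # (the entropy of the attribute values themselves),
    # which penalises attributes with many distinct values
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info else 0.0
```

Whichever of the two functions you standardise on, the attribute with the largest score wins at that node.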

- How C4.5 split categorical attribute

Suppose we have a training set with an attribute "weather" that can take 3 possible values:

Weather: rainy, sunny, cloudy.

At a decision node, the algorithm will consider a split on each possible value. That means the tree will consider splitting the data into 3 branches: weather=rainy / weather=sunny / weather=cloudy.

(There is actually a chapter in the book saying that C4.5 can also consider grouping values together, so the data could be split into 2 branches: weather=(rainy or sunny) / weather=cloudy. But that is not the main point of this question.)
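As a small illustration (again my own sketch, not code from the book), the default one-branch-per-value split is just a partition of the rows by attribute value:

```python
from collections import defaultdict

def split_categorical(rows, attr):
    # C4.5's default multiway split on a categorical attribute:
    # one branch per distinct value, each holding its matching rows
    branches = defaultdict(list)
    for row in rows:
        branches[row[attr]].append(row)
    return dict(branches)
```

Each branch would then be scored with the chosen measure and the tree grown recursively on it.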

- How C4.5 split continuous attribute.

For a continuous attribute, the algorithm will always try to split into exactly 2 branches.

Suppose we have a training set with an attribute "age" that contains the following values:

Age : 10, 11, 16, 18, 20, 35

Now at a node, the algorithm will consider the following possible splits:

Age <=10 & Age>10

Age <=11 & Age>11

Age <=16 & Age>16

Age <=18 & Age>18

Age <=20 & Age>20

You can see that with **N** distinct values, we have to consider **N-1** possible splits.

And note that we do not choose the mid-point between values as the splitting threshold. (We won't consider Age <= 10.5, because 10.5 never appears in the training set.)
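That candidate enumeration can be sketched in one line (my own illustration): every observed value except the largest is a candidate threshold, which yields exactly the N-1 binary splits listed above.

```python
def candidate_thresholds(values):
    # Candidate thresholds are the observed values themselves, not midpoints;
    # the largest value is excluded because "Age <= max" would put
    # every example into one branch and leave the other empty
    return sorted(set(values))[:-1]
```

Each threshold t then defines the two branches Age <= t and Age > t, and the one with the best gain / gain ratio is kept.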