Decision Tree with Continuous Variables



I am new to the business analytics domain and have started exploring modeling methods. I recently studied decision trees and am not clear on how they handle continuous variables. I know a decision tree handles categorical variables well: gender, for example, has two values, male and female, so the tree splits the node into two sub-nodes based on gender. But how does it handle a continuous variable such as income? It is not possible to create a sub-node for each value, so how does it decide to create buckets like <20000, >=20000 & <50000, and >=50000?


How does a decision tree classifier split on a continuous variable?


For more background on how tree algorithms handle continuous variables, visit the following link -

It covers regression trees and how they work behind the scenes. Hope this helps.




Decision trees work with continuous variables as well. When the target is continuous, they work by the principle of variance reduction.

Let us take an example where age is the target variable. Say you compute the variance of age at a node and it comes out to be x. The decision tree then looks at the candidate splits and, for each one, calculates the total weighted variance of the resulting child nodes (each child's variance weighted by its share of the records). It chooses the split with the minimum weighted variance, that is, the largest reduction from x.
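To make the calculation concrete, here is a minimal sketch of that search in plain Python. The data (income as the predictor, age as the target) and the function name `best_variance_split` are made up for illustration, and real libraries differ in details such as where exactly they place the threshold.

```python
from statistics import pvariance

def best_variance_split(x, y):
    """Return (threshold, weighted_variance) for the binary split x <= t
    that minimises the size-weighted variance of y in the two children."""
    n = len(y)
    best = None
    # Every distinct value except the largest gives a non-empty right child.
    for t in sorted(set(x))[:-1]:
        left = [yy for xx, yy in zip(x, y) if xx <= t]
        right = [yy for xx, yy in zip(x, y) if xx > t]
        weighted = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
        if best is None or weighted < best[1]:
            best = (t, weighted)
    return best

# Hypothetical data, not taken from the thread.
income = [15000, 18000, 22000, 48000, 52000, 60000]
age = [22, 25, 24, 41, 45, 43]

parent_variance = pvariance(age)  # the "x" in the explanation above
threshold, child_variance = best_variance_split(income, age)
print(f"parent variance: {parent_variance:.2f}")
print(f"best split: income <= {threshold}, weighted variance: {child_variance:.2f}")
```

On this toy data the split at income <= 22000 separates the younger group from the older one, so the weighted variance of the children is far below the parent variance.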

Hope this clarifies the algorithm and the doubt.



Information Value (IV) and Weight of Evidence (WOE) statistics are very effective for reducing the number of variables or, to put it better, for identifying potential variables. Please refer to the link.

For the income variable (or any continuous variable), apply fine classing (make about 40 groups) followed by coarse classing (condense to 5 to 9 groups). This methodology is widely used in scorecard building in the retail banking industry, and the link referred to above should give you a sufficient understanding of the concept. I applied it to metrics before building decision tree models and obtained insights that made sense to the business and clients.
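As a rough illustration of how WOE and IV are computed once a continuous variable has been grouped into bands, here is a small sketch. The band labels and good/bad counts are invented, the function name `woe_iv` is hypothetical, and the sign convention WOE = ln(%good / %bad) is an assumption (some texts use the reverse ratio). It also assumes no band has a zero count.

```python
import math

def woe_iv(bins):
    """bins: list of (label, n_good, n_bad) tuples.
    Returns ([(label, woe), ...], information_value)."""
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    rows, iv = [], 0.0
    for label, good, bad in bins:
        pct_good = good / total_good  # share of all goods in this band
        pct_bad = bad / total_bad     # share of all bads in this band
        woe = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe
        rows.append((label, woe))
    return rows, iv

# Hypothetical income bands with made-up counts of good and bad accounts.
income_bins = [
    ("<20000", 100, 50),
    ("20000-49999", 200, 30),
    (">=50000", 200, 20),
]

rows, iv = woe_iv(income_bins)
for label, woe in rows:
    print(f"{label:>12}: WOE = {woe:+.3f}")
print(f"IV = {iv:.3f}")
```

Bands with similar WOE are natural candidates for merging, and a variable with a very low IV is a candidate for dropping.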


Hi, I am facing a problem with decision trees. The independent variable is categorical with many categories, like region or country. How should I handle such variables in the model? Please clarify. Thank you.



I will split my answer into 3 parts.

  1. How to choose which attribute to split at a node.
  2. How C4.5 splits a categorical attribute.
  3. How C4.5 splits a continuous attribute.

Now bear with me :)

  1. How to choose which attribute to split at a node.

In order to choose an attribute to split at a node, we need a quantitative measure that tells us, in a heuristic sense, which attribute would lead to the best final result (leaf nodes that are as pure as possible).
The book mentions 2 measures: Information Gain and Gain Ratio.
Basically, we choose 1 measure to build the tree with, and whichever attribute gives the highest gain (or gain ratio) at a node is chosen for the split at that node.
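A small sketch of both measures, using the standard entropy-based definitions. The tiny dataset (a "weather" and a "windy" attribute with a yes/no class) and the helper names are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_and_ratio(values, labels):
    """Information gain and gain ratio of splitting `labels` by `values`,
    with one branch per distinct attribute value."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the branch sizes
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

# Hypothetical training set.
weather = ["rainy", "rainy", "sunny", "sunny", "cloudy", "cloudy"]
windy = [True, False, True, False, True, False]
play = ["no", "no", "yes", "no", "yes", "yes"]

gain_weather, ratio_weather = gain_and_ratio(weather, play)
gain_windy, ratio_windy = gain_and_ratio(windy, play)
print(f"weather: gain={gain_weather:.3f}, gain ratio={ratio_weather:.3f}")
print(f"windy:   gain={gain_windy:.3f}, gain ratio={ratio_windy:.3f}")
```

Here "weather" scores higher on both measures, so it would be the attribute chosen at this node.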

  2. How C4.5 splits a categorical attribute

Suppose we have a training set with an attribute “weather” which contains 3 possible values

Weather: rainy, sunny, cloudy.

At a decision node, the algorithm will consider a split on each possible value. This means the tree will consider splitting the data into 3 branches: weather=rainy / weather=sunny / weather=cloudy.

(Actually, there is a chapter in the book saying that it will also consider grouping values together, which means the attribute can be split into 2 branches: weather=(rainy or sunny) / weather=cloudy. But this is not the main point of this question.)

  3. How C4.5 splits a continuous attribute.

For a continuous attribute, the algorithm will always try to split it into 2 branches only.

Suppose we have a training set with an attribute “age” which contains the following values.

Age : 10, 11, 16, 18, 20, 35

Now, at a node, the algorithm will consider the following possible splits:

Age <=10 & Age>10
Age <=11 & Age>11
Age <=16 & Age>16
Age <=18 & Age>18
Age <=20 & Age>20

You can see that if there are N distinct values, we have to consider N-1 possible splits.
Note that we do not choose the midpoint between values as the splitting threshold. (We won’t consider Age <=10.5, as 10.5 never appears in the training set.)
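The enumeration above can be sketched in a few lines of Python. The class labels attached to the ages are made up, the helper names are hypothetical, and the candidates are scored here with plain information gain for brevity, which is a simplification rather than a faithful reimplementation of C4.5.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def candidate_thresholds(values):
    """Every distinct observed value except the largest: N values, N-1 splits."""
    return sorted(set(values))[:-1]

def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary split value <= t / value > t."""
    n = len(labels)
    base = entropy(labels)
    best_t, best_gain = None, -1.0
    for t in candidate_thresholds(values):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

ages = [10, 11, 16, 18, 20, 35]
classes = ["no", "no", "no", "yes", "yes", "yes"]  # hypothetical labels

thresholds = candidate_thresholds(ages)
best_t, best_gain = best_threshold(ages, classes)
print("candidates:", thresholds)
print(f"best split: Age <= {best_t} (gain = {best_gain:.3f})")
```

With these labels the split Age <=16 / Age>16 separates the two classes perfectly, so it gets the highest gain of the five candidates.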


For attributes with continuous values, ID3 and C4.5 are not effective. Fuzzy ID3 is an effective algorithm to employ. I have used it in my project to classify and predict the operating point of the IEEE 30-bus system. In a power system, attributes such as voltage, current, active power, reactive power, power angle, … are purely continuous, and this is where C4.5 would fail. Fuzzy ID3, by contrast, was effective. Below are the references:


Can you please explain this with a more detailed example, with calculations showing how the variance is used to decide the split? I tried finding examples online but couldn’t find any.


Hi @rupam_anup,

The third section of the following article covers the methods to decide the right split.