What’s the process used by R to split the data into different buckets after an appropriate complexity parameter is selected in classification trees?
The complexity parameter (cp) controls the size of the decision tree and is used to select the optimal tree size. A split is only attempted if it decreases the overall lack of fit by at least a factor of cp. If the improvement from splitting the current node falls short of that threshold, tree building does not continue along that branch.
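To make the effect of cp concrete, here is a minimal sketch. It assumes the kyphosis data set that ships with rpart as a stand-in for your own data, and the helper n_splits is a hypothetical name introduced only for this illustration. It fits the same model with a permissive and a strict cp and compares how many splits each tree ends up with.

```r
library(rpart)

# kyphosis is bundled with rpart; it is only a stand-in data set (assumption).
# cp is passed through to rpart.control().
fit_loose  <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                    method = "class", cp = 0)
fit_strict <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                    method = "class", cp = 0.1)

# Hypothetical helper: number of splits in the final tree,
# read from the last row of the fitted object's cp table.
n_splits <- function(fit) unname(fit$cptable[nrow(fit$cptable), "nsplit"])

n_splits(fit_loose)   # permissive threshold, more splits are allowed
n_splits(fit_strict)  # strict threshold, weak splits are not attempted
```

The stricter the cp, the fewer splits survive, so the second tree should never be larger than the first.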
In case you are using R:
library(rpart)
tree <- rpart(default ~ ., data = bankloan, method = "class")
plot(tree)
text(tree, pretty = 2)
In case we need to see the optimal value of cp:
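One common way to inspect cp is through rpart's cp table. The sketch below is an assumption-laden example, not the only approach: it uses the bundled kyphosis data in place of bankloan, and it picks the cp with the lowest cross-validated error (xerror), which is one frequently used heuristic among several.

```r
library(rpart)

set.seed(1)  # the cross-validated errors in the cp table depend on random folds

# kyphosis stands in for the bankloan data (assumption).
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

printcp(fit)  # cp, number of splits, relative error, cross-validated error
plotcp(fit)   # cross-validated error plotted against cp

# Choose the cp whose row has the smallest cross-validated error, then prune.
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
```

The same calls should work on the tree object fitted above by replacing fit with tree.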
Hope this helps!
Thanks shuvayan. It is now clear that cp decides the number of splits in the tree, but which factor or algorithm decides exactly where each split is made? E.g., in your example above, where do the numbers 24.65, 9.5, etc. come from? What is the algorithm behind them?