How to implement "pruning" while building CART models in R?

r
data_wrangling
data_science
decision_trees

#1

Hello,

While reading about ways to avoid and reduce overfitting on training data when building CART models, I came across pruning, which removes the nodes that add little predictive power for the problem at hand.
How can this technique be implemented in R while building CART models? What additional parameters do we need to specify in the model?

Thanks


#2

@Ravi - this link may help you get started.

http://www.statmethods.net/advstats/cart.html


#3

Hi Ravi
It is a bit difficult to explain the pruning process but I shall try nevertheless.
Suppose you have built a decision tree using CART and called it dt, and now you want to prune it. The rpart package has a function called prune() which takes two main arguments: prune(dt, cp = …). Here cp stands for complexity parameter. Essentially, a split is kept only if it decreases the overall lack of fit (the relative error) by at least a factor of cp; splits that improve the fit by less than that are pruned away.
Next question then is how to determine the optimal level of cp?
Look at dt$cptable. This shows a table of cp values along with columns for the number of splits (nsplit), the relative error (rel error), the cross-validation error (xerror) and the cross-validation standard deviation (xstd). We usually choose the cp corresponding to the lowest cross-validation error.
Alternatively, you can plot the table with plotcp(dt) and check visually.
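To make the steps above concrete, here is a minimal sketch. It assumes the kyphosis data set that ships with rpart as a stand-in for your own data, and the control settings (cp = 0, minsplit = 2) are just my choice to grow a deliberately large tree so there is something to prune. The helper n_leaves() is made up for illustration and is not part of rpart.

```r
library(rpart)

# Assumption: kyphosis (bundled with rpart) stands in for your own data.
set.seed(42)  # cross-validation folds are random, so fix the seed

# Grow a deliberately large tree: cp = 0 lets every split through,
# minsplit = 2 allows very small nodes, xval = 10 runs 10-fold CV.
dt <- rpart(Kyphosis ~ Age + Number + Start,
            data = kyphosis, method = "class",
            control = rpart.control(cp = 0, minsplit = 2, xval = 10))

# Complexity table with columns CP, nsplit, rel error, xerror, xstd
printcp(dt)
# plotcp(dt)  # uncomment to inspect the same table visually

# Pick the cp with the lowest cross-validation error
best_row <- which.min(dt$cptable[, "xerror"])
best_cp  <- dt$cptable[best_row, "CP"]

# Prune the full tree back to that cp
dt_pruned <- prune(dt, cp = best_cp)

# Hypothetical helper: count terminal nodes to compare tree sizes
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")
cat("Leaves before pruning:", n_leaves(dt), "\n")
cat("Leaves after pruning: ", n_leaves(dt_pruned), "\n")
```

The pruned tree can then be used with predict() exactly like the original one. Since the folds are random, the chosen cp can change from run to run unless you set a seed.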
Not sure if this helps; you may have to try it hands-on a couple of times to get the gist.
Cheers


#4

@manizoya_1 - thanks for the explanation