I was wondering whether, in regression trees, we could fit a linear regression in each bucket created by the splits, instead of just averaging the dependent variable within each bucket. It seems to me that this should give a more accurate model, or will it cause overfitting?
@Siddhant - Yes, fitting a linear regression in each node can improve the performance of the regression model, because each linear model is fitted to a smaller, more homogeneous subset of the data.
But you need a large data set, so that after splitting, each subset still has enough observations to fit its own linear regression.
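To make the difference concrete, here is a minimal sketch in plain Python (not R, and not the API of any tree package). The toy data, the fixed split at x = 5, and the helper names fit_line and sse are all made up for illustration. It compares the training error of a constant per leaf with that of a line per leaf.

```python
# Sketch: one fixed split, then compare leaf means with leaf linear fits.
# Data, split point and helper names are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all x equal: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def sse(ys, preds):
    """Sum of squared errors."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

# Toy data: slope 2 below x = 5, slope -1 from x = 5 onwards.
xs = [float(i) for i in range(10)]
ys = [2.0 * x if x < 5 else 20.0 - x for x in xs]
split = 5.0

leaf_mean_sse = 0.0
leaf_linear_sse = 0.0
for left in (True, False):
    lx = [x for x in xs if (x < split) == left]
    ly = [y for x, y in zip(xs, ys) if (x < split) == left]
    mean = sum(ly) / len(ly)
    leaf_mean_sse += sse(ly, [mean] * len(ly))        # classic regression tree leaf
    a, b = fit_line(lx, ly)
    leaf_linear_sse += sse(ly, [a + b * x for x in lx])  # linear model in the leaf

print("SSE with leaf means:      ", leaf_mean_sse)
print("SSE with leaf regressions:", leaf_linear_sse)
```

On training data the per-leaf regression can never fit worse than the per-leaf mean, since the mean is the special case with slope zero. That is exactly why the in-sample improvement alone says nothing about overfitting.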
I would also suggest performing cross-validation to check whether the model is overfitting.
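As a rough illustration of that check, the following plain Python sketch (again not R) runs k-fold cross-validation on simulated noisy data and compares held-out error for a constant per leaf against a line per leaf. The data, the fixed split at x = 5, and the helper names fit_line, make_predictor and kfold_mse are assumptions for the example only; a real tree would also re-learn its splits inside each fold.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def make_predictor(xs, ys, split, linear):
    """Fit one model on each side of a fixed split; return predict(x)."""
    params = {}
    for left in (True, False):
        lx = [x for x in xs if (x < split) == left]
        ly = [y for x, y in zip(xs, ys) if (x < split) == left]
        # Leaf mean is stored as intercept with slope 0.
        params[left] = fit_line(lx, ly) if linear else (sum(ly) / len(ly), 0.0)

    def predict(x):
        a, b = params[x < split]
        return a + b * x

    return predict

def kfold_mse(xs, ys, split, linear, k=5, seed=0):
    """Mean squared error on held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total = 0.0
    for test in folds:
        held_out = set(test)
        tr_x = [xs[i] for i in idx if i not in held_out]
        tr_y = [ys[i] for i in idx if i not in held_out]
        predict = make_predictor(tr_x, tr_y, split, linear)
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in test)
    return total / len(xs)

# Simulated data (hypothetical): piecewise-linear signal plus Gaussian noise.
rng = random.Random(1)
xs = [i * 0.25 for i in range(40)]
ys = [(2.0 * x if x < 5 else 20.0 - x) + rng.gauss(0, 0.5) for x in xs]

cv_mean = kfold_mse(xs, ys, 5.0, linear=False)
cv_linear = kfold_mse(xs, ys, 5.0, linear=True)
print("CV MSE, leaf means:      ", cv_mean)
print("CV MSE, leaf regressions:", cv_linear)
```

If the held-out error of the per-leaf regressions is clearly lower, the extra flexibility is paying off. If it is similar or higher while the training error is much lower, that points to overfitting, for instance because the leaves are too small.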
So what is the name and syntax of such an algorithm, if it exists in R?
The function mob (model-based recursive partitioning) in the package party can be used for this.
Thanks Alain. Can the same be done for classification trees, fitting a logistic regression in each split?
If you go through the vignette for mob, I think there is an example with logistic regression. In short, yes, you can also use other methods.
Have a good day.