How to decide which models to combine for ensemble methods
While trying to implement ensemble methods, I used a dataset that looks like this:

I am trying to predict y from x1, x2, x3.
While linear regression and random forest give errors in the range of 136, SVM gives an error of 129.
I then tried combining the results in two ways: 1) SVM and RF; 2) SVM, random forest, and linear regression. Shown below is what happened:

As can be seen, with SVM and RF the error was actually lower than when I combined all three models.
Aren't ensemble methods supposed to improve accuracy? What is going wrong here?
Also, since this is a small dataset, combining some or all of the models and comparing the results was not an issue. But when the data is large, such trial and error might be time-consuming, so how do we decide which models to combine?
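The equal-weight averaging described above can be sketched as follows. This is a minimal illustration assuming scikit-learn; since the original dataset is not shown, synthetic data stands in for x1, x2, x3 and y, so the error values will differ from those in the question.

```python
# Sketch of equal-weight averaging of three regressors (scikit-learn assumed).
# Synthetic data stands in for the original x1, x2, x3 / y columns.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))                     # stand-in for x1, x2, x3
y = 10 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "lr": LinearRegression(),
    "rf": RandomForestRegressor(n_estimators=100, random_state=0),
    "svm": SVR(),
}
preds = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    preds[name] = model.predict(X_te)
    print(name, "RMSE:", mean_squared_error(y_te, preds[name]) ** 0.5)

# The two equal-weight combinations from the question:
svm_rf = (preds["svm"] + preds["rf"]) / 2
all_three = (preds["svm"] + preds["rf"] + preds["lr"]) / 3
print("svm+rf RMSE:", mean_squared_error(y_te, svm_rf) ** 0.5)
print("all three RMSE:", mean_squared_error(y_te, all_three) ** 0.5)
```

Because every model gets the same weight, adding a weak model can drag the blend down, which matches the behaviour observed above.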


Glad you asked this question. I am in the process of building a process flow to select the right set of learners for ensemble models. Coming to your point about what is going wrong here: you are trying to bag learners with equal weights. In such cases, we do not give optimum weight to the more predictive models. There are multiple techniques for finding the right weights for these models. One way is forward selection, which my next article will be based on. Give this problem a shot with my recommended method; you may well find a better solution.
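A minimal sketch of the forward-selection idea mentioned above: greedily add (with replacement) the model whose inclusion most reduces holdout error, then use selection frequencies as weights. This is a generic illustration, not the article's own implementation; the function and model names below are hypothetical.

```python
# Greedy forward selection for ensemble weights (a sketch, not the
# article's code). Each entry in `preds` holds one model's holdout
# predictions; the returned weights are selection frequencies.
import numpy as np

def forward_selection(preds, y_true, n_rounds=20):
    names = list(preds)
    chosen = []
    for _ in range(n_rounds):
        best_name, best_rmse = None, np.inf
        for name in names:                      # try adding each model
            trial = chosen + [name]
            blend = np.mean([preds[m] for m in trial], axis=0)
            rmse = np.sqrt(np.mean((blend - y_true) ** 2))
            if rmse < best_rmse:
                best_name, best_rmse = name, rmse
        chosen.append(best_name)                # keep the best addition
    return {n: chosen.count(n) / len(chosen) for n in names}

# Toy holdout predictions: model "b" is closest to the truth,
# so it should receive the largest weight.
y = np.array([1.0, 2.0, 3.0, 4.0])
preds = {"a": y + 0.5, "b": y + 0.05, "c": y - 0.4}
weights = forward_selection(preds, y)
print(weights)
```

A weak model that only hurts the blend is simply never selected, which is exactly what equal-weight averaging cannot do.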
Hope this helps.