How to calculate inter-model correlation for an ensemble in R?

r
ensemble

#1

This blog explains the ensembling technique nicely. From it I learned that two things matter when building an ensemble:
individual model accuracy and inter-model prediction correlation.

However, the blog does not explain how inter-model prediction correlation is computed. How can I calculate it in R?


#2

Hi @Saikat_Ghosh, you can use the caretEnsemble package to get the inter-model correlation. Try this toy code:

library(caret)
library(caretEnsemble)

mycontrol = trainControl(method="cv", number=10, savePredictions=TRUE, classProbs=TRUE)

model_list = c('rf', 'knn') # list of algorithms

mydata = iris

set.seed(121)
models = caretList(Species~., data=mydata, trControl=mycontrol, methodList=model_list)

results = resamples(models)

modelCor(results)
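
Note that modelCor() correlates the per-fold performance metrics of the models (Accuracy here), not their individual predictions. If you want the correlation of the predictions themselves, here is a minimal sketch. It assumes savePredictions="final" (so that only the held-out predictions for the best tuning setting are kept), and it compares the predicted probability of one class, versicolor, which I picked arbitrarily for illustration. The object names (ctrl, rf_pred, knn_pred) are my own.

```r
library(caret)
library(caretEnsemble)

# Keep only the held-out predictions for the best tuning setting
ctrl <- trainControl(method = "cv", number = 10,
                     savePredictions = "final", classProbs = TRUE)

set.seed(121)
models <- caretList(Species ~ ., data = iris, trControl = ctrl,
                    methodList = c("rf", "knn"))

# Sort each model's held-out predictions by original row number
# so that row i of both data frames refers to the same observation
rf_pred  <- models$rf$pred[order(models$rf$pred$rowIndex), ]
knn_pred <- models$knn$pred[order(models$knn$pred$rowIndex), ]

# Correlation between the two models' predicted probabilities for one class
prob_cor <- cor(rf_pred$versicolor, knn_pred$versicolor)
print(prob_cor)

# Share of observations on which the predicted labels agree
agreement <- mean(rf_pred$pred == knn_pred$pred)
print(agreement)
```

A value close to 1 means the two models make nearly the same predictions, so combining them is unlikely to add much.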

Let me know if you face any issues.


#3

Hey, could you please explain what this code does, so that those of us who are unfamiliar with R can follow?

TIA


#4

Sure @jalFaizy. Let me break it down for you.

  1. In this part of the code, I load the packages to be used, caret and caretEnsemble. Then I use the trainControl() function to specify the type of cross-validation (10-fold here) and to save the held-out predictions and class probabilities.
library(caret) # for training models
library(caretEnsemble) # for creating ensemble of models

mycontrol = trainControl(method="cv", number=10, savePredictions=TRUE, classProbs=TRUE)
  2. Next, I specify the algorithms used to train models on the iris dataset: 'rf' stands for random forest and 'knn' for k-nearest neighbors.
model_list = c('rf', 'knn') # list of algorithms
mydata = iris
  3. Here I use the caretList() function from the caretEnsemble package to fit both the random forest and the KNN model on the data. set.seed() makes the cross-validation folds reproducible.
set.seed(121)
models = caretList(Species~., data=mydata, trControl=mycontrol, methodList=model_list)
  4. Finally, having trained several models, we want to compare them. In this case there are two, random forest and KNN. The resamples() function collects their cross-validation results, and passing its output to modelCor() gives the correlation between the random forest and KNN results.
results = resamples(models)
modelCor(results)
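
To see what modelCor() is doing, it can help to reproduce it by hand. The sketch below assumes the resamples object keeps one column per model and metric in results$values, named like rf~Accuracy (that is how caret lays it out as far as I know), and passes those columns to base R's cor(). The names ctrl, acc, manual and builtin are my own.

```r
library(caret)
library(caretEnsemble)

ctrl <- trainControl(method = "cv", number = 10,
                     savePredictions = "final", classProbs = TRUE)

set.seed(121)
models  <- caretList(Species ~ ., data = iris, trControl = ctrl,
                     methodList = c("rf", "knn"))
results <- resamples(models)

# One row per cross-validation fold, one column per model:
# the Accuracy each model reached on that fold
acc <- results$values[, c("rf~Accuracy", "knn~Accuracy")]

# Correlation of the per-fold accuracies, computed by hand
manual <- cor(acc)

# The same quantity from caret's helper
builtin <- modelCor(results, metric = "Accuracy")

print(manual)
print(builtin)
```

Both matrices should contain the same numbers; only the row and column names differ. With just 10 folds the estimate is noisy, so treat it as a rough guide.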

Hope it helps.