How to calculate inter-model correlation for an ensemble in R?

r
ensemble
#1

This blog explains the ensembling technique nicely. From it I learned that two things matter for an ensemble:
individual model accuracy and inter-model prediction correlation.

But the blog does not explain inter-model prediction correlation. How can we calculate it in R?

0 Likes

#2

Hi @Saikat_Ghosh, we can use the caretEnsemble package to get the inter-model prediction correlation. Try this toy code.

library(caret)
library(caretEnsemble)

mycontrol = trainControl(method="cv", number=10, savePredictions=TRUE, classProbs=TRUE)

model_list = c('rf', 'knn') # list of algorithms

mydata = iris

set.seed(121)
models = caretList(Species~., data=mydata, trControl=mycontrol, methodList=model_list)

results = resamples(models)

modelCor(results)

Let me know if you face any issue.

2 Likes

#3

Hey, could you please explain what this code does, so that those of us who are unfamiliar with R can understand it?

TIA

1 Like

#4

Sure @jalFaizy. Let me break it down for you.

  1. In this part of the code, I load the packages to be used, caret and caretEnsemble. Then I use the trainControl() function to specify the type of cross-validation we want (10-fold here) and to keep the held-out predictions and class probabilities.
library(caret) # for training models
library(caretEnsemble) # for creating ensemble of models

mycontrol = trainControl(method="cv", number=10, savePredictions=TRUE, classProbs=TRUE)
  2. Then I specify the algorithms used to train models on the iris dataset: 'rf' is random forest and 'knn' is k-nearest neighbors.
model_list = c('rf', 'knn') # list of algorithms
mydata = iris
  3. Here I use the caretList() function from the caretEnsemble package to fit both the random forest and KNN models on our data.
set.seed(121)
models = caretList(Species~., data=mydata, trControl=mycontrol, methodList=model_list)
  4. Finally, after creating multiple models, we want to compare them. In this case we have 2 models, random forest and KNN. To compare them we use the resamples() function and pass its output to modelCor() to find the inter-model correlation between random forest and KNN.
results = resamples(models)
modelCor(results)
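
To see what modelCor() is actually fed, here is a minimal sketch that reuses the same toy setup. As far as I know, resamples() collects the per-fold performance metrics (Accuracy and Kappa for this classification task), and modelCor() hands the columns for one metric to cor(), so extra arguments such as method = "spearman" are passed through. The explicit metric = "Accuracy" argument, savePredictions = "final", and the variable names pearson, spearman and manual are my own choices, not part of the code above.

```r
library(caret)
library(caretEnsemble)

set.seed(121)
mycontrol <- trainControl(method = "cv", number = 10,
                          savePredictions = "final", classProbs = TRUE)
models <- caretList(Species ~ ., data = iris, trControl = mycontrol,
                    methodList = c("rf", "knn"))
results <- resamples(models)

# One row per CV fold, one column per model/metric pair
# (for example "rf~Accuracy" and "knn~Accuracy").
print(head(results$values))

# Correlation of the per-fold Accuracy values (cor() defaults to Pearson).
pearson <- modelCor(results, metric = "Accuracy")

# Extra arguments go to cor(), so a rank correlation can be requested.
spearman <- modelCor(results, metric = "Accuracy", method = "spearman")

# The same Pearson matrix computed by hand from the resampled values.
manual <- cor(results$values[, c("rf~Accuracy", "knn~Accuracy")])

print(pearson)
print(spearman)
```

Comparing pearson with manual is a quick way to confirm which numbers go into the correlation.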

Hope it helps.

2 Likes

#5

@pjoshi15 Thanks for this. But what exactly does the modelCor function compute the correlation between? Is it the predictions, the scores, or some specific metric?

Thanks,

1 Like

#6

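@BICKOCYGU modelCor() takes the output of resamples(), so it computes the correlation between the models' resampled performance values (the first metric by default, which is Accuracy here) across the cross-validation folds, not between the raw predictions.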
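
If the correlation between the held-out predictions themselves is what you are after, one possible sketch is below. It assumes each fitted model keeps its cross-validated predictions in $pred (as caret's train() does when savePredictions is set) with a rowIndex column and one probability column per class. The helper heldout() and the choice of the "versicolor" probability column are hypothetical, picked only for illustration.

```r
library(caret)
library(caretEnsemble)

set.seed(121)
# "final" keeps only the held-out predictions of the best tuning setting.
mycontrol <- trainControl(method = "cv", number = 10,
                          savePredictions = "final", classProbs = TRUE)
models <- caretList(Species ~ ., data = iris, trControl = mycontrol,
                    methodList = c("rf", "knn"))

# Hypothetical helper: held-out predictions of one model,
# ordered by the original row of iris so that models line up.
heldout <- function(model) {
  p <- as.data.frame(model$pred)
  p[order(p$rowIndex), ]
}

# Print the prediction table of a single model.
print(head(heldout(models[["rf"]])))

# Matrix with one column per model: predicted probability of "versicolor".
prob_matrix <- sapply(names(models),
                      function(nm) heldout(models[[nm]])[["versicolor"]])
pred_cor <- cor(prob_matrix)  # Pearson by default
print(pred_cor)

# Share of rows on which the two models predict the same class.
agreement <- mean(as.character(heldout(models[["rf"]])$pred) ==
                  as.character(heldout(models[["knn"]])$pred))
print(agreement)
```

A high value in pred_cor or agreement suggests the two models make similar predictions, which usually leaves less room for an ensemble to improve on either one.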

1 Like

#7

Hello @pjoshi15! Thank you for the helpful blog.
Does the modelCor() function compute the correlation using the Pearson or the Spearman method?
Another question:
How can I print the prediction matrix for each model in R, that is, for the models used as inputs to modelCor(), so that I can see the matrix used in the correlation?
Finally, I would like to read more about how modelCor() works. Could you share a link or more information? I read the R help for this function, but the information there was very limited.
Thanks again!

0 Likes

#8

Hello everyone,
could you please answer my question?

0 Likes