How to calculate the error in an ensemble technique

#1

Hello,

I am trying to learn about ensemble models, and below is a piece of code I have tried (from R-bloggers):

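library(foreach)  # provides foreach() and the %do% operator  
length_divisor <- 4  # each model is fit on 1/4 of the training rows  
iterations <- 1000   # number of models in the ensemble  
predictions <- foreach(m = 1:iterations, .combine = cbind) %do% {  
  training_positions <- sample(nrow(training), size = floor(nrow(training) / length_divisor))  
  train_pos <- 1:nrow(training) %in% training_positions  # logical index of the sampled rows  
  lm_fit <- lm(y ~ x1 + x2 + x3, data = training[train_pos, ])  
  predict(lm_fit, newdata = testing)  # one column of test-set predictions per model  
}  
predictions <- rowMeans(predictions)  # average the 1000 predictions for each test row  
error <- sqrt(sum((testing$y - predictions)^2) / nrow(testing))  # root mean squared error (RMSE)  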
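The snippet above assumes that `training` and `testing` data frames already exist. If you want to try it end to end, here is a minimal sketch that simulates both. The data-generating function `make_data()`, its coefficients, and the seed are hypothetical and do not come from the post. The sketch uses base R `sapply()` in place of `foreach`, so no extra package is needed, and it subsets by the sampled row numbers directly, which selects the same set of rows as the logical index.

```r
set.seed(42)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical simulated data standing in for the post's `training`/`testing`
make_data <- function(n) {
  x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
  y <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n)
  data.frame(y, x1, x2, x3)
}
training <- make_data(400)
testing <- make_data(100)

length_divisor <- 4   # each model sees a quarter of the training rows
iterations <- 1000    # number of models in the ensemble

# sapply() returns a matrix: one row per test row, one column per model
predictions <- sapply(seq_len(iterations), function(m) {
  training_positions <- sample(nrow(training), size = floor(nrow(training) / length_divisor))
  lm_fit <- lm(y ~ x1 + x2 + x3, data = training[training_positions, ])
  predict(lm_fit, newdata = testing)
})
predictions <- rowMeans(predictions)              # average over the ensemble
error <- sqrt(mean((testing$y - predictions)^2))  # RMSE, same as the post's formula
error
```

`mean(...)` here equals `sum(...) / nrow(testing)` in the original, so the error value has the same definition.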

So here a linear model is fit 1000 times, each time on a random quarter of the training rows, and each fit predicts the test set. The 1000 predictions for each test row are averaged, and at the end the error is calculated.
This works fine for a continuous numeric target and for techniques like linear regression, but how do I implement this for classification algorithms such as random forests, KNN, etc.? My main pain point is the error calculation at the end of the process, so could somebody please help me with that?


#2

Hi @data_hacks,

I think the key is to understand what the error calculation is doing. In the code above, it simply averages the predictions of your ensemble of regression models and then calculates the root mean squared error (RMSE), i.e. the square root of the mean squared error. This is a basic metric for judging/evaluating a regression model.
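The last line of the question's code takes the square root of the mean squared error, so strictly speaking it computes the RMSE. Here is a tiny worked example with made-up numbers (`actual` and `predicted` are hypothetical) that shows both quantities:

```r
actual <- c(3, 5, 7)       # hypothetical observed values
predicted <- c(2, 5, 9)    # hypothetical ensemble-averaged predictions

mse <- mean((actual - predicted)^2)  # (1 + 0 + 4) / 3 = 5/3
rmse <- sqrt(mse)                    # sqrt(5/3), about 1.291
```

RMSE is in the same units as `y`, which makes it easier to interpret than MSE.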

For a classification technique, the evaluation metrics are different: accuracy, sensitivity, specificity, etc. So what do you have to do? Say you use the same approach and your target is binary (1/0). Your ensemble will then give 1000 columns of predictions for each row of the test set. For each row, count how often 1 and 0 occur and take the more frequent class as that row's prediction (a majority vote). For example, if 750 of the 1000 columns are 1 and 250 are 0, you predict 1 for that row.

Once you have these final predictions, you can build a confusion matrix against the actual classes and compute all the usual classification evaluation metrics, such as accuracy, sensitivity, and specificity.
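Here is a minimal sketch of that recipe in R, under stated assumptions. The binary data are simulated by a hypothetical `make_data()`. Logistic regression (`glm` with `family = binomial`) stands in for the base classifier only because it ships with base R. A random forest or KNN would replace the fit and predict lines. The sketch reuses the subsampling loop from the question, takes a majority vote across the 1000 models, and builds the confusion matrix with `table()`.

```r
set.seed(1)  # assumption: fixed seed for reproducibility

# Hypothetical simulated binary-class data (not from the post)
make_data <- function(n) {
  x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
  p <- plogis(-0.5 + 1.5 * x1 - x2 + 0.5 * x3)   # true probability of class 1
  data.frame(y = rbinom(n, 1, p), x1, x2, x3)
}
training <- make_data(400)
testing <- make_data(200)

length_divisor <- 4
iterations <- 1000

# Each column holds one model's 0/1 class predictions for the test rows.
# glm is a stand-in base classifier; swap in another classifier's fit/predict.
votes <- sapply(seq_len(iterations), function(m) {
  rows <- sample(nrow(training), size = floor(nrow(training) / length_divisor))
  fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = training[rows, ])
  prob <- predict(fit, newdata = testing, type = "response")
  as.integer(prob > 0.5)
})

# Majority vote: predict 1 when more than half of the models voted 1
# (an exact 500/500 tie goes to class 0)
final <- as.integer(rowMeans(votes) > 0.5)

# Confusion matrix with a fixed level order so all four cells always exist
cm <- table(predicted = factor(final, levels = c(0, 1)),
            actual = factor(testing$y, levels = c(0, 1)))
tp <- cm["1", "1"]; tn <- cm["0", "0"]
fp <- cm["1", "0"]; fn <- cm["0", "1"]

accuracy <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)   # true positive rate
specificity <- tn / (tn + fp)   # true negative rate
cm
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

An odd number of iterations avoids two-class ties altogether. For more than two classes (e.g. KNN on a factor target), you could take the vote for row `i` as the most frequent label, for instance `names(which.max(table(votes[i, ])))`.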

Hope this helps.

Regards,
Aayush