What is repeated CV in caret?

machine_learning
caret
validation

#1

Can someone please explain how repeated CV works, and what all the parameters in the caret code below mean? I am really confused by them.

library(caret)
# load the dataset
data(iris)
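# prepare training scheme: 10-fold CV (number=10), repeated 5 times (repeats=5)
# with a new random split of the data for each repeat
control <- trainControl(method="repeatedcv", number=10, repeats=5)
# train the model; tuneLength=5 lets caret pick 5 candidate values per tuning parameter
model <- train(Species~., data=iris, method="lvq", trControl=control, tuneLength=5)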
# summarize the model
print(model)

Output:

> print(model)
Learning Vector Quantization 

150 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica' 

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 135, 135, 135, 135, 135, 135, ... 
Resampling results across tuning parameters:

  size  k   Accuracy   Kappa
   5     1  0.9506667  0.926
   5     6  0.9466667  0.920
   5    11  0.9440000  0.916
   5    16  0.9440000  0.916
   5    21  0.9480000  0.922
   6     1  0.9493333  0.924
   6     6  0.9506667  0.926
   6    11  0.9453333  0.918
   6    16  0.9520000  0.928
   6    21  0.9573333  0.936
   7     1  0.9453333  0.918
   7     6  0.9573333  0.936
   7    11  0.9453333  0.918
   7    16  0.9440000  0.916
   7    21  0.9426667  0.914
   8     1  0.9600000  0.940
   8     6  0.9573333  0.936
   8    11  0.9493333  0.924
   8    16  0.9586667  0.938
   8    21  0.9586667  0.938
  10     1  0.9573333  0.936
  10     6  0.9533333  0.930
  10    11  0.9626667  0.944
  10    16  0.9600000  0.940
  10    21  0.9520000  0.928


#2

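@vijaypalmanit In the code above, 10-fold CV means randomly dividing your training dataset into 10 parts, then using each of the 10 parts in turn as the test set for a model trained on the other 9. We take the average of the 10 error estimates obtained this way. With the 150 iris rows, each model is trained on 135 rows, which is what the "Summary of sample sizes: 135, 135, ..." line in your output reports.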
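To make the mechanics concrete, here is a minimal base-R sketch of a single run of 10-fold CV. It is not what caret does internally: it assumes a hypothetical nearest-centroid classifier (`nearest_centroid_accuracy` is a made-up helper) as a stand-in for LVQ, and it assigns folds purely at random, whereas caret stratifies the folds by class.

```r
# Minimal sketch of plain 10-fold cross-validation in base R.
# Assumption: a hypothetical nearest-centroid classifier stands in for LVQ,
# and folds are assigned purely at random (caret stratifies by class).
data(iris)

# Hypothetical helper: fit class centroids on `train`, return accuracy on `test`
nearest_centroid_accuracy <- function(train, test) {
  centroids <- aggregate(. ~ Species, data = train, FUN = mean)
  C <- as.matrix(centroids[, -1])   # one row per class, 4 predictor columns
  X <- as.matrix(test[, 1:4])
  # Squared Euclidean distance: test rows (rows) x class centroids (columns)
  d <- sapply(seq_len(nrow(C)), function(j) colSums((t(X) - C[j, ])^2))
  pred <- as.character(centroids$Species)[apply(d, 1, which.min)]
  mean(pred == as.character(test$Species))
}

set.seed(42)
k <- 10
fold_id <- sample(rep(seq_len(k), length.out = nrow(iris)))  # 15 rows per fold

fold_acc <- numeric(k)
for (f in seq_len(k)) {
  test_idx <- which(fold_id == f)
  # Train on the other 9 folds (135 rows), evaluate on the held-out fold (15 rows)
  fold_acc[f] <- nearest_centroid_accuracy(iris[-test_idx, ], iris[test_idx, ])
}

fold_acc        # one accuracy per held-out fold
mean(fold_acc)  # the 10-fold CV estimate
```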

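In 5 repeats of 10-fold CV, we perform 10-fold CV five times, each time with a new random split into 10 parts, and average the 5 resulting error estimates. Note that 5 repeats of 10-fold CV is not the same as 50-fold CV: 50-fold CV would split the data once into 50 parts of 3 rows, so each model would be trained on 147 rows, whereas repeated 10-fold CV always trains on 135 rows and only reshuffles which rows land in each fold.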
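The difference from 50-fold CV is easiest to see by just counting rows. The base-R sketch below fits no model; it only builds fold assignments (`assign_folds` is a hypothetical helper, and it ignores the class stratification caret applies). Both schemes produce 50 train/test splits, but the training set sizes differ, and each repeat reshuffles the folds. Because every repeat has the same 10 folds, averaging all 50 held-out estimates gives the same number as averaging the 5 per-repeat averages.

```r
# Minimal sketch (base R, no model fitting): 5 repeats of 10-fold CV
# versus a single run of 50-fold CV on the 150-row iris data.
n <- 150
set.seed(1)

# Hypothetical helper: assign each row to one of k folds at random
# (caret additionally stratifies the folds by class)
assign_folds <- function(n, k) sample(rep(seq_len(k), length.out = n))

# 5 repeats of 10-fold CV: a fresh random partition for every repeat
repeated <- lapply(1:5, function(r) assign_folds(n, 10))

# 50-fold CV: a single partition into 50 small folds
fifty <- assign_folds(n, 50)

# Both schemes fit 50 models in total ...
length(repeated) * 10        # 50
length(unique(fifty))        # 50

# ... but each model sees a different amount of training data
n - sum(repeated[[1]] == 1)  # 135 training rows per fit (15 held out)
n - sum(fifty == 1)          # 147 training rows per fit (3 held out)

# Different repeats put different rows into "fold 1"
identical(which(repeated[[1]] == 1), which(repeated[[2]] == 1))
```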


#3

Nice, but I am not able to understand the last line.
Also, what is the tuneLength parameter?


#4

Do you mean this:

each repeat is a new random split of the data into 5 folds, so the training/testing data in the first iteration of Repeat 1 will not be the same as in Repeat 2?


#5

Precisely.


#6

Hi,
With tuneLength you let caret choose the candidate values for you: with the default grid search it generates tuneLength values for each of the model's tuning parameters and tries every combination. In your case “lvq” is parametrized by “size” and “k”, so tuneLength=5 gives 5 × 5 = 25 combinations, which are the 25 rows in your output.

When you do not have a clear idea of how a set of parameters influences the accuracy of a model, you can use “tuneLength” as a kind of blind search. The alternative is a more directed search with “tuneGrid”, where you supply the exact parameter combinations to try.
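As a quick check of the 5 × 5 arithmetic, the grid caret evaluated can be rebuilt by hand from the printed results. The values below are copied from the output in the question; caret chose them itself, and other data sets or caret versions may produce different ones.

```r
# The grid that tuneLength = 5 produced for lvq in the output above:
# 5 candidate values of size x 5 candidate values of k = 25 combinations.
# Values are copied from the printed results, not computed by caret here.
lvq_grid <- expand.grid(size = c(5, 6, 7, 8, 10), k = c(1, 6, 11, 16, 21))
nrow(lvq_grid)   # 25, one row per line of the resampling results table
```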
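A directed search with tuneGrid might look like the sketch below. It assumes caret and the `class` package (which caret's “lvq” method relies on) are installed; the candidate values in `grid` are illustrative choices, not recommendations. The data frame passed to tuneGrid needs one column per tuning parameter, named as `modelLookup("lvq")` reports them.

```r
library(caret)   # method = "lvq" additionally needs the 'class' package
data(iris)

# Illustrative candidate values; column names must match the tuning parameters
grid <- expand.grid(size = c(6, 8, 10), k = c(1, 6))

set.seed(7)  # makes the random fold assignment reproducible
control <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
model <- train(Species ~ ., data = iris, method = "lvq",
               trControl = control, tuneGrid = grid)

modelLookup("lvq")   # lists the tuning parameters (size and k)
model$results        # one row per grid combination (6 rows here)
model$bestTune       # the combination with the highest mean Accuracy
```

Only the 6 listed combinations are evaluated, each with the same 5 × 10 resampling scheme, so the run is much cheaper than the 25-combination tuneLength search when you already know roughly where good values lie.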

See the “train” help page (?train) for some examples.

Carlos Ortega