Error while implementing machine learning with CARET package in R



Hi all,

I am following the steps provided in the article

Unfortunately, I’ve run into a problem that I can’t seem to resolve. I get the error

Error in `[.data.frame`(trainSet, , predictors) : undefined columns selected

when I run one of the last lines of the code:

model_gbm<-train(trainSet[,predictors],trainSet[,outcomeName],method='gbm')

Could you please help me resolve the issue?

Please see full code below:

library("caret")
train<-read.csv("Regression Data.csv",stringsAsFactors = T)
str(train)
sum(is.na(train))
#Imputing missing values using KNN. Also centering and scaling numerical columns
preProcValues <- preProcess(train, method = c("knnImpute","center","scale"))
library('RANN')
train_processed <- predict(preProcValues, train)
sum(is.na(train_processed))
str(train_processed)
#Converting every categorical variable to numerical using dummy variables
dmy <- dummyVars(" ~ .", data = train_processed,fullRank = T)
train_transformed <- data.frame(predict(dmy, newdata = train_processed))
str(train_transformed)
#Splitting training set into two parts based on outcome: 75% and 25%
index <- createDataPartition(train_transformed$Total.Conversions, p=0.75, list=FALSE)
trainSet <- train_transformed[ index,]
testSet <- train_transformed[-index,]
str(trainSet)
#Feature selection using rfe in caret
control <- rfeControl(functions = rfFuncs,
                      method = "repeatedcv",
                      repeats = 3,
                      verbose = FALSE)
outcomeName<-'Total.Conversions'
predictors<-names(trainSet)[!names(trainSet) %in% outcomeName]
Loan_Pred_Profile <- rfe(trainSet[,predictors], trainSet[,outcomeName],
                         rfeControl = control)
Loan_Pred_Profile
#Recursive feature selection
#Outer resampling method: Cross-Validated (10 fold, repeated 3 times)
#Resampling performance over subset size
predictors<-c("Device.Type.Name", "Ad.Format.Name", "Day.of.the.week")
model_gbm<-train(trainSet[,predictors],trainSet[,outcomeName],method='gbm')
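
A quick way to narrow this down: "undefined columns selected" means that at least one name in predictors is not a column of trainSet. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the dummyVars step replaced each factor column with one column per level, so the original names no longer exist. The sketch below uses base R only and a toy data frame with hypothetical level names (Mobile, Video, Monday) standing in for the real trainSet; it lists the requested names that are absent and shows one possible way to pick up the expanded columns by prefix.

```r
# Toy stand-in for trainSet after dummyVars(); these column names are
# hypothetical. On the real data, inspect names(trainSet) instead.
trainSet <- data.frame(
  Device.Type.Name.Mobile = c(0, 1, 1),
  Ad.Format.Name.Video    = c(1, 0, 1),
  Day.of.the.week.Monday  = c(0, 0, 1),
  Total.Conversions       = c(2.1, 0.4, 1.3)
)
outcomeName <- "Total.Conversions"
predictors  <- c("Device.Type.Name", "Ad.Format.Name", "Day.of.the.week")

# Requested names that are not columns of the data frame. Any entry here
# is enough to trigger "undefined columns selected".
missing_cols <- setdiff(predictors, names(trainSet))
print(missing_cols)

# One possible repair: keep every column whose name starts with one of the
# original variable names, excluding the outcome.
matches <- Reduce(`|`, lapply(predictors, function(p) startsWith(names(trainSet), p)))
predictors_fixed <- setdiff(names(trainSet)[matches], outcomeName)
print(predictors_fixed)

# Subsetting with the repaired names no longer raises the error.
head(trainSet[, predictors_fixed])
```

If missing_cols comes back empty on the real data, the names match and the cause lies elsewhere, for example a typo in outcomeName.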

Thank you so much for your help!