Automating removal of high VIF values in R

r

#1

Hi,
Does anyone know how to automate the removal of predictors with high VIF values from a data set? I have many predictors, and it is getting difficult to find and remove the high-VIF ones manually. Is there any step-like function for VIF?


#2

What such a function should accomplish is stepwise selection of variables using VIF. Removing individual variables with high VIF values based only on the initial comparison using the full set of explanatory variables is insufficient, because the VIF values change each time a variable is removed.


#3

Use the code below:

vif_func<-function(in_frame,thresh=3,trace=T){

#VIF() comes from the fmsb package; library() stops with an error if it is not installed
library(fmsb)

#get initial vif value for all comparisons of variables
vif_init<-NULL
for(val in names(in_frame)){
form_in<-formula(paste(val," ~ ."))
vif_init<-rbind(vif_init,c(val,VIF(lm(form_in,data=in_frame))))

}
vif_max<-max(as.numeric(vif_init[,2]))

if(vif_max < thresh){
if(trace==T){ #print output of each iteration
prmatrix(vif_init,collab=c("var","vif"),rowlab=rep("",nrow(vif_init)),quote=F)
cat("\n")
cat(paste("All variables have VIF < ", thresh,", max VIF ",round(vif_max,2), sep=""),"\n\n")
}
return(names(in_frame))
}

else{

in_dat<-in_frame

#backwards selection of explanatory variables, stops when all VIF values are below "thresh"
while(vif_max >= thresh){
  
  vif_vals<-NULL
  
  for(val in names(in_dat)){
    form_in<-formula(paste(val," ~ ."))
    vif_add<-VIF(lm(form_in,data=in_dat))
    vif_vals<-rbind(vif_vals,c(val,vif_add))
  }
  #vif_vals is a character matrix, so convert to numeric before locating the largest VIF
  max_row<-which.max(as.numeric(vif_vals[,2]))
  
  vif_max<-as.numeric(vif_vals[max_row,2])
  
  if(vif_max<thresh) break
  
  if(trace==T){ #print output of each iteration
    prmatrix(vif_vals,collab=c("var","vif"),rowlab=rep("",nrow(vif_vals)),quote=F)
    cat("\n")
    cat("removed: ",vif_vals[max_row,1],vif_max,"\n\n")
    flush.console()
  }
  
  #drop=FALSE keeps in_dat a data frame even if only one column remains
  in_dat<-in_dat[,!names(in_dat) %in% vif_vals[max_row,1],drop=FALSE]
  
}

return(names(in_dat))

}

}
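
To see what the function computes at each step, here is a minimal base-R sketch of the VIF calculation it relies on: regress each predictor on all the others and take 1 / (1 - R^2). The data frame preds, its columns x1 to x4, and the helper vif_one are made up for illustration and are not part of the function above.

```r
# Hypothetical data: x3 is almost a linear combination of x1 and x2
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.1)
x4 <- rnorm(n)
preds <- data.frame(x1, x2, x3, x4)

# VIF of one column: regress it on every other column, then 1 / (1 - R^2)
vif_one <- function(col, dat) {
  fit <- lm(reformulate(".", response = col), data = dat)
  1 / (1 - summary(fit)$r.squared)
}

# One VIF per predictor, named by column
vifs <- sapply(names(preds), vif_one, dat = preds)
print(round(vifs, 2))
```

Here x1, x2 and x3 should show very large VIFs and x4 a value close to 1. Passing a data frame like this to the function above, for example vif_func(in_frame=preds, thresh=5), should return the names of the predictors that remain once the largest VIF has dropped below the threshold.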


#4

Use caution: on functional grounds you may prefer to remove the variable that is causing another variable to show a high VIF, rather than the variable with the highest VIF itself. Removing a variable from a model should not be a purely statistical decision; it also means working out which of the multicollinear variables should be the one to go.
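
One way to make that judgment easier is to look at which predictors are tied to each other before dropping anything. The sketch below lists the strongly correlated pairs in a data frame; the variables income, spend and age and the 0.8 cutoff are assumptions chosen only for illustration.

```r
# Hypothetical predictors: spend is strongly tied to income, age is unrelated
set.seed(42)
n      <- 100
income <- rnorm(n, 50, 10)
spend  <- 0.8 * income + rnorm(n, sd = 2)
age    <- rnorm(n, 40, 12)
preds  <- data.frame(income, spend, age)

# Pairwise correlations, keeping each pair once (lower triangle only)
cors <- cor(preds)
cors[upper.tri(cors, diag = TRUE)] <- NA

# Reshape to one row per pair and keep the strongly correlated ones
pairs_tbl <- as.data.frame(as.table(cors))
names(pairs_tbl) <- c("var1", "var2", "r")
high <- subset(pairs_tbl, !is.na(r) & abs(r) > 0.8)
print(high)
```

With a table like this you can decide, from what you know about the variables, which member of each pair to keep instead of always dropping the one with the largest VIF. Pairwise correlations only reveal two-variable relationships, so this complements the VIF values and does not replace them.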