Hello R Users

I’ve read a lot of blog posts on variable selection, but I couldn’t find one approach that is recommended everywhere.

I dug deeper and came across a few methods in R for finding the most important variables to use when building a model. First I impute the missing values, then I try the following:

```
#impute missing values using k-nearest neighbours
library(DMwR)
inputData <- knnImputation(Data)
```

and,

```
#use a conditional random forest to rank the predictors
library(party)
cf1 <- cforest(ozone_reading ~ ., data = inputData, controls = cforest_unbiased(mtry = 2, ntree = 50))
#permutation importance (mean decrease in accuracy)
varimp(cf1)
#conditional permutation importance, which adjusts for correlated predictors
varimp(cf1, conditional = TRUE)
#AUC-based importance, more robust towards class imbalance; only applies to a binary response
varimpAUC(cf1)
```
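For what it's worth, this is how I understand the "mean decrease in accuracy" idea behind `varimp()`: shuffle one predictor at a time and see how much worse the predictions get. Below is a minimal sketch of that idea in base R only. It is not what `party` does internally. As assumptions for illustration, the built-in `airquality` data stands in for my `inputData`, a plain `lm()` stands in for the forest, and the error is measured on the training data rather than on out-of-bag samples. The names `mse` and `perm_importance` are my own.

```
#sketch of permutation importance (assumptions: airquality data, lm model, in-sample error)
set.seed(42)
dat <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit <- lm(Ozone ~ ., data = dat)

#mean squared error of a fitted model on a given data frame
mse <- function(model, data) {
  mean((data$Ozone - predict(model, newdata = data))^2)
}
baseline <- mse(fit, dat)

#shuffle one predictor at a time and record the average increase in error
predictors <- setdiff(names(dat), "Ozone")
perm_importance <- sapply(predictors, function(v) {
  increases <- replicate(20, {
    shuffled <- dat
    shuffled[[v]] <- sample(shuffled[[v]])
    mse(fit, shuffled) - baseline
  })
  mean(increases)
})

#larger values mean the model relies more on that predictor
sort(perm_importance, decreasing = TRUE)
```

If I have this right, a predictor whose shuffling barely changes the error contributes little to the model, which is the intuition I am relying on when reading the `varimp()` output.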

and,

```
#using relaimpo package
#check for relative importance of variable
install.packages("relaimpo")
library(relaimpo)
#fit lm model
lmMod <- lm(ozone_reading ~ ., data = inputData)
#calculate relative importance, normalised so the shares sum to 100%
relImportance <- calc.relimp(lmMod, type = 'lmg', rela = TRUE)
#relative importance
sort(relImportance$lmg, decreasing = TRUE)
```

and finally,

```
#boruta method
install.packages("Boruta")
library(Boruta)
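boruta_output <- Boruta(ozone_reading ~ ., data = na.omit(inputData), doTrace = 2)
#list the variables Boruta confirmed as important
getSelectedAttributes(boruta_output, withTentative = FALSE)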
```

My question is:

Is it necessary to select variables using methods like these? What other options do I have? Is there a robust method that can be applied in every situation to find the most important variables?

Somebody help!