Boruta is an all-relevant feature selection method. This means it tries to find all features that carry information usable for prediction, rather than a possibly compact subset of features on which some classifier has minimal error.
- Does Boruta handle NA values, or do we need to replace or remove them before feeding the data into Boruta?
Boruta does not handle missing values, so we have to impute or remove them before running Boruta.
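As a minimal sketch of those two options, here is one way to drop incomplete rows or fill gaps with the column median before handing the data to any feature selector. It uses plain Python lists with `None` marking a missing value; the helper names `drop_incomplete_rows` and `impute_median` are made up for this illustration and are not part of Boruta.

```python
from statistics import median


def drop_incomplete_rows(rows):
    """Return only the rows that contain no missing (None) values."""
    return [row for row in rows if all(value is not None for value in row)]


def impute_median(rows):
    """Replace each None with the median of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


# Toy data: two numeric columns, None marks a missing value.
data = [
    [1.0, 10.0],
    [2.0, None],
    [None, 30.0],
    [4.0, 40.0],
]

complete = drop_incomplete_rows(data)  # keeps only fully observed rows
imputed = impute_median(data)          # keeps all rows, fills the gaps
```

Dropping rows is simpler but loses data. Median imputation keeps every row at the cost of inventing values, so which one fits depends on how much is missing.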
- Do we need to convert categorical variables to numeric before feeding the data into Boruta?
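In R, Boruta works with factor (categorical) variables directly, so no conversion is needed there. The Python port (BorutaPy) is built on scikit-learn estimators, which generally expect numeric arrays, so in Python you will most likely need to encode categorical columns as numbers first. If this does not work, please let me know.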
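If the Python side does turn out to need numeric input, one simple option is to map each category to an integer code. The sketch below does this in plain Python; `encode_categories` is a hypothetical helper written for this answer, not a Boruta or scikit-learn function.

```python
def encode_categories(values):
    """Map each distinct category to an integer code, assigned in sorted order.

    Returns the encoded list and the category-to-code mapping.
    """
    mapping = {
        category: code for code, category in enumerate(sorted(set(values)))
    }
    return [mapping[value] for value in values], mapping


# Toy categorical column.
colors = ["red", "green", "blue", "green"]
codes, mapping = encode_categories(colors)
```

Integer codes impose an arbitrary order on the categories. Tree-based models usually tolerate that, but one-hot encoding is a common alternative when the order could mislead the model.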
- How do we implement Boruta for a regression problem? Do the steps remain the same?
The implementation of Boruta is similar for regression problems. The algorithm can be used on any classification or regression problem at hand to come up with a subset of meaningful features.
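To give a feel for why the same procedure carries over to a numeric target, here is a rough sketch of the shadow-feature idea behind Boruta: each feature is compared against shuffled copies of the features, and it scores a "hit" whenever it beats the best shuffled copy. This is not the Boruta implementation. As a simplifying assumption it uses absolute Pearson correlation in place of random-forest importance, and it only counts hits instead of running Boruta's statistical test. The names `pearson` and `shadow_screen` are invented for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def shadow_screen(columns, target, n_rounds=20, seed=0):
    """Count, per feature, the rounds in which it beats the best shadow feature.

    Assumption: importance is |Pearson correlation| with the target, a
    stand-in for the random-forest importance that Boruta itself uses.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadows are shuffled copies: same values, link to the target broken.
        best_shadow = 0.0
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs(pearson(shadow, target)))
        # A real feature scores a hit only if it beats the best shadow.
        for name, values in columns.items():
            if abs(pearson(values, target)) > best_shadow:
                hits[name] += 1
    return hits


# Toy regression data: the target is a linear function of "signal" only.
signal = [float(i) for i in range(40)]
uncorrelated = [(i - 19.5) ** 2 for i in range(40)]  # zero linear correlation
target = [3.0 * v + 1.0 for v in signal]

hits = shadow_screen({"signal": signal, "uncorrelated": uncorrelated}, target)
```

Nothing in the loop depends on whether the target is a class label or a number; only the importance measure would change, which is why the steps stay the same for regression.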