I am working on an assignment in R where I have to run logistic regression for many combinations (possibly close to 1 lakh, i.e. 100,000).
Currently, the execution time of the loop for 28k records is around 6 hours in R.
I am using a for loop for the iterations.
I cannot paste the code here, but the logic is as follows:
I have 28k different combinations of independent variables. For each combination, I run the glm function and compute AUC, GINI, KS, VIF, Factor Weight and other model coefficients. After that, I append the record for each combination to a data frame.
I tried using foreach and doParallel to reduce the execution time, but I didn’t succeed.
Is there any way to reduce the execution time of this loop, or is there any functionality in R, such as parallel execution, that would help?
@Rishabh0709, in R, rule #1 for faster execution is to eliminate for loops wherever you can. Try to vectorize your solution using an ‘apply’-style function.
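To make the ‘apply’ suggestion concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration, since the original code was not posted: the data frame `dat`, the predictor names, and the helper functions `auc_rank` and `fit_one` are all hypothetical. The idea is to return one row per combination from a function, collect the rows in a list with `lapply`, and bind them once at the end rather than appending to a data frame inside the loop.

```r
# Simulated data; all names here are illustrative assumptions.
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.5 * dat$x2))

predictors <- c("x1", "x2", "x3", "x4")
# Every 2-variable combination, as a list of character vectors.
combos <- combn(predictors, 2, simplify = FALSE)

# Rank-based (Mann-Whitney) AUC, so no extra package is needed.
auc_rank <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Fit one model and return a one-row data frame of metrics.
fit_one <- function(vars) {
  fit <- glm(reformulate(vars, response = "y"), data = dat, family = binomial())
  auc <- auc_rank(fitted(fit), dat$y)
  data.frame(vars = paste(vars, collapse = "+"),
             auc = auc,
             gini = 2 * auc - 1,
             stringsAsFactors = FALSE)
}

# Collect the rows in a list, then bind once instead of appending per iteration.
results <- do.call(rbind, lapply(combos, fit_one))
print(results)
```

Because each fit is independent of the others, a function shaped like `fit_one` is also the kind of unit that could be handed to `parallel::parLapply` or `foreach`, though whether that helps depends on the number of cores and the cost of copying the data to the workers.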
That said, training 28k models on 100k cases will take a long time whatever you do. You could try randomly sampling those combinations instead of testing every one of them, similar in spirit to scikit-learn's RandomizedSearchCV in Python.
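A rough sketch of the random sampling idea in base R follows. The predictor names, the combination size and the sample size are made-up values for illustration. Note that this only samples which combinations to fit; it does not do the cross-validation that RandomizedSearchCV performs.

```r
set.seed(42)

# Hypothetical predictor names; replace with the real column names.
predictors <- paste0("x", 1:10)

# All 3-variable combinations: choose(10, 3) = 120 of them.
all_combos <- combn(predictors, 3, simplify = FALSE)

# Draw a random subset without replacement and fit only those.
n_sample <- 20
sampled <- all_combos[sample(length(all_combos), n_sample)]

length(all_combos)
length(sampled)
```

Each element of `sampled` is a character vector of variable names, so it can be passed to whatever function fits a single model.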
Dimensionality reduction techniques are another possibility, but they usually cost you interpretability, and only you can judge whether that is acceptable for your application.
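As one possible illustration of the dimensionality reduction route, the sketch below uses principal component analysis via `prcomp` and then fits a single logistic regression on the first few components. PCA is just one example technique chosen here, and the simulated matrix `X`, the response `y` and the choice of `k = 2` components are all assumptions.

```r
set.seed(7)
n <- 300

# Simulated predictors and a binary response (illustrative only).
X <- matrix(rnorm(n * 6), ncol = 6,
            dimnames = list(NULL, paste0("x", 1:6)))
y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))

# Principal components of the centred and scaled predictors.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Keep the first k components as the new predictors.
k <- 2
scores <- as.data.frame(pca$x[, 1:k])  # columns PC1, PC2
scores$y <- y

# One logistic regression on the components instead of many on raw variables.
fit <- glm(y ~ ., data = scores, family = binomial())
coef(fit)
```

The coefficients now refer to PC1 and PC2 rather than to the original variables, which is the loss of interpretability mentioned above.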