High-dimensional data with a large number of rows in regression



Hi All,

Suppose I have a dataset with a large number of features (approx. 200) and about 10 million rows. How do I perform regression on such data? I know that for high-dimensional data we use lasso or ridge regression, and that for a large number of rows we can use the biglm package in R. But how do I combine both? Is ridge regression available in biglm as well?


Hi @sonam_gupta

Check out the glmnet package; it is a standard reference for high-dimensional regression and is co-authored by Prof. Hastie :)
Take a look at this great vignette: GLMNET.
On top of that, you can get Prof. Hastie's latest book for free :) Sparsity.
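
On combining the two: with only about 200 features, one possible approach is to accumulate the small p x p matrix X'X and the vector X'y one chunk of rows at a time, then solve the ridge system (X'X + lambda * I) w = X'y once at the end. This is a swapped-in technique for illustration, not a description of what biglm or glmnet do internally. The sketch below is in plain Python rather than R, and everything in it is an assumption: the data is synthetic, the names `make_chunks`, `accumulate`, `solve` and `ridge_from_chunks` are hypothetical, and there is no intercept or feature standardization.

```python
import random

TRUE_W = [2.0, -1.0, 0.5]  # synthetic "true" coefficients (illustrative only)

def make_chunks(n_chunks=5, chunk_size=200, seed=0):
    """Yield (rows, targets) chunks of noise-free synthetic data."""
    rng = random.Random(seed)
    for _ in range(n_chunks):
        rows = [[rng.gauss(0.0, 1.0) for _ in TRUE_W] for _ in range(chunk_size)]
        targets = [sum(w * v for w, v in zip(TRUE_W, x)) for x in rows]
        yield rows, targets

def accumulate(chunks, p):
    """Accumulate X'X (p x p) and X'y (length p); only one chunk is in memory at a time."""
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for rows, targets in chunks:
        for x, y in zip(rows, targets):
            for i in range(p):
                xty[i] += x[i] * y
                for j in range(p):
                    xtx[i][j] += x[i] * x[j]
    return xtx, xty

def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def ridge_from_chunks(chunks, p, lam):
    """Ridge coefficients from chunked data: solve (X'X + lam * I) w = X'y."""
    xtx, xty = accumulate(chunks, p)
    for i in range(p):
        xtx[i][i] += lam  # ridge penalty on the diagonal
    return solve(xtx, xty)

p = len(TRUE_W)
w_ols = ridge_from_chunks(make_chunks(), p, lam=0.0)      # no penalty
w_ridge = ridge_from_chunks(make_chunks(), p, lam=500.0)  # shrunk toward zero
print(w_ols)
print(w_ridge)
```

In real use the hand-written loops would be replaced by a linear algebra library; in R, for instance, `crossprod()` on each chunk would give the same sums. Note that the penalty value here is not on the same scale as glmnet's `lambda`, since glmnet scales the loss by the number of rows and standardizes features by default.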

Have a good Sunday, and tell me if it helps.

