I am studying different types of regression algorithms, and so far I have learnt three:

1) Ridge

2) Linear

3) Lasso

I want to know how they compare and in which situations each one should be used.

Hi Ankit,

This is a very nice question. I read about this in The Elements of Statistical Learning, which gives a clear picture of what the different shrinkage methods (Ridge, Lasso) are. I hope you find it useful as well. Please find the link below.

https://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf

Thanks,

SRK

Linear Regression

It is one of the most widely known modeling techniques. Linear regression is usually among the first few topics people pick up while learning predictive modeling. In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the regression line is linear.

Linear regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line).
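As a rough illustration of the best-fit line, here is a minimal sketch of ordinary least squares for a single feature in plain Python. The function name `fit_simple_linear` and the toy data are made up for this example; in practice a library such as scikit-learn would normally be used instead.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical toy data generated from y = 1 + 2*x with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_simple_linear(xs, ys)
print(a, b)
```

Because the toy data lies exactly on a line, the fit recovers the intercept and slope used to generate it.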

Ridge Regression

Ridge regression is a technique used when the data suffers from multicollinearity (the independent variables are highly correlated). Under multicollinearity, even though the ordinary least squares (OLS) estimates are unbiased, their variances are large, so the estimated coefficients can end up far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.

Recall that simple linear regression can be represented as:

y = a + b*x

This equation also has an error term. The complete equation becomes:

y = a + b*x + e, where the error term e is the value needed to correct for the prediction error between the observed and predicted value

=> y = a + b1*x1 + b2*x2 + ... + e, for multiple independent variables.

In a linear model, prediction error can be decomposed into two sub-components: the first is due to bias and the second is due to variance. Prediction error can arise from either one of these components or from both. Here, we'll discuss the error caused by variance.
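For reference, the standard decomposition of the expected squared prediction error at a point x can be written as below. The symbols f (the true function), \hat{f} (the fitted model) and \sigma^2 (the variance of the error term e) are notation assumed here, not taken from the post. Alongside bias and variance, it also contains an irreducible noise term that no model can remove:

$$
E\big[(y - \hat{f}(x))^2\big] = \big(E[\hat{f}(x)] - f(x)\big)^2 + E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big] + \sigma^2
$$

The first term is the squared bias and the second is the variance. Shrinkage methods accept a somewhat larger first term in exchange for a smaller second term.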

Ridge regression solves the multicollinearity problem through the shrinkage parameter λ (lambda). Look at the equation below.

$$\hat{\beta}^{\text{ridge}} = \underset{\beta}{\operatorname{argmin}} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2$$

In this equation, we have two components. The first is the least squares term and the second is λ times the sum of β² (beta squared), where β is the coefficient vector. The penalty is added to the least squares term in order to shrink the coefficients, which gives estimates with a lower variance.

Important Points:

•The assumptions of this regression are the same as those of least squares regression, except that normality is not assumed

•It shrinks the coefficients but never makes them exactly zero, so it performs no feature selection

•This is a regularization method and uses L2 regularization.
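To make the shrinkage concrete, here is a minimal sketch for two nearly collinear features in plain Python. It assumes centered data with no intercept and solves the 2x2 system (X'X + λI) b = X'y directly. The function name `ridge_two_features` and the toy numbers are invented for illustration.

```python
def ridge_two_features(x1, x2, y, lam):
    """Ridge coefficients for y ~ b1*x1 + b2*x2 (centered data, no intercept).

    Solves (X^T X + lam * I) b = X^T y for the 2x2 case with Cramer's rule.
    Setting lam = 0 gives the ordinary least squares solution.
    """
    s11 = sum(u * u for u in x1) + lam
    s22 = sum(v * v for v in x2) + lam
    s12 = sum(u * v for u, v in zip(x1, x2))
    t1 = sum(u * t for u, t in zip(x1, y))
    t2 = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (t1 * s22 - t2 * s12) / det
    b2 = (t2 * s11 - t1 * s12) / det
    return b1, b2


# Hypothetical toy data: x2 is almost a copy of x1 (multicollinearity).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.0, -1.1, 0.1, 0.9, 2.1]
y = [-4.0, -2.4, 0.3, 2.0, 4.1]

ols = ridge_two_features(x1, x2, y, 0.0)    # no penalty
ridge = ridge_two_features(x1, x2, y, 1.0)  # lambda = 1
print("OLS:  ", ols)
print("Ridge:", ridge)
```

On this data the unpenalized fit splits the effect very unevenly between the two near-duplicate features, while the penalized fit gives both a similar, smaller coefficient. Neither ridge coefficient becomes exactly zero.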

Lasso Regression

Similar to ridge regression, Lasso (Least Absolute Shrinkage and Selection Operator) also penalizes the absolute size of the regression coefficients. In addition, it is capable of reducing the variability and improving the accuracy of linear regression models.

Look at the equation below:

$$\hat{\beta}^{\text{lasso}} = \underset{\beta}{\operatorname{argmin}} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1$$

Lasso regression differs from ridge regression in that it uses absolute values in the penalty function instead of squares. Penalizing the sum of the absolute values of the estimates (or, equivalently, constraining it) causes some of the parameter estimates to turn out exactly zero. The larger the penalty applied, the further the estimates are shrunk towards zero. This results in variable selection among the given n variables.

Important Points:

•The assumptions of this regression are the same as those of least squares regression, except that normality is not assumed

•It shrinks some coefficients to exactly zero, which helps with feature selection

•This is a regularization method and uses L1 regularization

•If a group of predictors is highly correlated, lasso tends to pick only one of them and shrinks the others to zero
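The points above can be sketched with a small coordinate descent solver in plain Python. This is only one of several ways to fit a lasso and is not necessarily what any particular library does. It assumes centered data with no intercept and minimizes sum((y - Xb)^2) + λ * sum(|b_j|), which is why the soft-threshold level is λ/2. The function names and toy data are made up for this example.

```python
def soft_threshold(rho, t):
    """Shrink rho toward zero by t; return exactly 0.0 when |rho| <= t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimize sum((y - X b)^2) + lam * sum(|b_j|) by cyclic coordinate descent.

    columns: list of feature columns (assumed centered, no intercept).
    """
    p = len(columns)
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution left out.
            partial = [
                yi - sum(b[k] * columns[k][i] for k in range(p) if k != j)
                for i, yi in enumerate(y)
            ]
            rho = sum(xj * r for xj, r in zip(columns[j], partial))
            z = sum(xj * xj for xj in columns[j])
            b[j] = soft_threshold(rho, lam / 2.0) / z
    return b


# Hypothetical toy data: x2 has correlation 0.9 with x1, and y is roughly 2*x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.0, 0.0, -1.0, 1.0, 2.0]
y = [-3.9, -2.2, 0.1, 2.2, 3.8]

coefs = lasso_coordinate_descent([x1, x2], y, lam=4.0)
print(coefs)
```

With this penalty the solver keeps x1 with a shrunken coefficient and sets the coefficient of the correlated x2 to exactly zero, which is the feature selection behaviour described in the bullets.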