 Difference between R-square and Adjusted R-Square?

Hi,

Whenever I run a linear regression to predict a target variable, the output includes both R-squared and Adjusted R-squared. I know that a higher R-squared generally indicates a better model, and that Adjusted R-squared is usually close to R-squared. Can someone explain the basic difference between these two?

Thx,
Imran

Hi Imran,

R-squared is a basic metric that tells you how much of the variance is explained by the model. In a multivariate linear regression, if you keep adding new variables, the R-squared value will never decrease (and will almost always increase), irrespective of the variables' significance. Adjusted R-squared corrects for this by penalizing R-squared for the number of variables in the model, so it rises only when a new variable improves the fit by more than would be expected by chance. So when doing a multivariate linear regression, we should look at adjusted R-squared rather than R-squared alone.

Hope this helps.

Regards,
Aayush

15 Likes

Imran,

Let us first understand what is R-squared:

R-squared or R2 explains the degree to which your input variables explain the variation of your output / predicted variable. So, if R-square is 0.8, it means 80% of the variation in the output variable is explained by the input variables. So, in simple terms, higher the R squared, the more variation is explained by your input variables and hence better is your model.

However, the problem with R-squared is that it will either stay the same or increase with addition of more variables, even if they do not have any relationship with the output variables. This is where “Adjusted R square” comes to help. Adjusted R-square penalizes you for adding variables which do not improve your existing model.

Hence, if you are building a linear regression on multiple variables, it is always suggested that you use Adjusted R-squared to judge the goodness of the model. In case you only have one input variable, R-squared and Adjusted R-squared will be very close, though not exactly the same, because the adjustment still depends on the number of observations.

Typically, the more non-significant variables you add to the model, the wider the gap between R-squared and Adjusted R-squared becomes.
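To make the penalty concrete, here is a minimal sketch of the usual textbook formula, Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of input variables. The function name `adjusted_r2` and the numbers (n = 50, R2 held at 0.80) are made up for illustration; in a real model R2 would also move as variables are added.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical setting: R2 stays at 0.80 while useless predictors are added.
n = 50
r2 = 0.80
gaps = []
for p in (1, 5, 10, 20):
    adj = adjusted_r2(r2, n, p)
    gaps.append(r2 - adj)
    print(f"p={p:2d}  R2={r2:.3f}  adjusted R2={adj:.3f}  gap={r2 - adj:.3f}")
```

Even with a single predictor the two values differ slightly, and the gap widens as p grows relative to n.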

44 Likes

R-squared measures the proportion of the variation in your dependent variable (Y) explained by your independent variables (X) for a linear regression model. Adjusted R-squared adjusts the statistic based on the number of independent variables in the model.

This matters because you can “game” R-squared by adding more and more independent variables, irrespective of how well they are correlated with your dependent variable. Obviously, this isn’t a desirable property of a goodness-of-fit statistic. Adjusted R-squared, by contrast, adjusts the R-squared statistic so that an independent variable that adds real explanatory power for Y increases adjusted R-squared, while a variable that adds little or none makes adjusted R-squared decrease. That is the desired property of a goodness-of-fit statistic.
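The "gaming" effect can be seen in a small simulation. The sketch below is only an illustration on synthetic data: it fits ordinary least squares through the normal equations in plain Python (the helper names `solve`, `r_squared` and `adjusted` are invented here), starts with one real predictor, and then keeps adding pure-noise columns.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(columns, y):
    """Fit OLS with an intercept via the normal equations and return R-squared."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


random.seed(0)
n = 40
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 * xi + random.gauss(0, 1) for xi in x]  # y truly depends on x only

columns = [x]
results = []
for _ in range(6):
    r2 = r_squared(columns, y)
    results.append((len(columns), r2, adjusted(r2, n, len(columns))))
    columns.append([random.gauss(0, 1) for _ in range(n)])  # pure-noise predictor

for p, r2, adj in results:
    print(f"predictors={p}  R2={r2:.4f}  adjusted R2={adj:.4f}")
```

R-squared cannot go down as noise columns are added, whereas adjusted R-squared is free to fall; the exact numbers depend on the random seed.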

About which one to use… in the case of a linear regression with more than one variable: adjusted R-squared. For a model with a single independent variable, the two statistics differ only slightly, so either can be used.

2 Likes

@kunal, @aayushmnit is it possible that R-squared has improved significantly, yet Adjusted R-squared has decreased, with the addition of a new predictor?

1 Like

@vajravi

Yes, it is possible - this happens in case the newly added variable brings in more complexity than power to predict the target variables.

Regards,
Kunal

2 Likes

@vajravi - yes, there can be a case where R-squared has improved but Adjusted R-squared has decreased with the addition of a new predictor. This happens only when the newly added predictor is insignificant for the model.
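As a rough numeric illustration of this case, the sketch below uses the standard formula Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1) with made-up numbers (n = 20 observations, going from 3 to 4 predictors). The function names are hypothetical, and the threshold comes from rearranging that formula, not from anything stated in this thread.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def r2_needed_to_improve(r2_old, n, p_old):
    """Smallest R2 after adding one predictor at which adjusted R2 does not fall."""
    return 1 - (1 - r2_old) * (n - p_old - 2) / (n - p_old - 1)


# Hypothetical model: 20 observations, 3 predictors, R2 = 0.70.
n, p_old, r2_old = 20, 3, 0.70
r2_new = 0.705  # a fourth predictor nudges R2 up only slightly

before = adjusted_r2(r2_old, n, p_old)
after = adjusted_r2(r2_new, n, p_old + 1)
threshold = r2_needed_to_improve(r2_old, n, p_old)

print(f"adjusted R2 before: {before:.4f}")
print(f"adjusted R2 after:  {after:.4f}")
print(f"R2 needed for adjusted R2 not to fall: {threshold:.5f}")
```

Here R2 rises from 0.70 to 0.705 but adjusted R2 drops, because the new predictor would have had to lift R2 above roughly 0.719 to pay for the extra degree of freedom.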

Hi,

The easiest way to check the accuracy of a model is by looking at the R-squared value.
The summary provides two R-squared values, namely Multiple R-squared, and Adjusted R-squared.

The Multiple R-squared is calculated as follows:

Multiple R-squared = 1 – SSE/SST where:
SSE is the sum of squared residuals. A residual is the difference between the predicted value and the actual value, and can be accessed by predictionModel$residuals.
SST is the total sum of squares. It is calculated by summing the squares of difference between the actual value and the mean value.

For example,
let's say that the actual values are 5, 6, 7, and 8, and a model predicts the outcomes as 4.5, 6.3, 7.2, and 7.9. Then,
SSE can be calculated as: SSE = (5 – 4.5) ^ 2 + (6 – 6.3) ^ 2 + (7 – 7.2) ^ 2 + (8 – 7.9) ^ 2;
and
SST can be calculated as: mean = (5 + 6 + 7 + 8) / 4 = 6.5; SST = (5 – 6.5) ^ 2 + (6 – 6.5) ^ 2 + (7 – 6.5) ^ 2 + (8 – 6.5) ^ 2
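The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same calculation; the variable names are chosen here for illustration.

```python
actual = [5, 6, 7, 8]
predicted = [4.5, 6.3, 7.2, 7.9]

# SSE: sum of squared differences between actual and predicted values
sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))

# SST: sum of squared differences between actual values and their mean
mean_actual = sum(actual) / len(actual)
sst = sum((a - mean_actual) ** 2 for a in actual)

r_squared = 1 - sse / sst

print(f"SSE = {sse:.2f}")
print(f"SST = {sst:.2f}")
print(f"Multiple R-squared = {r_squared:.3f}")
```

For these numbers SSE comes to 0.39 and SST to 5.0, which gives a Multiple R-squared of about 0.922.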

The Adjusted R-squared value is similar to the Multiple R-squared value, but it accounts for the number of variables. The Multiple R-squared will never decrease when a new variable is added to the prediction model, but if the variable is a non-significant one, the Adjusted R-squared value will decrease.

An R-squared value of 1 means that it is a perfect prediction model.

Regards,
Tony

12 Likes

@tillutony: You clubbed everything perfectly. cheers.

1 Like

If we add more variables to the model, R-squared will never decrease, but Adjusted R-squared will increase only if the added variable is significant.

@kunal @aayushmnit @tillutony
Can you please explain the difference between Predicted R-squared and these two terms (Multiple R-squared and Adjusted R-squared)?

I was looking for this answer.
Thank you Kunal sir for helping us out.

1 Like

Hi, you can find a more comprehensive explanation here: https://medium.com/analytics-vidhya/measuring-the-goodness-of-fit-r%C2%B2-versus-adjusted-r%C2%B2-1e8ed0b5784a

Hello,

What could be ideal value for Adjusted R-squared?

Regards,
Ankit Prajapati

The higher the value, the better the model, so the ideal value would be 1.

Thanks Tony. Really helpful

Thanks Kunal Sir. Analytics Vidhya is really helpful. Keep doing the good work.