Hi Experts

Applying log to linear regression, how does this improve the R-squared? Please elaborate on what happens.

lm(log(Item_Outlet_Sales) ~ ., data = new_train)

Regards,

Tony

Hi Tony,

This is my take. When the log of the response variable has a linear relationship with the input variables, using a log transformation helps and gives a better result. I don't think we can generalize that this will give a better R-squared in all cases.
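A quick sketch on simulated data (new_train isn't shared here, so this just mimics a response that is multiplicative in the input, i.e. where log(y) rather than y is linear in x):

```r
# Simulated data: the true relationship is multiplicative,
# so log(y), not y itself, is linear in x.
set.seed(42)
x <- runif(500, 1, 10)
y <- exp(0.5 * x + rnorm(500, sd = 0.3))

fit_raw <- lm(y ~ x)        # straight line through a curved relationship
fit_log <- lm(log(y) ~ x)   # linear on the log scale

summary(fit_raw)$r.squared  # lower
summary(fit_log)$r.squared  # higher, since log(y) vs x really is linear
```

On data like this the log model's R-squared comes out clearly higher; on data where the relationship is already linear in y, it need not.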

Also, I have seen this help when the evaluation metric is Root Mean Squared Logarithmic Error (RMSLE).

Thanks,

SRK

Thanks for the response.

I'd like to know:

Without applying the log, what happens?

When we apply the log, what happens to the response variable, and how does that improve the R-squared?

Regards,

Tony

As far as I know, using a log will not always improve the R-squared value; it depends on the dataset at hand.

Could you please share the model results with and without using the log function? Thank you.

Thanks,

SRK

Hello @tillutony,

You should not transform variables just for the sake of improving accuracy, without any indication that the relationship is non-linear, because once you transform a variable, its interpretation changes. You only have to convert the response back for interpretation, but as you can understand, the process becomes a little more complex. For example, if you are predicting Salary and you log-transform this variable, you cannot directly interpret the log of salary; you need to take the antilog.
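A small sketch of that salary example (the data and variable names are hypothetical):

```r
# Hypothetical salary data: the model is fit on log(salary),
# so raw predictions are on the log scale and need exp() (the antilog)
# before they can be read as salaries.
set.seed(1)
experience <- runif(200, 0, 20)
salary <- exp(10 + 0.08 * experience + rnorm(200, sd = 0.2))

fit <- lm(log(salary) ~ experience)
pred_log <- predict(fit, newdata = data.frame(experience = 10))

pred_log       # log(salary): not directly interpretable as money
exp(pred_log)  # antilog: back on the salary scale
# Caveat: exp() of a predicted mean log estimates the median
# (geometric mean) of salary, not its arithmetic mean.
```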

There is something called Tukey's four-quadrant approach, which can be used to choose a suitable transformation.

As for your question: with transformations you try to make the relationship more linear than it currently is, and hence the R-squared increases.

Hope this helps!!

Linear regression is a parametric test or analysis. It rests on some assumptions, one of them being normality (strictly speaking, of the residuals, although a strongly skewed dependent variable usually produces skewed residuals too).

Log, square root and reciprocal transformations are used to transform non-normal data (here, the dependent variable) into something closer to normal.
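As a quick illustration, you can compare those candidate transformations side by side. This uses simulated data where the true relationship happens to be square-root linear, so sqrt wins here; with your own data a different choice (or none) may win:

```r
# Simulated data where the truth is y = (2x + noise)^2,
# so sqrt(y) is the transformation that linearises the relationship.
set.seed(7)
x <- runif(300, 5, 50)
y <- (2 * x + rnorm(300, sd = 3))^2

r2 <- function(f) summary(f)$r.squared
c(raw  = r2(lm(y ~ x)),
  log  = r2(lm(log(y) ~ x)),
  sqrt = r2(lm(sqrt(y) ~ x)))  # sqrt should come out highest here
```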

Keep in mind that randomly applying these transformations to a variable without checking its plot can reduce your R-squared value too.

There have been long discussions among leading statisticians about whether transformation is actually helpful; e.g., when using a log transformation and comparing means, we are comparing geometric means instead of arithmetic means (our original construct). So it is the researcher's call, depending on the kind of data you are dealing with.
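That geometric-vs-arithmetic distinction is easy to see directly: averaging on the log scale and back-transforming gives the geometric mean, not the arithmetic one.

```r
y <- c(1, 10, 100)
exp(mean(log(y)))  # geometric mean: 10
mean(y)            # arithmetic mean: 37
```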

Hope this helps.