Negative Predicted Sales (Linear Regression)


#1

Hi Team,
While working on the Big Mart Sales problem, I am facing issues when predicting sales using linear regression. Adjusted R-squared: 56.21, residual standard error: 1129. The model estimates are given below (only significant variables).
Estimates:
Intercept: -1869
Item_MRP: 15.6
Item_Fat_content (Regular): 52.7
Outlet_type (Supermarket 1): 1955
Outlet_type (Supermarket 2): 1626
Outlet_type (Supermarket 3): 3355
Visibility Category (Low): 54.3

Since the intercept is negative, some of the predicted sales values come out negative, and as a result the MAPE comes out as 1.
Can anyone please suggest how I should proceed in this situation?
Regards
Arnab
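
To make the problem concrete, here is a minimal sketch that plugs the coefficients quoted above into the linear equation. It assumes that the baseline levels (the outlet type not listed, non-Regular fat content, non-Low visibility) contribute zero, which the post does not state explicitly; `predict_sales` and the MRP value of 50 are made up for illustration.

```python
# Coefficients as quoted in the post (only the significant terms were listed).
COEF = {
    "intercept": -1869.0,
    "item_mrp": 15.6,
    "fat_regular": 52.7,
    "visibility_low": 54.3,
}
OUTLET_EFFECT = {
    "Supermarket 1": 1955.0,
    "Supermarket 2": 1626.0,
    "Supermarket 3": 3355.0,
}


def predict_sales(item_mrp, outlet_type=None, fat_regular=False, visibility_low=False):
    """Linear prediction; outlet_type=None stands for the assumed baseline outlet level."""
    pred = COEF["intercept"] + COEF["item_mrp"] * item_mrp
    pred += OUTLET_EFFECT.get(outlet_type, 0.0)
    pred += COEF["fat_regular"] if fat_regular else 0.0
    pred += COEF["visibility_low"] if visibility_low else 0.0
    return pred


baseline = predict_sales(50)                      # hypothetical item, baseline outlet
supermarket = predict_sales(50, "Supermarket 1")  # same item in Supermarket 1
break_even_mrp = -COEF["intercept"] / COEF["item_mrp"]

print(round(baseline), round(supermarket), round(break_even_mrp, 1))
# prints: -1089 866 119.8
```

Under these assumptions, any baseline-outlet item with an MRP below roughly 120 gets a negative prediction, so the negative values come from the combination of the intercept and the missing outlet effect rather than from the intercept alone.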


#2

From my understanding, it seems that the model is not good.
I’m saying this because if an intercept of -1869 leads to negative predicted sales figures, that means the sales amounts are not high enough to justify a residual standard error of 1129! The error is huge despite the model having an adjusted R-squared of 56.21. It might also be the case that you have a relatively over-fitted model on your hands.

This is my 2 cents.


#3

@Nishant_S
Thanks for your reply. I have also observed heteroscedasticity while plotting fitted vs. residual values. Any suggestions to improve the model?
Regards
Arnab


#4

Some amount of heteroscedasticity in sales data is normal, as it is time series data. If you are working with years of historical data, I hope you have already made the time series stationary. If not, you need to do this before applying regression.

Additionally, you can consider regularization on your model if you are concerned about over-fitting.
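
As a rough illustration of what regularization does, here is a minimal single-predictor ridge sketch in plain Python. The data and the penalty value `lam=5000.0` are made up, and `fit_line` is a hypothetical helper, not something taken from the model above. The penalty is applied to the slope only, which shrinks it toward zero.

```python
def fit_line(x, y, lam=0.0):
    """Fit y = a + b*x; lam > 0 adds a ridge (L2) penalty on the slope b only."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + lam)              # lam = 0 gives ordinary least squares
    intercept = y_mean - slope * x_mean    # intercept is left unpenalized
    return intercept, slope


# Made-up toy data, for illustration only.
mrp = [40.0, 90.0, 150.0, 200.0, 250.0]
sales = [300.0, 1200.0, 2100.0, 3500.0, 3900.0]

ols = fit_line(mrp, sales)
ridge = fit_line(mrp, sales, lam=5000.0)
print("OLS (intercept, slope):  ", ols)
print("Ridge (intercept, slope):", ridge)
```

Note that shrinking coefficients addresses over-fitting; it does not by itself guarantee non-negative predictions.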


#5

@Nishant_S, I have now applied linear regression on the logarithm of sales, which increases the adjusted R-squared to 74%, and the diagnostic charts look OK. Should I calculate the residuals as (predicted log sales - actual log sales) or as (predicted sales, back-calculated from the log value, - actual sales)?


#6

Either of those should be fine. The results would be identical.


#7

@Nishant_S
The results are not identical. The MAPE comes out as 0.07 (residuals on log sales) and 0.52 (residuals on actual sales). Is the difference acceptable? Any suggestions?
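
A small sketch of why the two numbers differ: a fixed error on the log scale is divided by log(sales), which is a small number, whereas after back-transforming with exp() the same error becomes a multiplicative error on the sales themselves. The sales figures and log-scale errors below are invented purely for illustration.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.5 means 50%)."""
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual sales and hypothetical model errors on the log scale.
actual_sales = [500.0, 1500.0, 3000.0]
log_errors = [0.4, -0.5, 0.3]

actual_log = [math.log(s) for s in actual_sales]
pred_log = [a + e for a, e in zip(actual_log, log_errors)]
pred_sales = [math.exp(p) for p in pred_log]  # back-transform to the sales scale

mape_log = mape(actual_log, pred_log)         # error relative to log(sales)
mape_sales = mape(actual_sales, pred_sales)   # error relative to sales

print(round(mape_log, 3), round(mape_sales, 3))
```

In this toy case the log-scale MAPE is several times smaller than the sales-scale MAPE for the very same predictions. If the goal is to judge the sales predictions, the figure computed on the original sales scale is the one that is comparable with the earlier untransformed model.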


#8

Just to check: you are aware that you cannot take the log transformation of zero or negative values, right?
I mean to ask, have you accounted for this?
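
For reference, a minimal sketch of one common workaround when zeros are present: use log(1 + x) instead of log(x) and invert it with exp(x) - 1. The helper names `safe_log_sales` and `back_transform` are hypothetical, and this still does not handle values at or below -1.

```python
import math


def safe_log_sales(value):
    """log(1 + value): defined at zero, unlike log(value); requires value > -1."""
    return math.log1p(value)


def back_transform(log_value):
    """Inverse of safe_log_sales: exp(log_value) - 1."""
    return math.expm1(log_value)


# The plain logarithm fails at zero.
try:
    math.log(0.0)
    plain_log_ok = True
except ValueError:
    plain_log_ok = False

zero_ok = safe_log_sales(0.0)                        # works: log(1 + 0)
round_trip = back_transform(safe_log_sales(1234.5))  # recovers the original value
print(plain_log_ok, zero_ok, round(round_trip, 1))
```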


#9

@Nishant_S, I don’t have any negative or zero predicted values. Any suggestion on whether a mean absolute percentage error (mean(|residual| / actual sales)) of 0.52 is acceptable? If not, how can I reduce it?