If outliers are present in our data, how do they affect our regression model?

Can someone please explain this for both linear and logistic regression models?

Outliers can have a dramatic impact on linear regression. A single extreme point can change the fitted equation substantially, which leads to poor predictions or estimates. Look at the scatter plot and linear equation below, with and without the outlier.

Comparing the two snapshots, the equation parameters change a lot.
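The same effect can be reproduced with a small sketch. The data here are made up for illustration (ten points lying exactly on y = 2x + 1, plus one hypothetical outlier at (11, 100)), and `fit_line` is just an illustrative helper for ordinary least squares, not code from the original post.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 1
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]

clean_slope, clean_intercept = fit_line(xs, ys)

# Same data plus one hypothetical outlier at (11, 100)
out_slope, out_intercept = fit_line(xs + [11], ys + [100])

print(f"without outlier: slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"with outlier:    slope={out_slope:.2f}, intercept={out_intercept:.2f}")
```

With these particular numbers, one added point moves the slope from 2 to 5.5 and the intercept from 1 to -13, so the fitted line no longer describes the other ten points well.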

For more detail on outliers and ways to deal with them, you can read this article.

Regards,

Imran

An outlier shifts most measures of central tendency and dispersion, including the mean, standard deviation [SD], and variance. In a regression, the extra noise inflates the residual SD and therefore the standard error of a coefficient. Because the t-statistic is the coefficient estimate divided by its standard error, the t-statistic can shrink, which in turn can push the p-value above .05. This means a predictor that is actually significant might turn out to look insignificant because of noise in the data leading to a higher SD, implying incorrect modeling and consequently inappropriate prediction.
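As a rough illustration of the significance point, the sketch below computes the t-statistic of the slope (slope divided by its standard error) with and without one outlier. The data, the outlier at (5.5, 60), and the helper `slope_t_statistic` are all made up for this example. The outlier is deliberately placed at the mean of x so that it leaves the slope almost unchanged and only adds noise.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (slope, standard error of slope, t-statistic) for simple OLS."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual SD uses n - 2 degrees of freedom (two fitted parameters)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_sd = math.sqrt(sse / (n - 2))
    se_slope = residual_sd / math.sqrt(sxx)
    return slope, se_slope, slope / se_slope


# Made-up data: y = 2x + 1 with a small alternating disturbance of +/-0.5
xs = list(range(1, 11))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]

slope_clean, se_clean, t_clean = slope_t_statistic(xs, ys)

# One hypothetical outlier placed at the mean of x
slope_out, se_out, t_out = slope_t_statistic(xs + [5.5], ys + [60.0])

print(f"clean:        slope={slope_clean:.3f}, SE={se_clean:.3f}, t={t_clean:.2f}")
print(f"with outlier: slope={slope_out:.3f}, SE={se_out:.3f}, t={t_out:.2f}")
```

In this example the slope estimate barely moves, but its standard error grows so much that the t-statistic falls below the usual two-sided 5% critical value (about 2.26 for 9 degrees of freedom), so a clearly real relationship would be reported as not significant.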

Another aspect is that a high SD increases the standard error [SE] of the mean and consequently the margin of error [ME] and the width of the confidence interval. Note: SE = SD / sqrt(n), ME = Critical Value * SE, and Interval = Stat ± ME. With a wider confidence interval, the estimate loses precision, which makes decisions based on the prediction unstable. The same applies to the regression itself: the SE of the estimate is the SD of the prediction errors [fitting the model means minimizing the prediction error], so an outlier leads to larger errors and a regression line that does not fit well.
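The formulas above can be sketched in a few lines. The sample values are made up, and the critical value of 1.96 is an assumption (the normal approximation for a 95% interval); for a sample this small, a t-based critical value would be more appropriate.

```python
import math
import statistics


def mean_confidence_interval(values, critical_value=1.96):
    """Return (mean, SE, (lower, upper)) using SE = SD / sqrt(n), ME = critical * SE."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation
    se = sd / math.sqrt(n)          # standard error of the mean
    me = critical_value * se        # margin of error
    return mean, se, (mean - me, mean + me)


# Made-up sample, then the same sample with one hypothetical outlier
clean = [10, 12, 11, 13, 12, 11, 10, 12]
with_outlier = clean + [50]

mean_c, se_c, ci_c = mean_confidence_interval(clean)
mean_o, se_o, ci_o = mean_confidence_interval(with_outlier)

width_c = ci_c[1] - ci_c[0]
width_o = ci_o[1] - ci_o[0]

print(f"clean:        mean={mean_c:.2f}, SE={se_c:.2f}, CI width={width_c:.2f}")
print(f"with outlier: mean={mean_o:.2f}, SE={se_o:.2f}, CI width={width_o:.2f}")
```

The single outlier pulls the mean upward and makes the interval several times wider, which is the loss of precision described above.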

So, in one line: outliers affect the model estimates and, through them, the predictions.