Prediction interval

linear_regression

#1

Hello,
I’m reading the book “An Introduction to Statistical Learning” by Trevor Hastie et al. The chapter on linear regression mentions that “calculation of prediction interval includes the irreducible error”. What does this mean? If my understanding is right, the irreducible error is the error induced by approximating a real-life problem, or, say, the error due to unavailable data. How can such an error be included when calculating a prediction interval?

Regards


#2

Hi @B.Rabbit,

That’s the difference between a confidence interval and a prediction interval.

predict(linefit4, newdata, interval = "confidence", level = .90)
predict(linefit4, newdata, interval = "prediction", level = .90)

The two lines of code above give different output. So what’s the difference exactly? The first one computes a confidence interval for the mean of the values you can expect at the new data. The 2nd line computes a prediction interval for the exact (individual) value you can expect at the new data.

You will notice that the interval from the 2nd line of code is wider than the one from the first. Why is that? Because in linear regression the error terms average out to zero, so the interval for the mean only reflects the uncertainty in the estimated coefficients. For the 2nd line, the prediction interval also adds the error term, because for a single value it is non-zero.
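To see the width difference on real numbers, here is a minimal sketch. It assumes R's built-in `cars` dataset as a stand-in for the thread's data; `fit`, `new_data`, `conf_int` and `pred_int` are illustrative names, not the objects from the post above.

```r
# Illustrative only: `cars` (speed vs. stopping distance) ships with R.
fit <- lm(dist ~ speed, data = cars)
new_data <- data.frame(speed = c(10, 15, 20))

# Interval for the MEAN stopping distance at each speed
conf_int <- predict(fit, new_data, interval = "confidence", level = 0.90)

# Interval for a SINGLE new stopping distance at each speed
pred_int <- predict(fit, new_data, interval = "prediction", level = 0.90)

# Both matrices have columns fit, lwr, upr; the fitted values are identical,
# only the interval widths differ.
width_ci <- conf_int[, "upr"] - conf_int[, "lwr"]
width_pi <- pred_int[, "upr"] - pred_int[, "lwr"]
print(cbind(fit = conf_int[, "fit"], width_ci, width_pi))
```

In every row `width_pi` should come out larger than `width_ci`, since the prediction interval carries the extra variance of the individual error term.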

Confidence interval y = mean(Beta0) + mean(Beta1) * X + mean(Error), here mean(Error) = 0
Prediction interval y = Beta0 + beta1*X + error, here error is non zero
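For reference, the two lines above are shorthand. For simple linear regression with one predictor, the usual textbook intervals at a new point x_0 can be written as below. The symbols (n observations, residual standard error s, critical value t) are standard notation added here, not taken from the post.

```latex
% Fitted value at x_0
\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0,
\qquad
s^2 = \frac{\mathrm{SSE}}{n-2},
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2

% Confidence interval for the mean response at x_0
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Prediction interval for a single new response at x_0
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The only difference is the extra 1 under the square root, which is the variance of the individual error term (the irreducible error) in units of s².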

Hope this will help in understanding the text.

Regards,
Aayush


#3

I vaguely get the idea but I still have a few doubts.

  1. “Average of error terms is equal to zero in linear regression”. Here you’re talking about the irreducible error, right?
  2. When you say “Confidence interval y = mean(Beta0) + mean(Beta1) * X + mean(Error), here mean(Error) = 0”, as far as I understand, you compute the mean of each coefficient and add them up, which would give a single number, not an interval.
  3. When you say “Prediction interval y = Beta0 + beta1*X + error, here error is non zero”, is Beta0 an interval here? And how do you compute the error term? The error is irreducible, right (as in, it is the error due to unknown factors and unavailable data)?

Regards


#4

@B.Rabbit,

  1. Yes, I am talking about the irreducible error term.
  2. Even when we estimate the mean, there is still uncertainty (a standard error) associated with it, unless the fit is perfectly linear, so the confidence interval accounts for that. (Note that it is the standard error of the estimated mean of Y at the given X.)
  3. The prediction interval has to account for the standard deviation of the error term as well.

How to compute the error term: the error for each observation is estimated by its residual, observed - expected (predicted).
You know the sum of squared residuals (SSE), right? We sum the squared errors because the plain sum of the residuals is zero (for a model with an intercept). And yes, the error is irreducible, which is exactly why we have to account for it when building the prediction interval.
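As a rough sketch of how the residuals feed into the interval, the snippet below rebuilds a 90% prediction interval by hand and compares it with `predict()`. It assumes R's built-in `cars` dataset and a single hypothetical new point `x0 = 15`; all variable names are illustrative.

```r
# Illustrative only: built-in `cars` data, not the data from this thread.
fit <- lm(dist ~ speed, data = cars)

res <- residuals(fit)            # observed minus predicted
sum_res <- sum(res)              # ~0 (up to rounding) because the model has an intercept

n <- nrow(cars)
s <- sqrt(sum(res^2) / (n - 2))  # residual standard error: estimate of the error's sd

# Standard error of the estimated MEAN response at x0
x  <- cars$speed
x0 <- 15
se_mean <- s * sqrt(1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))

# A single new observation adds the error variance s^2 on top
se_pred <- sqrt(s^2 + se_mean^2)

y0_hat <- unname(predict(fit, data.frame(speed = x0)))
t_crit <- qt(0.95, df = n - 2)   # two-sided 90% interval

manual_pi  <- c(lwr = y0_hat - t_crit * se_pred, upr = y0_hat + t_crit * se_pred)
builtin_pi <- predict(fit, data.frame(speed = x0),
                      interval = "prediction", level = 0.90)
print(manual_pi)
print(builtin_pi)
```

Replacing `se_pred` with `se_mean` in the last step gives the confidence interval for the mean instead, which is how the irreducible error ends up only in the prediction interval.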


#5

Thanks @aayushmnit! Can you suggest a reference or some comprehensive material on prediction intervals?


#6

I don’t know of any specific resource. But any statistics book will cover this topic.