Why do we subtract 1 from the sample size to calculate the degrees of freedom?

statistics

#1

Hello people,
I’m currently taking the Inferential Statistics course on Udacity, and I’m stuck on the t-distribution and degrees of freedom.
Suppose we want to plot the t-distribution for a sample of size n. I don’t quite understand why we fix the sample mean and then draw samples to plot the t-distribution, and from that conclude that the degrees of freedom (df) are n - 1. Instead, we could draw random samples of size n, compute the corresponding t-value for each, and plot those values without fixing the mean, and so conclude that df is n. What is the flaw in my argument?
Regards


#2

Hi @B.Rabbit

The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom. In general, the degrees of freedom of an estimate are equal to the number of independent scores that go into the estimate minus the number of parameters estimated as intermediate steps along the way. Therefore, the sample variance has n-1 degrees of freedom: it is computed from n random scores, minus the 1 parameter estimated as an intermediate step, which is the mean (estimated by the sample mean).
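One common way to see the effect of that lost degree of freedom is to compare the two possible divisors by simulation. The following is only a minimal sketch, assuming standard normal data (true variance 1) and a sample size of 5; the function name and the trial count are illustrative choices, not part of the course material.

```python
import random


def average_variance_estimates(n, trials, seed=0):
    """Average two variance estimates over many samples of size n.

    Assumes data drawn from N(0, 1), so the true variance is 1.
    """
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n  # the intermediate estimate
        # Sum of squared deviations about the sample mean.
        ss = sum((x - mean) ** 2 for x in sample)
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials


avg_n, avg_n_minus_1 = average_variance_estimates(n=5, trials=100_000)
print(f"divide by n:     {avg_n:.3f}")
print(f"divide by n - 1: {avg_n_minus_1:.3f}")
```

Dividing by n should come out too small on average (around (n-1)/n of the true variance), while dividing by n-1 should land close to 1.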

Hope this helps.


#3

Hi @shashwat.2014
Given that the aim is to plot the t-distribution, you are restricting yourself by fixing the sample mean and drawing the sample accordingly, and so computing df as n-1. Instead, keep drawing samples of size n, compute the mean of each, and plot the t-distribution without restricting yourself to one sample mean. This way you’ll still get the same distribution that you got by fixing the mean, but df can be computed as n. What is wrong with this logic?
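The experiment described above can be sketched in a few lines of Python. This is a minimal sketch, assuming normally distributed data with true mean 0 and n = 5, and using only the standard library. The function names and the trial count are illustrative, and the two-sided 5% critical values 2.776 (df = 4) and 2.571 (df = 5) are taken from a standard t-table.

```python
import math
import random


def simulate_t_values(n, trials, seed=0):
    """Draw `trials` samples of size n from N(0, 1) and return their t-values."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n  # recomputed for every sample, never fixed
        # Sample standard deviation, using deviations about this sample's mean.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t_values.append(mean / (s / math.sqrt(n)))  # true mean is 0
    return t_values


def tail_fraction(t_values, cutoff):
    """Fraction of t-values whose absolute value exceeds `cutoff`."""
    return sum(abs(t) > cutoff for t in t_values) / len(t_values)


n = 5
t_values = simulate_t_values(n, trials=100_000)
frac_df4 = tail_fraction(t_values, 2.776)  # 5% cutoff if df = n - 1 = 4
frac_df5 = tail_fraction(t_values, 2.571)  # 5% cutoff if df = n = 5
print(f"beyond 2.776: {frac_df4:.4f}")
print(f"beyond 2.571: {frac_df5:.4f}")
```

If the t-values had n degrees of freedom, the second fraction would be close to 0.05; if they have n-1, the first one would be. Note that no sample mean is fixed anywhere in the loop: each sample contributes its own mean, and s is computed from deviations about that mean.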


#4

Let’s take an example:
Suppose you are given the equation
x + y = 4
So, what are the possible values of x and y?
Let’s say x = 2; then y automatically becomes 2. If x = 1, then y automatically becomes 3.
So, basically, you have the freedom to choose only one value, either x or y. Once one is chosen, the other is determined automatically.
Now let’s take x + y + z = 10.
If you choose x and y, then z is determined automatically.
So you have the freedom to choose only 2 variables, though 3 are unknown.
Now, for a sample:
You know that the mean deviation about the sample mean is zero,
i.e.,
(1/n) ∑ (xi - mean(x)) = 0.
Now, if you are asked to calculate the sample variance, you have to draw n observations. But because of the equation above, only n-1 of the deviations xi - mean(x) can vary freely; the last one is determined automatically. So the dof is n-1, as you are free to choose only n-1 of them!
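The zero-sum constraint above can be checked numerically. This is just a small sketch with a made-up sample of 6 uniform random numbers; the variable names are illustrative.

```python
import random

rng = random.Random(0)
n = 6
sample = [rng.uniform(0.0, 10.0) for _ in range(n)]
mean = sum(sample) / n

# Deviations of each observation from the sample mean.
deviations = [x - mean for x in sample]

# The deviations sum to zero (up to floating-point rounding).
total = sum(deviations)

# So the last deviation can be recovered from the first n - 1 alone.
last_from_others = -sum(deviations[:-1])

print(f"sum of deviations:        {total:.2e}")
print(f"last deviation (actual):  {deviations[-1]:.6f}")
print(f"last deviation (implied): {last_from_others:.6f}")
```

The last two printed values should agree, which is the same situation as z being fixed once x and y are chosen in x + y + z = 10.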
Thanks.