Why does the null hypothesis take only a single value?



I am studying hypothesis testing, in which there are two types of hypothesis: the null hypothesis and the alternative hypothesis.

null - the parameter in which we are interested takes a specific value.

alternative - the values of the parameter in which we are interested.

I want to know why the null hypothesis takes only one specific value, not a range.


What we are essentially doing is comparing the sample mean with the population mean. The sample mean may well differ from the population mean, but we would like to know the range within which it can be expected to vary. Let us take an example.

I have the following code, which generates 500 random observations, and I generated a density plot from it

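# 500 draws from a uniform distribution on [1, 10]
data <- data.frame(x=runif(500,1,10))
# Bin x into unit-width intervals labelled 1 to 9 and keep the bin number
data$y <- cut(data$x,breaks=c(1,2,3,4,5,6,7,8,9,10),labels=c(1,2,3,4,5,6,7,8,9))
data$y <- as.numeric(data$y)
# Three samples of size 50, 10 and 1
sample <- data[1:50,]
sample2 <- data[1:10,]
sample3 <- data[6,]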
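The plotting call itself is not shown above. A minimal sketch of one way to draw such a density plot in base R follows; the seed is an arbitrary assumption added only for reproducibility, and the title and axis label are illustrative.

```r
# Sketch: kernel density plot of the 500 simulated observations.
set.seed(1)  # arbitrary seed (assumption), not part of the original code
data <- data.frame(x = runif(500, 1, 10))

# density() computes a kernel density estimate; plot() draws it
dens <- density(data$x)
plot(dens, main = "Density of 500 uniform draws on [1, 10]", xlab = "x")
```

Because the draws are uniform, the curve should look roughly flat between 1 and 10, with the edges smoothed off by the kernel.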

Now suppose you are a data scientist who has been given a particular sample of data, and you want to check whether this sample can be considered part of this population.

Population Statistics
Mean : 4.95

Sample 1:
Mean : 4.4

Sample 2:
Mean : 5.2

Sample 3:
Mean : 9

For all of the above, we would like to do hypothesis testing and infer whether they are part of the population.

We will calculate the z value for each of the samples:

# Population mean and standard deviation of the binned values
popmean <- mean(data$y)
popsd <- sd(data$y)

# Sample 1: n = 50
sample <- data[1:50,]
samplemean <- mean(as.numeric(sample$y))
# 4.4
z1 <- (samplemean - popmean)/(popsd/sqrt(50))

# Sample 2: n = 10
sample2 <- data[1:10,]
samplemean <- mean(as.numeric(sample2$y))
# 5.2
z2 <- (samplemean - popmean)/(popsd/sqrt(10))

# Sample 3: n = 1
sample3 <- data[6,]
samplemean <- mean(as.numeric(sample3$y))
# 9
z3 <- (samplemean - popmean)/(popsd/sqrt(1))

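Now we will choose a significance level of .05; the z value should then lie within the range given by the critical value
siglevel <- .05
z1.alpha <- qnorm(1-siglevel)

The value is 1.644854, so the range is -1.644854 to 1.644854. (Strictly, a two-sided test at the .05 level would use qnorm(1-siglevel/2), which is 1.959964; the conclusion below is the same either way.)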

The z values for the 3 samples are
> z1
[1] -1.483429
> z2
[1] 0.3015498
> z3
[1] 1.544806

So we can see that all the z values are within the range. We therefore cannot reject the hypothesis that the samples belong to the population; in all probability they do.
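The three calculations above follow the same pattern, so they can be wrapped in one helper. This is only a sketch under stated assumptions: `z_test` is a hypothetical name rather than a base R function, the seed is arbitrary (so the numbers will not match the ones quoted above), and the two-sided p-value is an addition that the calculation above does not use.

```r
# Sketch: one-sample z-test against a known population mean and sd.
# z_test is a hypothetical helper, not part of base R.
z_test <- function(values, pop_mean, pop_sd, sig_level = 0.05) {
  n <- length(values)
  z <- (mean(values) - pop_mean) / (pop_sd / sqrt(n))
  # Two-sided p-value: chance of a |z| at least this large if H0 is true
  p_value <- 2 * pnorm(-abs(z))
  list(z = z, p_value = p_value, reject = p_value < sig_level)
}

set.seed(1)  # arbitrary seed (assumption), only for reproducibility
x <- runif(500, 1, 10)
y <- as.numeric(cut(x, breaks = 1:10, labels = 1:9))

# Test the first 50 observations against the full simulated population
res <- z_test(y[1:50], pop_mean = mean(y), pop_sd = sd(y))
print(res)
```

Reporting a p-value alongside z avoids having to pick a one-sided or two-sided critical value by hand: `reject` is simply whether the p-value falls below the chosen significance level.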

Now, to answer your question: if we had a range in the null hypothesis, it would simply mean that you are considering 2 means, the max and the min of your range.
You can always do that, but you will then get the following cases:

  1. Sample which does not lie in the confidence interval of either mean (the max and the min of your hypothesis range)

  2. Sample which lies in the confidence interval of the high mean but not the low

  3. Sample which lies in the confidence interval of the low mean but not the high

  4. Sample which lies in both confidence intervals

1) and 4) are self-explanatory. Samples which fall in 2) and 3) might need further analysis before forming an opinion. Still, this could be a good application; I am trying to think of a use case where it would be relevant.
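The four cases can be sketched in code. This is an illustration under stated assumptions, not a standard procedure: `classify_range` is a hypothetical helper, `mu_low` and `mu_high` are the assumed end points of the hypothesized range, the example numbers are made up, and "lies in the confidence interval" is read here as |z| not exceeding the two-sided critical value.

```r
# Sketch: classify a sample mean against the two end points of a range null.
# classify_range, mu_low and mu_high are hypothetical names for illustration.
classify_range <- function(sample_mean, n, mu_low, mu_high, pop_sd,
                           sig_level = 0.05) {
  crit <- qnorm(1 - sig_level / 2)  # two-sided critical z value
  se <- pop_sd / sqrt(n)            # standard error of the sample mean
  in_low  <- abs(sample_mean - mu_low)  / se <= crit
  in_high <- abs(sample_mean - mu_high) / se <= crit
  if (in_low && in_high) {
    "case 4: consistent with both means"
  } else if (in_high) {
    "case 2: consistent with the high mean only"
  } else if (in_low) {
    "case 3: consistent with the low mean only"
  } else {
    "case 1: consistent with neither mean"
  }
}

# Made-up example: range null of 4.5 to 5.5, sample of 50 with mean 4.4
result <- classify_range(sample_mean = 4.4, n = 50,
                         mu_low = 4.5, mu_high = 5.5, pop_sd = 2.6)
print(result)
```

With this end-point-only reading, a sample mean that sits well inside a wide range can still land in case 1 when the standard error is small, which is one reason a range null needs more care than a single value.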