How to plot the difference between sample mean and population mean in R


#1

Hello,

I am trying to plot a graph showing that, as the sample size increases, the difference between the sample mean and the population mean approaches 0. I have got this far:

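```r
# Generate multiple samples and compare means:
# a "population" of 100,000 standard-normal values
random <- data.frame(x = rnorm(100000))

# Store the means of 1000 samples, each of size 100
sample_means <- data.frame(x = numeric(1000))

for (i in 1:1000) {
  smp <- random$x[sample(nrow(random), 100, replace = FALSE)]
  sample_means$x[i] <- mean(smp)
}

hist(random$x)
hist(sample_means$x)

# Absolute difference between the average sample mean and the population mean
diff_means <- abs(mean(sample_means$x) - mean(random$x))
```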

Now I want to try different sample sizes, compute the difference for each one, and store the results somewhere. Then I want to use this data to make a plot showing that the difference approaches 0 as the sample size increases.
Can someone please help me with this?
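As a quick check on the two histograms above: means of samples of size 100 should have a standard deviation close to σ/√100, where σ is the population standard deviation. A minimal sketch, assuming the same standard-normal population; `set.seed` and the `replicate` call are additions for illustration, not part of the original code:

```r
set.seed(42)  # added for reproducibility; not in the original post

# Population of 100,000 standard-normal values, as in the question
random <- data.frame(x = rnorm(100000))

# 1000 sample means, each from a sample of size 100 drawn without replacement
sample_means <- replicate(1000, mean(sample(random$x, 100)))

# The spread of the sample means should be close to sigma / sqrt(100)
observed_se <- sd(sample_means)
theoretical_se <- sd(random$x) / sqrt(100)
print(c(observed = observed_se, theoretical = theoretical_se))
```

With a standard-normal population both numbers should come out near 0.1, which is why the histogram of sample means is so much narrower than the histogram of `random$x`.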


#2

@data_hacks,

I tried what you described, but with 10000 values in `random` instead of 100000, because the larger population took a lot of computation time. I modified your code like this:

```r
random <- data.frame(x = rnorm(10000))
n_runs <- 1000
n_samples <- integer(n_runs)
diff_means <- numeric(n_runs)

for (j in 1:n_runs) {
  # Pick a random number of samples (between 1 and 10000) for this run
  n_samples[j] <- sample(10000, 1)
  # Draw that many samples of size 100 and record each sample's mean
  sample_means <- replicate(n_samples[j], mean(sample(random$x, 100)))
  # Difference between the population mean and the average sample mean
  diff_means[j] <- mean(random$x) - mean(sample_means)
}

# Plot all runs at once instead of adding points inside the loop
plot(n_samples, diff_means, xlim = c(0, 10000), ylim = c(-0.01, 0.01),
     xlab = "Number of samples of size 100",
     ylab = "Population mean - mean of sample means")
```

I get this plot:

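This plot shows what you wanted: as the number of samples increases, the differences concentrate close to 0, whereas with only a few samples they are widely dispersed. Note that every sample in this code has size 100, so the x axis is really the number of samples averaged rather than the sample size itself; averaging more samples pools more values, which is why the differences still shrink.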
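The concentration around 0 follows from the variance of a sample mean. For n independent draws from a population with mean μ and standard deviation σ (drawing without replacement shrinks the variance further by the factor (N − n)/(N − 1), which is negligible when n is much smaller than the population size N):

$$
\mathbb{E}[\bar{X}_n] = \mu, \qquad
\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n}, \qquad
\operatorname{SD}(\bar{X}_n - \mu) = \frac{\sigma}{\sqrt{n}} \;\to\; 0 \text{ as } n \to \infty .
$$

Averaging k independent sample means, each from a sample of size 100, gives

$$
\operatorname{Var}\!\left(\frac{1}{k}\sum_{j=1}^{k} \bar{X}_{100}^{(j)}\right) = \frac{\sigma^2}{100\,k},
$$

the same as a single sample of 100k values, so a plot against the number of samples and a plot against the sample size show the same 1/√n shrinkage.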
You can try it with 100000 values too if you have the time and a high-end PC! :wink:
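If you want the x axis to be the sample size itself, as in the original question, one alternative sketch draws a single sample of each size n from the full 100,000-value population and records the signed gap; avoiding the nested loop keeps it fast enough for the larger population. The log-spaced `sizes` grid, `set.seed`, and the ±2σ/√n reference curves are illustrative choices, not taken from either post:

```r
set.seed(1)  # added for reproducibility; not in the original posts

# Population of 100,000 standard-normal values
random <- data.frame(x = rnorm(100000))
pop_mean <- mean(random$x)

# Sample sizes to try: log-spaced from 10 to 100,000 (illustrative choice)
sizes <- unique(round(10^seq(1, 5, length.out = 200)))

# For each size n, draw one sample without replacement and store the gap
diffs <- data.frame(
  n = sizes,
  diff = vapply(sizes, function(n) mean(sample(random$x, n)) - pop_mean,
                numeric(1))
)

plot(diffs$n, diffs$diff, log = "x",
     xlab = "Sample size n", ylab = "Sample mean - population mean")
abline(h = 0, lty = 2)

# Approximate 95% band: +/- 2 * sigma / sqrt(n), ignoring the
# finite-population correction
sigma <- sd(random$x)
lines(diffs$n,  2 * sigma / sqrt(diffs$n), col = "red")
lines(diffs$n, -2 * sigma / sqrt(diffs$n), col = "red")
```

Because sampling is without replacement, the last point (n = 100,000) takes the whole population, so its difference is 0 apart from rounding error; the red curves ignore the finite-population correction and therefore slightly overstate the spread near that end.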
Hope this helps!