How to calculate sample size threshold for data analysis?

data_science
statistics
insight

#1

Hi all,

I found this sample size calculator: http://www.raosoft.com/samplesize.html?nosurvey. As I understand it, it tells you how small a sample you can run an experiment on and still generalize the results to the full population, however large that is.

But can the same calculator be used when deriving insights from existing data? For example, say I have about 20,000 data points spread across A, B, C, D, E, F… and want to know which one is better. Can the calculator give the minimum number of data points that each of A, B, C, … should have before I can conclude something meaningful, such as "A is better than B"?


#2

Hi @akshay.kotha,

This sample size calculator would probably reduce the calculation time, but it has limited options. You can try both and compare which one gives you a better understanding of the distribution.


#3

What do you mean by both?

I wanted to understand whether this is reliable for data analysis, because its main purpose is to calculate the sample size for conducting experiments.


#4

Hi,

Sorry for being unclear earlier. By "both" I meant working the way we have been, without the calculator, and working with it.

It can be used for calculating sample size, but when you say “using the calculator for deriving insight (for data analysis)”, how do you propose to derive insights with it?


#5

By "using the calculator for deriving insight" I mean the example I gave when I asked the question. Let me know if that doesn’t convey what I am trying to do.


#6

Assuming that A, B, C, … are samples of the total population, each with a different number of data points, and you wish to figure out which sample to use: you can use the calculator to find the required sample size and then select the sample accordingly.

You might run into a problem when A, B, C, D, … all have the same size.


#7

Not quite right.

Actually, there is a dependent variable X, and A, B, C, D are levels of an independent variable. The number of data points under each of the levels A, B, C, D is different. I want to find out which level performs better, i.e. for which of A, B, C, D the value of X is good enough. To do that we obviously have to compare X across A, B, C, D.

But since each level has a different number of data points, what is the threshold number of data points a level needs before it can be included in the comparison? For this purpose I was using the sample size calculator, which gives the minimum number of data points needed to conduct an experiment. I was unsure whether this is valid, hence the post.


#8

Hi @akshay.kotha,

This sample size calculator can be used to determine the sample size one needs to take in order to get results that reflect the target population as precisely as needed. You set the margin of error, confidence level and population size, and it returns the sample size as output.

This is an effective way to calculate the sample size. Instead of looking at the entire dataset, you only have to work on the sample, which saves time as well.
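
For reference, calculators of this kind typically apply the standard formula for estimating a single proportion, with a finite population correction. Below is a minimal sketch, assuming that is what this calculator does. The function name `survey_sample_size` and its defaults (5% margin of error, 95% confidence, 50% response distribution) are my own choices for illustration, not something taken from the site.

```python
from math import ceil
from statistics import NormalDist


def survey_sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Sample size needed to estimate one proportion within +/- margin.

    Hypothetical helper: uses the normal approximation plus a finite
    population correction. p=0.5 is the most conservative assumption.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Shrink it for a finite population of the given size
    n = n0 / (1 + (n0 - 1) / population)
    return ceil(n)


print(survey_sample_size(20000))  # 377 with the defaults above
```

Note that this answers "how many units do I need to estimate one proportion for the whole population", which is not the same question as "how many units per group do I need to compare two groups".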


#9

Thanks Pulkit. I got your point, but unfortunately that’s not my purpose. I want to compare groups of data, and for that each group needs some minimum number of data points.

Suppose there is a ratio (X/Y). To compare (X/Y) of A with (X/Y) of B, there has to be a minimum number of data points in each so that the comparison makes sense. That is what I am trying to figure out. I found this calculator and am checking with the community whether it can be used for my purpose :slight_smile:
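
To make the question concrete: if (X/Y) is a proportion (say, successes out of trials), the minimum size per group is usually framed as a power calculation for a two-proportion test, which is a different calculation from the survey formula. Here is a rough sketch under that assumption. The function name `n_per_group`, the 5% significance level, the 80% power and the example rates are all hypothetical choices for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate data points needed in EACH group to detect p1 vs p2.

    Hypothetical helper: two-sided two-proportion z-test, equal group
    sizes, normal approximation.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under H0
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Example: telling a 10% rate in A apart from a 12% rate in B
print(n_per_group(0.10, 0.12))
```

The required size depends heavily on how small a difference between A and B should be detectable, so there is no single threshold that holds for every comparison.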