Hello fellow data scientists,
Problem definition: I need to test whether the difference between the “mean” utilization metrics of two machines is statistically significant.
Given data: Utilization data is available for both machines for more than 100 days, so the mean, standard deviation, and standard error of each machine's utilization can be calculated over that period.
My approach: Calculate the mean and standard error for each machine, then use a t-test with unequal variances (Welch's t-test) to test the difference in means.
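The approach above can be sketched roughly as follows. This is only an illustration under stated assumptions: the utilization values are synthetic (the real data is not shown here), `welch_t_test`, `machine_a`, and `machine_b` are hypothetical names, and the p-value uses a normal approximation to the t distribution, which is reasonable only because about 100 observations per machine give large degrees of freedom.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's unequal-variance t-test.

    Returns (t statistic, Welch-Satterthwaite degrees of freedom,
    approximate two-sided p-value).
    """
    n_a, n_b = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    # ASSUMPTION: normal approximation instead of the exact t distribution.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Synthetic stand-in for roughly 100 days of daily utilization per machine.
rng = random.Random(0)
machine_a = [rng.gauss(70, 8) for _ in range(100)]
machine_b = [rng.gauss(66, 12) for _ in range(100)]

t_stat, dof, p_value = welch_t_test(machine_a, machine_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, approx. p = {p_value:.4f}")
```

If SciPy is available, `scipy.stats.ttest_ind(machine_a, machine_b, equal_var=False)` runs the same test with the exact t distribution. Note that daily readings from the same machine may be correlated over time, which this sketch does not account for.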
Clarifications: Which test is best for testing the statistical significance of the difference in “means”: the t-test or the Mann-Whitney U test? In many academic papers I have read, the t-test is used far more widely than the Mann-Whitney U test, often without any check for normality. Is that because the authors take the central limit theorem for granted? Also, some claim that the Mann-Whitney U test does not really test the statistical significance of a difference in means, and that it is used for medians instead.
It’s quite hard for me to understand the logic behind choosing these tests.
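To make the comparison concrete, here is a minimal sketch of the Mann-Whitney U statistic. The data is synthetic and `mann_whitney_u` is a hypothetical helper, not a library function. It uses the large-sample normal approximation with no tie or continuity correction, so treat the p-value as approximate. The point it illustrates is that U is built from pairwise comparisons between the two samples: U divided by the number of pairs estimates the probability that a reading from one machine exceeds a reading from the other, which is not the same quantity as a difference in means.

```python
import math
import random
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Return (U for sample a, approximate two-sided p-value)."""
    n_a, n_b = len(a), len(b)
    # U counts the pairs (x, y) where x from a exceeds y from b;
    # tied pairs count as one half.
    u = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )
    # Mean and standard deviation of U under the null hypothesis.
    # ASSUMPTION: no tie correction and no continuity correction.
    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Synthetic stand-in for roughly 100 days of daily utilization per machine.
rng = random.Random(1)
machine_a = [rng.gauss(70, 8) for _ in range(100)]
machine_b = [rng.gauss(66, 12) for _ in range(100)]

u_stat, p_value = mann_whitney_u(machine_a, machine_b)
# Share of (a, b) pairs in which machine A's reading is the larger one.
prob_a_greater = u_stat / (len(machine_a) * len(machine_b))
print(f"U = {u_stat:.1f}, P(A > B) ~ {prob_a_greater:.3f}, approx. p = {p_value:.4f}")
```

If SciPy is available, `scipy.stats.mannwhitneyu(machine_a, machine_b, alternative="two-sided")` performs the test with proper handling of ties.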