I have two performance scores, “score1” and “score2”, for n items of the same type over a twenty-year period. I have attached a sample of the data here. I would like to conduct a hypothesis test to check whether “score1” drops faster than “score2” over the 20-year period. I understand that I could run a one-way repeated-measures ANOVA on each of the two scores and then compare the two, but I am unsure about this. If this is the right method, how would I compare the two ANOVA results?
I apologize, I am unable to share the data set due to confidentiality issues.
Well, did you check the ANOVA assumptions (normal distribution and homogeneity of variance)? If yes, ANOVA will be OK.
Hope this helps.
Yes I did, but I don’t know how to compare the two ANOVA results.
Well, I would not use ANOVA in your case: you have a time series. Why not start by running an STL (seasonal-trend) decomposition on each score and then compare the trend components? It seems that you work in Excel, not R, and there I cannot help, sorry. In R you can use stl() from the stats package. Before that you have to convert the data to a time series, which could be tricky because you will have to reformat the dates, but with lubridate it should be quite easy.
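For anyone not working in R, here is a rough Python sketch of the "compare the trend components" idea. Note that it is not STL: it uses a plain centered moving average as a stand-in for the trend, which ignores seasonality entirely. The function name, the window size and the yearly values are all made up for illustration.

```python
def moving_average_trend(values, window=3):
    """Centered moving average used as a crude trend estimate (not STL).

    Returns len(values) - window + 1 smoothed points; window must be odd.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]

# Hypothetical yearly means of the two scores (made-up numbers).
score1 = [90, 88, 89, 85, 83, 84, 80, 78, 79, 75]
score2 = [90, 89, 90, 88, 87, 88, 86, 85, 86, 84]

trend1 = moving_average_trend(score1, window=3)
trend2 = moving_average_trend(score2, window=3)

# Total change of each smoothed trend from its first to its last point.
drop1 = trend1[0] - trend1[-1]
drop2 = trend2[0] - trend2[-1]
print(f"score1 trend drop: {drop1:.2f}, score2 trend drop: {drop2:.2f}")
```

This only describes the two trends; it is not a hypothesis test by itself.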
Hope this helps (if you work in R; if Excel, well … :))
I am sorry, something I failed to mention is that the data is normally distributed.
I think I figured it out though. I can use an F-test to compare the variances of the two scores. A large variance in score2 would imply that it drops slower than score1.
I could also split the data into 3 sections by time, take the last section, which represents the late stage of the items, and then do a t-test to check whether the means are significantly different.
What do you think?
As mentioned, if you use ANOVA you lose the time variable: you compare the observations as if they were all taken on the same day.
Your second suggestion could be good and keeps the time variable, but (and there is a but) the size of the section, here about 7 years, will define the mean, which over a long period could be OK. Now, if you have outliers during this period, the mean will not be centred (skewed distribution) and the t-test will not be good (it is sensitive to non-normality).
In a few words: check for outliers, remove them, and then use your method. It seems that you have a monotone function anyway (from what we can see).
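To make "remove the outliers, then compare the late-stage means" concrete, here is a minimal Python sketch. It assumes Tukey's 1.5 × IQR rule for flagging outliers and Welch's unequal-variance form of the t-test; neither choice comes from the discussion above, and the numbers are invented. It returns only the t statistic and the approximate degrees of freedom; the p-value would have to be looked up in a t distribution with a statistics package.

```python
import math
import statistics

def drop_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical late-stage observations (last section only); 120 is an outlier.
late1 = [70, 72, 69, 71, 73, 68, 70, 120]
late2 = [80, 82, 79, 81, 83, 78, 80, 81]

clean1 = drop_outliers(late1)
clean2 = drop_outliers(late2)

t_stat, df = welch_t(clean1, clean2)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A negative t here means the late-stage mean of the first sample is lower than that of the second.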
Hope this helps.
Since I am only interested in which drops quicker, I can simply measure the parameters at the late stage and hence don’t mind losing the time factor.
As for the test of variance, the sample is very large and the outliers are very few, and I am going to remove them, so they are not a concern. I plan on using the F-test to compare the variances.
Sorry, I think you are missing something: you mentioned that you want to know which series drops the fastest, so time is embedded in your question. The slope is defined by the change over time, and the most negative slope is what you are looking for, not the variance, which describes a distribution.
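As a small illustration of comparing slopes, here is a Python sketch that fits an ordinary least-squares line to each series against time and compares the two slopes. The helper name and the data are hypothetical. A formal test of the difference would also need standard errors for the slopes, which this sketch does not compute.

```python
def ols_slope(t, y):
    """Least-squares slope of y regressed on t."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    return sxy / sxx

# Hypothetical yearly averages over 20 years (made-up numbers).
years = list(range(20))
score1 = [100 - 2.0 * t for t in years]  # loses 2 points per year
score2 = [100 - 0.5 * t for t in years]  # loses 0.5 points per year

slope1 = ols_slope(years, score1)
slope2 = ols_slope(years, score2)

# The series with the more negative slope is the one that drops faster.
faster = "score1" if slope1 < slope2 else "score2"
print(f"slope1 = {slope1:.2f}, slope2 = {slope2:.2f}, faster drop: {faster}")
```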
Does that make sense?
I am referring to two different strategies. In the first strategy I will compare the means of section 3. The image should explain it better. (Excuse the simplicity of the image.)
In the second strategy, I will simply compare the variances (drawn from the entire data) of score1 and score2.
Your first strategy is good if you reject the null hypothesis; I take that as a given. It could be good to add the variance to your graphic as well, just for understanding.
As for the second strategy, it will not tell you anything about the decrease of score1 and score2.
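A quick way to see why the variance alone cannot show a decrease: the Python sketch below builds two made-up series, one falling and one rising at the same rate, and they end up with exactly the same variance. The series are hypothetical and chosen only to make the point.

```python
import statistics

years = list(range(20))

# Two hypothetical series with the same rate of change but opposite direction.
falling = [100 - 2 * t for t in years]
rising = [62 + 2 * t for t in years]

var_falling = statistics.variance(falling)
var_rising = statistics.variance(rising)

# Same spread, opposite direction: the variance ignores the ordering in time.
print(var_falling, var_rising)
print(falling[0], "->", falling[-1])
print(rising[0], "->", rising[-1])
```

Shuffling either series in time would also leave its variance unchanged, which is why a variance test says nothing about how fast a score drops.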
I finally ended up doing both. I used the t-test to show that the 4th quartile mean of score1 is significantly higher than that of score2. I then used the Levene test to compare the variances, which showed that score1 is significantly more spread out.
Thanks for the assistance!
(PS. What could I have done better?)