  • Friedman test for looking at stability over time of a highly variable measure

    Hi,
    I have a dataset containing a measure of children's engagement in classrooms. The measure varies a lot throughout the day, but I'm trying to determine whether their engagement has some stability over time, in particular at the level of the class when taken as a group of observations. In other words, if I go into a classroom one day, do I get a totally different result compared to another day, or do I get a similar mean and distribution?

    The structure of my dataset is as follows:
    I have 1669 observations for 65 children from 13 classrooms. Observations were collected over three different days in each classroom (so about 40 observations for each class per day). I observed 5 children in each classroom.
    Engagement is measured on a scale of 1 to 5.

    Example:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 School str9 Class str11 Child byte(Day Sweep LIS)
    "LIS01" "LIS01RE01" "LIS01RE0101" 1  1 2
    "LIS01" "LIS01RE01" "LIS01RE0101" 1  2 2
    "LIS01" "LIS01RE01" "LIS01RE0101" 1  3 5
    "LIS01" "LIS01RE01" "LIS01RE0101" 1  4 3
    "LIS01" "LIS01RE01" "LIS01RE0101" 1  5 2
    "LIS01" "LIS01RE01" "LIS01RE0101" 1  6 1
    "LIS01" "LIS01RE01" "LIS01RE0101" 1  7 2
    "LIS01" "LIS01RE01" "LIS01RE0101" 1  8 1
    "LIS01" "LIS01RE01" "LIS01RE0101" 2  9 1
    "LIS01" "LIS01RE01" "LIS01RE0101" 2 10 4
    "LIS01" "LIS01RE01" "LIS01RE0101" 2 11 1
    "LIS01" "LIS01RE01" "LIS01RE0101" 2 12 5
    end

    My understanding is that the usual test-retest and ICC approaches aren't appropriate here: there is hardly any relationship from one observation to the next, and they can't handle unbalanced data, which is the case for my dataset unless I use means (at the child or class level), which seems to defeat the point because it erases all the variability.

    I was wondering if I could use Friedman's test instead to compare the mean and distribution from one day to another, but I don't know:
    1. if that is an appropriate test for what I'm trying to find out
    2. whether it makes sense to do the test on my entire dataset together, considering the 'day' I observed children is arbitrary for each class, i.e. there is no correspondence in terms of what day 1 means in one class compared to another class. I have tried doing the test for each class separately and adjusting the significance threshold using the Holm-Bonferroni method, but again, I'm not sure whether this is appropriate.
    3. whether I've been doing it correctly in Stata - I have been using the emh command because I read in a few places that this was the easiest way to do the Friedman test in Stata:

    Code for the whole dataset:
    Code:
     emh LIS Day, strata(Class) anova transformation(rank)
    
    Extended Mantel-Haenszel (Cochran-Mantel-Haenszel) Stratified Test of Association
    
    ANOVA (Row Mean Scores) Statistic:
    Q (2) =  2.0505, P = 0.3587
    Transformation: Ranks

    Example code for one class (with the test repeated for each class):
    Code:
     emh LIS Day if Class == "LIS01RE01", strata(Child) anova transformation(rank)
    
    Extended Mantel-Haenszel (Cochran-Mantel-Haenszel) Stratified Test of Association
    
    ANOVA (Row Mean Scores) Statistic:
    Q (2) =  1.8083, P = 0.4049
    Transformation: Ranks

    4. how to interpret the Q statistic it displays.

    Any help much appreciated.

    Thanks!


  • #2
    Originally posted by Soizic Le Courtois
    . . . I'm trying to determine whether their engagement has some stability over time, in particular at the level of the class . . . In other words, if I go into a classroom one day, do I get a totally different result compared to another day, or do I get a similar mean and distribution?
    I think that Friedman's test won't give you that. I'd sooner plot the within-day mean and variance (maybe higher moments, too) for each pupil over days.

    With 65 pupils, you'd probably want to stratify the line plots into, say, deciles of mean score over days in order to limit the spaghettiness of the individual graphs. And of course you'd see a difference in variability between pupils with floor- and ceiling-effect scores and those whose mean scores lie toward the center.

    You might be able to fit a cross-classified random effects ordered probit or ordered logistic regression model and take a look at the relative magnitude of the variance components (within-classroom versus within-day), but that's a lot of work and it's not obvious to me what to make of any particular pair of values that you might get.
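    For concreteness, something along the lines of the untested sketch below is what I have in mind, using the variable names from your -dataex- example (Day treated as crossed with the School/Class/Child nesting). I haven't run anything like it on data like yours, and it could be slow to converge.
    Code:
     * untested sketch: cross-classified random intercepts for Day and for
     * the School/Class/Child hierarchy, with an ordered-logit link
     meologit LIS || _all: R.Day || School: || Class: || Child:
    The idea would then be to compare the estimated variance components for Day, Class, and Child informally rather than to test anything formally.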



    • #3
      Thanks, I really appreciate it! Is the idea then that I would eyeball it rather than do a test?


      Originally posted by Joseph Coveney

      With 65 pupils, you'd probably want to stratify the line plots into, say, deciles of mean score over days in order to limit the spaghettiness of the individual graphs.
      Just wanting to make sure I understand correctly - are you saying plot them as line plots all on one graph, or plot 65 different graphs?

      Thanks again!



      • #4
        Originally posted by Soizic Le Courtois
        Is the idea then that I would eyeball it rather than do a test?
        Yes, at least initially.

        Just wanting to make sure I understand correctly - are you saying plot them as line plots all on one graph, or plot 65 different graphs?
        In between those two extremes. Maybe a half-dozen or so line plots on each graph of a rectangularly laid out array of 10 or so graphs—something along those lines.

        I said deciles, but maybe instead of using a set of quantiles to group the pupils into graphs (with classrooms color-coded), make one graph for each classroom? Then you can fairly easily scan within each graph to get a sense of how stable the distribution of scores is across days for pupils within each classroom, and also scan across and down the array of graphs (i.e., classrooms) to get an idea of whether the day-to-day variation is consistently greater for pupils in some classrooms than in others.
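
        Something like the untested sketch below is what I have in mind, with the variable names taken from your -dataex- example: the within-day mean of LIS per pupil, one line per pupil, one panel per classroom.
        Code:
         preserve
         * within-day mean (and SD) of LIS for each child
         collapse (mean) mean_LIS=LIS (sd) sd_LIS=LIS, by(Class Child Day)
         sort Class Child Day
         * connect(L) breaks the line whenever Day resets to a smaller value,
         * so each child gets his or her own line within each classroom panel
         twoway line mean_LIS Day, connect(L) by(Class) ytitle("Mean LIS")
         restore
        You could make a second array using sd_LIS in place of mean_LIS to look at the day-to-day spread as well.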

        My take is that that would be easier and more immediately accessible than trying to fit some complicated regression model whose parameter estimates might present their own problems for interpretation, or some test, like Friedman's, where you get some value, some Q statistic or other, that leaves you wondering what to make of it.



        • #5
          Thanks! I've done this so far with mean and standard deviation for each class - is this the sort of thing you were suggesting, but separating it out further for each student?
          [Attached image: variance mean plots by class.png]

          LIS is my engagement measure.

          I've also tried fitting the cross-classified random effects model you suggested, though as you say it adds its own complications. As I understand it, each observation is nested within child within class within school, and I want to add that each observation is also, separately, nested within 'Day'.

          I still have a few questions about that if that's ok:

          1. I'm not sure about how to code this in Stata:

          Is it this:
          Code:
           mixed LIS || School: || Class: || Child: || _all: R.Day , mle var
          Or this?
          Code:
           mixed LIS || _all: R.School || _all: R.Class || _all: R.Child || _all: R.Day , mle var
          In essence, I'm not entirely sure what the _all means versus not having it (see also the sketch after these questions).

          2. Would I code days as 1, 2 and 3, or would that change depending on the class (i.e. days 1, 2, 3 for class 1; days 4, 5, 6 for class 2, etc.), since they were, technically, different days? I'm guessing the latter, but it leaves me a little confused about the nesting, as each class is then associated with distinct days.

          3. I'm not entirely clear on what this tells me - as you pointed out in your previous post.
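
          To make question 1 a bit more concrete: going by crossed-effects examples I've seen for mixed (with the _all: equation listed first), a third way to write it might be the sketch below, though I may well have this wrong:
          Code:
           * sketch only: the _all: R.Day equation listed first, intended to
           * cross Day with the School/Class/Child nesting
           mixed LIS || _all: R.Day || School: || Class: || Child: , mle var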

          My output (from the first of the versions in question 1, using DaybyC as the day variable) is this:
          Code:
          mixed LIS || School: || Class: || Child: || _all: R.DaybyC , mle var
          
          Performing EM optimization:
          
          Performing gradient-based optimization:
          
          Iteration 0:   log likelihood = -2717.7714  (not concave)
          Iteration 1:   log likelihood = -2562.7008  (not concave)
          Iteration 2:   log likelihood =  -2548.461  (not concave)
          Iteration 3:   log likelihood = -2542.0193  
          Iteration 4:   log likelihood = -2533.1953  (not concave)
          Iteration 5:   log likelihood = -2532.9899  
          Iteration 6:   log likelihood = -2532.8861  
          Iteration 7:   log likelihood = -2532.8836  
          Iteration 8:   log likelihood =  -2532.883  
          Iteration 9:   log likelihood = -2532.8828  
          Iteration 10:  log likelihood = -2532.8828  
          Iteration 11:  log likelihood = -2532.8828  
          
          Computing standard errors:
          standard-error calculation has failed
          
          Mixed-effects ML regression                     Number of obs     =      1,669
          
          -------------------------------------------------------------
                          |     No. of       Observations per Group
           Group Variable |     Groups    Minimum    Average    Maximum
          ----------------+--------------------------------------------
                   School |          8        120      208.6        269
                    Class |         13        112      128.4        142
                    Child |         65         17       25.7         30
                     _all |         65         17       25.7         30
          -------------------------------------------------------------
          
                                                          Wald chi2(0)      =          .
          Log likelihood = -2532.8828                     Prob > chi2       =          .
          
          ------------------------------------------------------------------------------
                   LIS |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 _cons |   2.980561   .0660718    45.11   0.000     2.851062    3.110059
          ------------------------------------------------------------------------------
          
          ------------------------------------------------------------------------------
            Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
          -----------------------------+------------------------------------------------
          School: Identity             |
                            var(_cons) |   .0095568          .             .           .
          -----------------------------+------------------------------------------------
          Class: Identity              |
                            var(_cons) |   .0162865          .             .           .
          -----------------------------+------------------------------------------------
          Child: Identity              |
                            var(_cons) |   .0735878          .             .           .
          -----------------------------+------------------------------------------------
          _all: Identity               |
                         var(R.DaybyC) |   9.94e-15          .             .           .
          -----------------------------+------------------------------------------------
                         var(Residual) |   1.166828          .             .           .
          ------------------------------------------------------------------------------
          LR test vs. linear model: chi2(4) = 65.41                 Prob > chi2 = 0.0000

          My interpretation is that, overall, the day accounts for virtually none of the variation, but I don't think it tells me whether any specific day made a difference to engagement for any given classroom (compared to other days in that classroom). Is this right?

          Apologies if I've misunderstood anything.

          P.S. I know I'm not fitting the ordered logit model - I find it very hard to interpret, and when I have fitted it to this dataset before for other purposes, there was little difference from the multilevel linear model - but I appreciate that I'm using ordinal data in a linear model.

