Modelling of time in repeated measures and mixed command 'machinery'

Anne Christensen

Join Date: Apr 2023
Posts: 28

24 Nov 2023, 07:33

I am testing the effect of a psychological sleep treatment in an RCT using repeated measures. For some outcomes we have daily observations (0-77 days); for others we have weekly observations (0-10) (example data at bottom). Because of traditions in the field and for comparability of results, we would like to model time as weeks.

When I run a mixed command (mixed tst_2 c.week##p1_random_group|| pilot_id: week, robust), I get different results depending on whether I use the non-aggregated data or the data aggregated (collapsed) by week.

On non-aggregated data, I get:

[Image: Udklip.PNG]

On aggregated data I get:

[Image: Udklip2.PNG]

I am trying to understand exactly what Stata 'does' when I specify time as weeks. My supervisor said they thought this aggregated the data by weeks, but that can't be the case based on this? I have always been taught to avoid aggregating data if possible (maintain power, not throw out data), so I would like to use the non-collapsed data, but I am not sure exactly how the data are being handled "within" weeks.

Hope someone can offer some insight? Thank you in advance.

Data example, non-collapsed data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double mcheck_createddate float(tst_2 days week)
            .        .  .  .
1978315739066  369.375 -7  0
1978405251349  369.375 -6  0
1978498510312  369.375 -5  0
1978575898339  369.375 -4  0
1978663788416  369.375 -3  0
1978747364817  369.375 -2  0
1978840544624  369.375 -1  0
1978925071258  369.375  0  0
1979024778656      240  1  1
1979095997623      330  2  1
1979179983159      335  3  1
1979267216912      350  4  1
1979352306162      335  5  1
1979439088111      350  6  1
1979526446878      350  7  1
1979615407760      355  8  2
1979700680626      365  9  2
1979784602360      340 10  2
1979871130571      350 11  2
1979957402274      355 12  2
1980043428649      360 13  2
1980130349177      375 14  2
1980216625602      355 15  3
1980302798494      375 16  3
1980389083716      360 17  3
1980475874286      385 18  3
1980562055200      365 19  3
1980648223571      380 20  3
1980735724410      370 21  3
1980822012311      385 22  4
1980908709304      410 23  4
1980994641875      390 24  4
1981080942377      355 25  4
1981166909551      380 26  4
1981253724934      430 27  4
1981342052900      380 28  4
1981425936804      405 29  5
1981515734374      420 30  5
1981602644128      415 31  5
1981685169023      370 32  5
1981773360755      440 33  5
1981859849078      440 34  5
1981944884990      420 35  5
1982033391559      315 36  6
1982123504570      445 37  6
1982203675021      430 38  6
1982344359936      435 39  6
1982376661288      340 40  6
1982462679238      450 41  6
1982551426227      410 42  6
1982636154103      360 43  7
1982728564567      385 44  7
1982813462160      400 45  7
1982897900584      400 46  7
1982983872659      365 47  7
1983070492631      410 48  7
1983155773759      390 49  7
1983242492230      400 50  8
1983330658828      415 51  8
1983416538837      415 52  8
1983502873103      420 53  8
1983587366273      395 54  8
1983674851456      410 55  8
1983765060598      415 56  8
1983850894594      410 57  9
1983933940172      385 58  9
1984020324185      415 59  9
1984106612648      400 60  9
1984193689256      405 61  9
1984279792169      410 62  9
1984365849592      390 63  9
1984455670432      400 64 10
1984543752970      410 65 10
1984629055424      385 66 10
1984714748351      390 67 10
1984798563111      395 68 10
1984894907098      365 69 10
            .        .  .  .
            .        .  .  .
1979015931298 317.7778 -8  0
1979105233995 317.7778 -7  0
1979190440615 317.7778 -6  0
1979272777118 317.7778 -5  0
1979358986840 317.7778 -4  0
1979498188163 317.7778 -3  0
1979562555041 317.7778 -2  0
1979612420459 317.7778 -1  0
1979722662617 317.7778  0  0
1979793676529      375  1  1
1979901718677      360  2  1
1979985089800      385  3  1
1980082500718      505  4  1
1980175377866      385  5  1
1980220858384      430  6  1
1980310326760      490  7  1
1980424426590      490  8  2
1980524009890      500  9  2
1980564061026      390 10  2
1980669349324      385 11  2
end
format %tc mcheck_createddate

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

30 Nov 2023, 11:06

I am trying to understand exactly what Stata 'does' when I specify time as weeks - my supervisor said they thought this aggregated the data by weeks, but that can't be the case based on this?

You are right, and your supervisor is wrong. Specifying time by weeks is not the same as aggregating the data by weeks. Aggregating discards the daily variation in the outcome.

I have always been taught to avoid aggregating data if possible (maintain power, not throw out data), so I would like to use the non-collapsed data

It is usually best to work with the disaggregated data, preserving as much variation as the data has to offer. But there are circumstances where aggregating the data would be the preferred approach. For example, if the outcome variable were measured weekly and the predictors daily, then aggregating could be a reasonable approach.

but I am not sure exactly how the data are being handled "within" weeks

In the analysis with the disaggregated data, the output does not contain any statistics that explicitly refer to the variation of the outcome within the course of weeks. Moreover, since you entered week into the model as a continuous variable, not a discrete one, you are in effect stipulating that the effect of the passage of time is a monotone step function. In other words, you are saying that the time effect makes an upward "jump" every seven days, and is flat in between. That is unlikely to be realistic unless there is some co-intervention that takes place once a week. (I will add that if time effects were small, it probably would make little to no difference whether specified in weeks or days, but the outputs you show suggest that the time effects are substantial.)
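
To make the contrast concrete, here is a minimal sketch (not output from this thread) of the three analyses under discussion, using the variable names from #1. It assumes that pilot_id and p1_random_group are in memory (they are not part of the -dataex- excerpt) and that p1_random_group is constant within pilot_id. The stored-estimate names are made up for illustration.

```stata
* Sketch only: variable names are taken from #1; pilot_id and
* p1_random_group are assumed to exist in the data set.

* (a) Daily records, time in weeks: all days of a week share one time
*     value, but every daily observation still contributes to the fit.
mixed tst_2 c.week##i.p1_random_group || pilot_id: week, vce(robust)
estimates store daily_by_week

* (b) Daily records, time in days.
mixed tst_2 c.days##i.p1_random_group || pilot_id: days, vce(robust)
estimates store daily_by_day

* (c) Aggregated records: one mean per participant-week, so the
*     within-week variation is discarded before the model is fit.
preserve
collapse (mean) tst_2, by(pilot_id p1_random_group week)
mixed tst_2 c.week##i.p1_random_group || pilot_id: week, vce(robust)
estimates store weekly_means
restore

* Coefficients and standard errors side by side
estimates table daily_by_week daily_by_day weekly_means, se
```

Models (a) and (c) use the same time variable but different numbers of observations, which is one reason their results need not agree.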

You say that the purpose of representing time by weeks is to be consistent with past studies. While I can see the virtue in making your results more comparable to other studies of the same or closely related questions, I don't see the virtue in doing something that is clearly a mis-specification of reality just because others have done it (whether out of misunderstanding or because they were unable to collect daily data so it was the best they could do) when a better alternative (modeling time in days) is readily available and can be implemented at essentially no cost. If this were my project, the main analysis would specify the time variable as daily (ranging 0-77), not weekly. For purposes of comparison to older studies I might include a secondary analysis using weekly time in an appendix to the main report. YMMV.
Anne Christensen

Join Date: Apr 2023

Posts: 28
#3

01 Dec 2023, 06:36

Thank you very much for these thoughts and answers Clyde, very helpful.
Anne Christensen

Join Date: Apr 2023

Posts: 28
#4

05 Dec 2023, 04:01

Dear Clyde
I am afraid I have a follow-up question. Regarding your comment that I specified time as a continuous variable: I in fact never considered anything else, but planned to test for non-linear functions.
However, if I do not specify c. (that is, if my command is mixed isiscore isi_week##group_recode|| pilot_id: isi_week , robust), I see that the log-pseudolikelihood is lower, as is the residual variance. Does that suggest I should model time as a discrete rather than a continuous function? And if so, which p-value would one report for the time x group interaction, now that the output yields not one but multiple coefficients?

Thank you again.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

05 Dec 2023, 14:50

Well, you will always get a better fit with i.time than with c.time because i.time can fit any pattern of time-based shocks to the outcome, whereas c.time is constrained to finding the best fitting line. So there is no surprise that you get a lower log-pseudolikelihood. But that doesn't necessarily make the use of discrete time a better choice for the model--it is possible that you are overfitting noise in your data when you do that. Rather than relying on the log-pseudolikelihood, re-run both models, and follow each one with -estat ic-. That will give you the AIC and BIC statistics. These statistics start with the log likelihood and then apply a penalty for having extra parameters (i.time always has more parameters than c.time). The model with the lower value for AIC (or BIC if you prefer--usually the two statistics will agree as to which model is better) is preferred.
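
A minimal sketch of that comparison, using the variable names from #4 (the stored-estimate names are made up for illustration):

```stata
* Sketch only: variable names follow #4.

* Continuous (linear) time
mixed isiscore c.isi_week##i.group_recode || pilot_id: isi_week, vce(robust)
estat ic
estimates store time_cont

* Discrete time: one indicator per week
mixed isiscore i.isi_week##i.group_recode || pilot_id: isi_week, vce(robust)
estat ic
estimates store time_disc

* AIC and BIC for both models in one table; lower is preferred
estimates stats time_cont time_disc
```

Both models here are fit by maximum likelihood, which is the default for -mixed-. Information criteria from REML fits are not comparable across models whose fixed-effects parts differ.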

If you end up going with the discrete week variable, because your overall purpose here is to estimate treatment effects in a design that includes pre- and post-treatment data in both treatment arms, I would report each of the interaction coefficients along with a confidence interval. By the way, that's what I would do with the continuous model as well. I don't find p-values informative or at all useful for this purpose.
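
The interaction coefficients and their confidence intervals appear directly in the -mixed- output table. As a hedged sketch of two possible complements after the discrete-time model from #4: -margins- with dydx() gives the estimated group difference at each week, and -testparm- gives a single joint Wald test of all the interaction terms for anyone who needs one summary test.

```stata
* Sketch only: assumes the discrete-time model from #4.
mixed isiscore i.isi_week##i.group_recode || pilot_id: isi_week, vce(robust)

* Estimated group difference at each week, with 95% confidence intervals
margins isi_week, dydx(group_recode)
marginsplot, yline(0)

* Joint Wald test of all week-by-group interaction terms
testparm i.isi_week#i.group_recode
```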
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#6

05 Dec 2023, 15:12

When you run the model with time as categorical, do you also get a set of random slopes for each time point? Please post the output from that model so we can understand what you are looking at.
Anne Christensen

Join Date: Apr 2023

Posts: 28
#7

06 Dec 2023, 04:06

Thank you, that makes sense.

So when I am testing this I am currently comparing two models. In both cases I have a curvilinear effect of time and an autoregressive covariance matrix, because that provides the best fit based on AIC and BIC.
However, does using curvilinear time still make sense for the 'time-as-discrete' model?

Time as continuous:
mixed isiscore group_recode##c.isi_week##c.isi_week|| pilot_id: c.isi_week##c.isi_week, robust residuals(ar 1, t( isi_week ))

Time as discrete (am I using the right factor notation here?)
mixed isiscore group_recode##isi_week##isi_week|| pilot_id: i.isi_week##i.isi_week, robust residuals(ar 1, t( isi_week ))

This gives me the following stats:

So here AIC and BIC do seem to suggest different models? Also, the discrete model has more degrees of freedom. If i.time has more parameters, I would have thought that model had fewer degrees of freedom, but perhaps I have that the wrong way round? Edit: yes, it looks like I do. I just found a source that explains this, sorry.

Last edited by Anne Christensen; 06 Dec 2023, 04:16.
Anne Christensen

Join Date: Apr 2023

Posts: 28
#8

06 Dec 2023, 04:08

Erik Ruzek outputs below
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#9

06 Dec 2023, 07:18

A couple of quick observations. I am more concerned about serial correlation at the day level rather than the week level. You can run models without predictors just to look at different types of alternate variance structures for the residuals to determine which, if any, fit best. The degrees of freedom in this case is a count of the number of parameters you are estimating. Obviously in the continuous time model, you have a lot fewer parameters and BIC, which penalizes model complexity, prefers the continuous time model accordingly.
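
As a rough sketch of that kind of check, here are intercept-only models for the daily outcome from #1 under three residual structures. The choice of structures is only illustrative, and the code assumes days is integer-valued and unique within pilot_id.

```stata
* Sketch only: no predictors, daily outcome tst_2 from #1.
* Assumes one record per participant per day.

* Independent residuals
mixed tst_2 || pilot_id:, residuals(independent)
estimates store res_ind

* First-order autoregressive residuals at the day level
mixed tst_2 || pilot_id:, residuals(ar 1, t(days))
estimates store res_ar1

* Toeplitz residuals of order 2
mixed tst_2 || pilot_id:, residuals(toeplitz 2, t(days))
estimates store res_toep2

* Compare AIC and BIC across the three structures
estimates stats res_ind res_ar1 res_toep2
```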

What is the logic of the time as discrete model? Do you have hypotheses about unique weekly effects?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

06 Dec 2023, 08:58

I agree with Erik Ruzek's comments in #9.

I want to comment on

However, does using curvilinear time still make sense for the 'time-as-discrete' model?

Time as continuous:
mixed isiscore group_recode##c.isi_week##c.isi_week|| pilot_id: c.isi_week##c.isi_week, robust residuals(ar 1, t( isi_week ))

Time as discrete (am I using the right factor notation here?)
mixed isiscore group_recode##isi_week##isi_week|| pilot_id: i.isi_week##i.isi_week, robust residuals(ar 1, t( isi_week ))

No, it does not make sense to use quadratic (or higher order) terms with discrete variables. It is just a way of adding extra parameters to what is, in fact, the same model. When you use i.isi_week, the model allows for a separate "shock" to the outcome at each week. This is capable of fitting any arbitrary amount of non-linearity by itself. When you add the quadratic terms to this, they do nothing to improve the fit of the model and they add parameters. I am actually surprised that the AIC results do not reflect this. The BIC results do. BIC applies a larger penalty for extra parameters than AIC does, and it is giving you more reliable results here.
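
A sketch of the two specifications that are worth comparing, with the squared terms dropped from the discrete model. Note one simplification that is my assumption and not what was run in #7: the random part is reduced to a random intercept and a linear random slope in both models so that only the fixed part differs.

```stata
* Sketch only: the random part is simplified relative to #7.

* Discrete time: the week indicators alone allow any shape over weeks
mixed isiscore i.group_recode##i.isi_week || pilot_id: isi_week, ///
    vce(robust) residuals(ar 1, t(isi_week))
estimates store disc

* Continuous time with a quadratic term
mixed isiscore i.group_recode##c.isi_week##c.isi_week || pilot_id: isi_week, ///
    vce(robust) residuals(ar 1, t(isi_week))
estimates store quad

* AIC and BIC side by side
estimates stats disc quad
```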

Anyway, you should never use higher-order terms with a discrete variable.
Anne Christensen

Join Date: Apr 2023

Posts: 28
#11

07 Dec 2023, 06:40

Originally posted by Erik Ruzek

What is the logic of the time as discrete model? Do you have hypotheses about unique weekly effects?

Unfortunately we don't really have any hypotheses regarding the effect of time, other than ISI scores being reduced over time. The main research question is whether there is a group x time interaction, which is the case regardless of how time is modelled. But when I look at the raw data, they are clearly not linear, and the discrete modelling most resembles the actual data (based on marginsplot after mixed).

But given Clyde's comments ("you will always get a better fit with i.time than with c.time because i.time can fit any pattern of time-based shocks to the outcome, whereas c.time is constrained to finding the best fitting line") I guess that is no surprise. I wanted to compare qnorm plots to assess model fit of the discrete vs continuous model, in addition to AIC and BIC. But there seems to be a problem with the c. and i. indicators?
But my understanding is that you recommend relying on BIC in the absence of any clear hypothesis regarding time?
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#12

07 Dec 2023, 07:44

I would say that for the hypothesis you implicitly stated, "...ISI scores being reduced over time," you are talking about a trend analysis. As you note, the trend in question is quadratic. And of the two ways of representing time, i.time (categorical) vs. c.time (continuous), the c.time##group_recode (combined with the quadratic version) gives you the cleanest test of your hypothesis about scores being reduced over time. You haven't shown us the marginsplot from the c.time model, but I can imagine it being the centerpiece of your presentation of results.
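
For reference, a minimal sketch of how such a plot could be produced from the quadratic continuous-time model. The week range 0 to 10 comes from #1; the random part is simplified to a linear slope, and the axis titles are only illustrative.

```stata
* Sketch only: quadratic time by group, simplified random part.
mixed isiscore i.group_recode##c.isi_week##c.isi_week || pilot_id: isi_week, vce(robust)

* Predicted mean ISI score by group at weeks 0 to 10
margins group_recode, at(isi_week = (0(1)10))
marginsplot, xtitle("Week") ytitle("Predicted ISI score")

* Group difference in the predicted mean at each week
margins, dydx(group_recode) at(isi_week = (0(1)10))
```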

The research question implicitly examined in the i.time (categorical) model is the following: do the groups differ in their mean weekly values of the outcome? You are testing that for each week in the data. You can eyeball the marginsplot from that model to get a sense of the trend, but you need to do some post-estimation testing to plausibly test the hypothesis that the trends are different. The i.time way is just not as clean and suffers from being pretty complex. All else equal, statisticians generally prefer to model their data in the most parsimonious way possible that remains faithful to the presumed data generating process.

You should definitely look at whether there are violations of the model assumptions (qnorm and the like). Whether you choose BIC or AIC as an indication of model fit is really up to you. BIC is more consistent with my statement about preferring parsimony in statistical models.
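
A hedged sketch of residual checks that work the same way after either model, since -predict- does not depend on whether time was entered with c. or i. The new variable and graph names are made up for illustration, and the model shown is the simplified quadratic one.

```stata
* Sketch only: run after whichever -mixed- model is being assessed.
mixed isiscore i.group_recode##c.isi_week##c.isi_week || pilot_id: isi_week, vce(robust)

* Standardized residuals and fitted values
predict double rs, rstandard
predict double fit, fitted

* Normal quantile plot and residual-versus-fitted plot
qnorm rs, name(qq_resid, replace)
scatter rs fit, yline(0) name(rvf, replace)

* Predicted random effects, one new variable per random effect
predict double re*, reffects

* Random effects are constant within participant, so plot one row each
egen byte first = tag(pilot_id)
qnorm re1 if first, name(qq_re1, replace)
```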
Anne Christensen

Join Date: Apr 2023

Posts: 28
#13

08 Dec 2023, 02:57

Thank you very much for these reflections Erik, in particular I found your description of the i.time research question clarifying. I rarely see 'time was modelled as a discrete variable' in research publications, perhaps that is due to this preference for parsimony that you mention.

Once again greatly appreciate your time and thoughts on this!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#14

08 Dec 2023, 09:00

I rarely see 'time was modelled as a discrete variable' in research publications, perhaps that is due to this preference for parsimony that you mention

No, actually it is due to a selection bias in the publications you read. You appear to be working in a biomedical or psychology related field, as do I, and in our line of work our longitudinal data usually has fairly long intervals between measurements. Moreover, it is in the nature of biological processes that time trends tend to be smooth functions of time, and, at least for relatively limited time spans, linear or mildly curvilinear. Consequently continuous time is usually the preferred model.

Were you to read the literature in finance, the data would often be measured at high frequency and, more important, the processes being modeled are rather "jumpy" at the observed time scales. There is nothing smooth about the way things evolve, and linear or low-order polynomial time would usually be so far off the mark that one could barely suggest using them with a straight face. They usually don't even bother mentioning that time is modeled as discrete because it goes without saying! In the rare case where smooth time trends are appropriate, they would make a point of saying that.
Anne Christensen

Join Date: Apr 2023

Posts: 28
#15

11 Dec 2023, 07:13

Good point Clyde, this will of course be heavily dependent on the research field (but yes you are right, in this case I was thinking of psychological publications).