better analysis for repeated measures (delta vs random effect)

Gianfranco Di Gennaro

Join Date: Nov 2020
Posts: 134

14 Oct 2024, 00:20

Dear all,
I thank you in advance.
I have a question about an observational pre-post study.
We measured the size of several brain areas, called a1, a2, a3, a4, etc. (OUTCOME), at two timepoints.
Time at baseline is 0; the time of the second measurement (follow-up) differs for each patient.
What interests me in particular is the effect of time: is the area at follow-up reduced compared with time 0?

My question is: would you fit a model treating id as a random effect, i.e.

Code:

mixed OUTCOMEVAR Age Gender month Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment icv || id:

Or would you calculate the delta and do a simple linear regression?

Code:

regress deltaOUTCOME month baselineoutcome

Which of the two options do you think better describes the effect of time, adjusting for baseline?

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id float month byte(Age Gender) int Onset_months byte(Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment) double(a2 a3 a4 a5 a6 a7 a8 a9)
1  0 30 0 336 0 1 0 2.406 2.544 1.987 2.917 2.667 2.301 2.678 2.173
1 44 33 0 336 0 1 0 2.408 2.521  1.95 2.677  2.59 2.241 2.568 2.194
2  0 36 0 252 0 1 1 1.969 2.105 1.668 1.083 1.683 2.124 1.947 1.998
2 72 42 0 252 0 1 1 2.708 2.163  2.06 1.344 1.872 2.248  2.24 2.106
3  0 34 0 216 0 1 1 2.637 2.274 1.796  .944   1.3 2.115 2.273 1.829
3 38 37 0 216 0 1 1 2.516 2.092 2.457  .982 2.094 2.216 2.234 2.144
4  0 44 0 504 0 0 0 1.936 2.137 1.374   .83 1.563 1.764 2.157   1.6
4 36 47 0 504 0 0 0 1.805 1.993 2.314 1.433 1.628 1.432 1.961 2.052
5  0 49 0 408 0 1 1 1.783 2.322 1.403  .945  1.45 2.037 1.952 1.625
5 18 51 0 408 0 1 1  1.99 1.985 2.149  .942 1.599 2.005 2.107 1.973
6  0 25 0 240 0 1 0 3.064 2.445 1.973 3.265  2.84 2.258 2.913 2.362
6 66 31 0 240 0 1 0 2.731 1.656 2.286 1.459  1.88 1.787 2.025 2.241
end
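
For reference, a minimal sketch of how the change score and baseline covariate used in the second command could be derived from this long layout, shown for a2 and the first two patients only. The names baseline_a2, followup_a2, followup_month and delta_a2 are placeholders assumed for illustration (they are not in the real dataset), and the sketch assumes exactly one baseline row (month 0) and one follow-up row per patient.

```stata
* Subset of the dataex above: id, month and a2 for patients 1 and 2
clear
input byte id float month double a2
1  0 2.406
1 44 2.408
2  0 1.969
2 72 2.708
end

* Within each patient, sorted by month, row 1 is baseline and row _N is follow-up
* (all new variable names are hypothetical)
bysort id (month): generate double baseline_a2    = a2[1]
bysort id (month): generate double followup_a2    = a2[_N]
bysort id (month): generate float  followup_month = month[_N]
generate double delta_a2 = followup_a2 - baseline_a2

* One row per patient, ready for -regress-
list id followup_month baseline_a2 followup_a2 delta_a2 if month == 0, noobs
```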

Last edited by Gianfranco Di Gennaro; 14 Oct 2024, 00:46.

Tags: baseline adjustment, repeated measures

Joseph Coveney

Join Date: Apr 2014

Posts: 4352
#2

14 Oct 2024, 06:13

The question of whether to fit an ANCOVA versus an ANOVA of change scores versus a repeated-measures ANOVA (or mixed model) has come up on the list a few times. I think Clyde Schechter has responded to this kind of query fairly recently on the list and you might want to search for those threads.

You've got an additional wrinkle in that you have a multivariate response—do you plan to fit a separate model to each brain region?

Regardless, if you decide to go with your second option, I recommend against fitting a regression model of the form

Code:

regress delta ... c.baseline
Gianfranco Di Gennaro

Join Date: Nov 2020

Posts: 134
#3

14 Oct 2024, 07:10

Dear Joseph Coveney ,
Thanks for your reply.
I have run into this issue before (Lord's paradox and so on), but only in the context of randomized trials.
We are in an observational setting here, and I don't want to miss something.
I will look up the material written by Clyde.

Why would you avoid using delta as the dependent variable? Is it a question of statistical power?
Thanks as always.
Gianfranco
Joseph Coveney

Join Date: Apr 2014

Posts: 4352
#4

14 Oct 2024, 08:03

Originally posted by Gianfranco Di Gennaro

Why would you avoid using delta as the dependent variable? Is it a question of statistical power?

No, it's a matter of cleaner interpretation: with the baseline on both sides of the equation, you'll get a negative slope even if there's no relationship between baseline and follow-up. If you're going the ANCOVA route, then I'd recommend using the follow-up volume value as the response variable and not the change score.
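
A small simulation can illustrate the point (a sketch with made-up data, not the poster's; the variable names, seed and sample size are arbitrary assumptions). When the change score is regressed on baseline, the slope equals the slope of follow-up on baseline minus one, since Cov(Y1 - Y0, Y0)/Var(Y0) = Cov(Y1, Y0)/Var(Y0) - 1, so independent baseline and follow-up values give a slope near -1 rather than 0.

```stata
* Simulated data only: follow-up is generated independently of baseline
clear
set seed 20241014
set obs 2000
generate double baseline = rnormal(2, 0.5)
generate double followup = rnormal(2, 0.5)
generate double delta    = followup - baseline

* Change score on baseline: slope is near -1 despite no true relationship
regress delta baseline
scalar b_delta = _b[baseline]

* Follow-up on baseline (ANCOVA form): slope is near 0
regress followup baseline
scalar b_ancova = _b[baseline]

display "delta on baseline:     " %6.3f b_delta
display "follow-up on baseline: " %6.3f b_ancova
```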

Gianfranco Di Gennaro

Join Date: Nov 2020
Posts: 134

14 Oct 2024, 11:25

Thanks again Joseph Coveney . I've got one last question (I'm sorry I'm bothering you)

I did the analysis in two ways. Here is the example for one of the outcomes, "a18".
The coefficient of interest is that of "Months_between_MRI".

The first analysis uses xtset id followed by xtreg. The result is:

Code:

. xtreg a18  Age Gender Months_between_MRI Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment

Random-effects GLS regression                   Number of obs     =        150
Group variable: id                              Number of groups  =         76

R-squared:                                      Obs per group:
     Within  = 0.4138                                         min =          1
     Between = 0.1139                                         avg =        2.0
     Overall = 0.2590                                         max =          2

                                                Wald chi2(6)      =      49.99
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

-------------------------------------------------------------------------------------------------
                            a18 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------------------------+----------------------------------------------------------------
                            Age |  -.0007703   .0027236    -0.28   0.777    -.0061085    .0045679
                         Gender |   .0787674   .0843433     0.93   0.350    -.0865424    .2440772
             Months_between_MRI |  -.0080851   .0014173    -5.70   0.000     -.010863   -.0053072
                      Diagnosis |     .09306   .1139897     0.82   0.414    -.1303556    .3164757
Bilateral_tonic_clonic_seizures |  -.1138652   .0841719    -1.35   0.176    -.2788392    .0511087
          Response_to_treatment |  -.2532708   .0851716    -2.97   0.003     -.420204   -.0863377
                          _cons |   2.759624   .1371241    20.13   0.000     2.490866    3.028382
--------------------------------+----------------------------------------------------------------
                        sigma_u |          0
                        sigma_e |  .47044115
                            rho |          0   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------------------

The second analysis uses only the follow-up rows, with the follow-up outcome as the response and the baseline value as a covariate ("baselinea18"). The result is:

Code:

. regress a18 Age Gender Months_between_MRI Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment baselinea18 if time==2

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(7, 66)        =      1.70
       Model |  2.21838843         7  .316912632   Prob > F        =    0.1238
    Residual |   12.289674        66  .186207182   R-squared       =    0.1529
-------------+----------------------------------   Adj R-squared   =    0.0631
       Total |  14.5080624        73  .198740581   Root MSE        =    .43152

-------------------------------------------------------------------------------------------------
                            a18 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------------------+----------------------------------------------------------------
                            Age |   .0025199   .0034668     0.73   0.470    -.0044018    .0094415
                         Gender |   .0112612   .1064215     0.11   0.916    -.2012162    .2237385
             Months_between_MRI |   .0009198   .0018946     0.49   0.629    -.0028628    .0047025
                      Diagnosis |  -.0727264   .1474503    -0.49   0.623    -.3671205    .2216676
Bilateral_tonic_clonic_seizures |   -.098934   .1056947    -0.94   0.353    -.3099603    .1120923
          Response_to_treatment |  -.2484847   .1111558    -2.24   0.029    -.4704144    -.026555
                    baselinea18 |   .0980757   .1190883     0.82   0.413    -.1396918    .3358432
                          _cons |   1.861148   .3915305     4.75   0.000     1.079432    2.642864
-------------------------------------------------------------------------------------------------

As you can see, the coefficients of "Months_between_MRI" differ between the two models. I'm having trouble understanding the correct interpretation of the two models and which one I should choose. Could you please give me some advice? I thank you very much. Gianfranco


Rich Goldstein

Join Date: Mar 2014

Posts: 4409
#6

14 Oct 2024, 12:43

The problem is that your first model does not include baselinea18 as a predictor. This makes a big difference for all the other covariates and also makes the two models non-comparable.

Gianfranco Di Gennaro

Join Date: Nov 2020
Posts: 134

14 Oct 2024, 13:20

Thank you very much Rich Goldstein
Here's the model

Code:

. xtreg a18  Age Gender Months_between_MRI Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment baselinea18

Random-effects GLS regression                   Number of obs     =        150
Group variable: id                              Number of groups  =         76

R-squared:                                      Obs per group:
     Within  = 0.4154                                         min =          1
     Between = 0.5569                                         avg =        2.0
     Overall = 0.4493                                         max =          2

                                                Wald chi2(7)      =     115.86
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

-------------------------------------------------------------------------------------------------
                            a18 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
--------------------------------+----------------------------------------------------------------
                            Age |  -.0003501    .002357    -0.15   0.882    -.0049698    .0042696
                         Gender |   .0084377   .0736542     0.11   0.909    -.1359219    .1527973
             Months_between_MRI |  -.0085497   .0012279    -6.96   0.000    -.0109565    -.006143
                      Diagnosis |  -.0480849   .1006518    -0.48   0.633    -.2453587     .149189
Bilateral_tonic_clonic_seizures |  -.0495956   .0733942    -0.68   0.499    -.1934455    .0942544
          Response_to_treatment |  -.1092835   .0764967    -1.43   0.153    -.2592143    .0406473
                    baselinea18 |   .5724818   .0817266     7.00   0.000     .4123007    .7326629
                          _cons |     1.1258   .2616763     4.30   0.000     .6129242    1.638676
--------------------------------+----------------------------------------------------------------
                        sigma_u |          0
                        sigma_e |  .47044115
                            rho |          0   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------------------

However, as Joseph Coveney pointed out, doesn't this put the baseline on both sides of the equation? Doesn't the first model already adjust for baseline, since the baseline value appears as an outcome?

My question remains, however: what is the difference between the three models? How should I interpret the Months_between_MRI coefficient of each model in clinical practice?

Thanks again Joseph Coveney and Rich Goldstein


Erik Ruzek

Join Date: Oct 2017

Posts: 398
#8

14 Oct 2024, 14:21

Gianfranco Di Gennaro The baseline variable doesn't exist in the dataex you provided in post #1.

Are you reshaping the data from long to wide when you run regress and vice versa?

Gianfranco Di Gennaro

Join Date: Nov 2020
Posts: 134

14 Oct 2024, 15:22

Sorry for the length of the post but I hope it clarifies my question. Thank you in advance for your precious time. I appreciate your attention.

Thanks Erik Ruzek. No, I artificially created a variable "baselinea19" that holds the value of the outcome "a19" at time 0 in both rows (month=0 and follow-up). My doubt is whether it makes sense to include the baseline in this way, since the baseline value is already treated as an outcome when I use xtreg.

I'll make it even simpler, with a clearer example.
I have this dataset with an outcome measured at two timepoints (0, 1), along with the patient's id, sex and age, and the day (days) on which the outcome was measured. The first measurement is always taken at days=0 (the baseline visit).
I would like to estimate how the outcome changes per one-day increase within the same patient, adjusting for age and sex.
Here's the dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(patientid timepoint) float outcome byte(sex days age)
1 0 2.5 0  0 24
1 1 2.7 0  4 24
2 0 1.7 1  0 45
2 1 1.9 1  3 45
3 0  .1 1  0 16
3 1  .9 1  3 16
4 0  .4 0  0 41
4 1  .6 0 10 41
5 0  .9 1  0 22
5 1 1.5 1  2 22
6 0 2.1 1  0 36
6 1 3.4 1  4 36
end

If I used

Code:

xtset patientid
xtreg outcome days sex age

does this command already adjust for the baseline, that is, for the outcome value at days=0? Or do I have to create an additional baseline variable holding the outcome value at days=0, and run

Code:

xtreg outcome days sex age baseline

And what difference would there be if I did a reshape

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte patientid float outcome0 byte days0 float outcome1 byte(days1 sex age)
1 2.5 0 2.7  4 0 24
2 1.7 0 1.9  3 1 45
3  .1 0  .9  3 1 16
4  .4 0  .6 10 0 41
5  .9 0 1.5  2 1 22
6 2.1 0 3.4  4 1 36
end

and use

Code:

regress outcome1 days1 sex age outcome0

where outcome0 is the baseline and days1 is the number of days at followup?
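
As a sketch, assuming the long toy dataset above, the wide layout can be produced with reshape wide rather than typed in by hand; sex and age are constant within patient, so they carry over unchanged.

```stata
* Long toy data, copied from the dataex above
clear
input byte(patientid timepoint) float outcome byte(sex days age)
1 0 2.5 0  0 24
1 1 2.7 0  4 24
2 0 1.7 1  0 45
2 1 1.9 1  3 45
3 0  .1 1  0 16
3 1  .9 1  3 16
4 0  .4 0  0 41
4 1  .6 0 10 41
5 0  .9 1  0 22
5 1 1.5 1  2 22
6 0 2.1 1  0 36
6 1 3.4 1  4 36
end

* One row per patient: outcome0/outcome1 and days0/days1 are created from timepoint
reshape wide outcome days, i(patientid) j(timepoint)
list, noobs

* ANCOVA form: follow-up outcome on follow-up day, covariates and baseline outcome
regress outcome1 days1 sex age outcome0
```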

What's the difference in the interpretation of the two models?

The results of the two models are

Code:

. xtreg outcome day sex age

Random-effects GLS regression                   Number of obs     =         12
Group variable: patientid                       Number of groups  =          6

R-squared:                                      Obs per group:
     Within  = 0.3439                                         min =          2
     Between = 0.0018                                         avg =        2.0
     Overall = 0.0148                                         max =          2

                                                Wald chi2(3)      =       1.99
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.5742

------------------------------------------------------------------------------
     outcome | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        days |    .072706    .053197     1.37   0.172    -.0315581    .1769702
         sex |   .1920139   1.027715     0.19   0.852    -1.822271    2.206298
         age |   .0124007   .0455258     0.27   0.785    -.0768282    .1016295
       _cons |   .8925067   1.696921     0.53   0.599    -2.433398    4.218412
-------------+----------------------------------------------------------------
     sigma_u |  1.0306988
     sigma_e |  .42936078
         rho |  .85212798   (fraction of variance due to u_i)

and

Code:

. regress outcome1 days1 sex age outcome0

      Source |       SS           df       MS      Number of obs   =         6
-------------+----------------------------------   F(4, 1)         =  77272.52
       Model |  5.71331521         4   1.4283288   Prob > F        =    0.0027
    Residual |  .000018484         1  .000018484   R-squared       =    1.0000
-------------+----------------------------------   Adj R-squared   =    1.0000
       Total |   5.7133337         5  1.14266674   Root MSE        =     .0043

------------------------------------------------------------------------------
    outcome1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       days1 |   .3666696   .0020753   176.68   0.004     .3403005    .3930388
         sex |   1.994462   .0091586   217.77   0.003     1.878091    2.110833
         age |  -.0543073   .0003129  -173.55   0.004    -.0582833   -.0503313
    outcome0 |   1.608513   .0036902   435.89   0.001     1.561625    1.655402
       _cons |  -1.484045   .0122749  -120.90   0.005    -1.640012   -1.328078
------------------------------------------------------------------------------

If instead, as I said above, I artificially created the "baseline" variable equal to the outcome value at day=0, the dataset would be like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(patientid timepoint) float outcome byte(sex days age) float baseline
1 0 2.5 0  0 24 2.5
1 1 2.7 0  4 24 2.5
2 0 1.7 1  0 45 1.7
2 1 1.9 1  3 45 1.7
3 0  .1 1  0 16  .1
3 1  .9 1  3 16  .1
4 0  .4 0  0 41  .4
4 1  .6 0 10 41  .4
5 0  .9 1  0 22  .9
5 1 1.5 1  2 22  .9
6 0 2.1 1  0 36 2.1
6 1 3.4 1  4 36 2.1
end

and the model results would be

Code:

. xtreg outcome day sex age baseline

Random-effects GLS regression                   Number of obs     =         12
Group variable: patientid                       Number of groups  =          6

R-squared:                                      Obs per group:
     Within  = 0.3439                                         min =          2
     Between = 0.9830                                         avg =        2.0
     Overall = 0.9029                                         max =          2

                                                Wald chi2(4)      =      65.09
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     outcome | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        days |   .0870145   .0444425     1.96   0.050    -.0000911    .1741201
         sex |   .4349607   .2611439     1.67   0.096    -.0768719    .9467933
         age |  -.0095149   .0114937    -0.83   0.408    -.0320422    .0130124
    baseline |   1.098391   .1384804     7.93   0.000     .8269742    1.369807
       _cons |  -.0379828   .4335223    -0.09   0.930    -.8876709    .8117053
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  .42936078
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Which of the three models answers my question of how the outcome changes per one-day increase?
And what is the difference in interpretation between the three models?


Joseph Coveney

Join Date: Apr 2014

Posts: 4352
#10

14 Oct 2024, 23:42

Originally posted by Gianfranco Di Gennaro

. . . I did the analysis in two ways.. . . I'm having trouble understanding the correct interpretation of the two models and which one I should choose.

With your first model (xtreg . . .), the variance component for patients has collapsed to zero and you're basically fitting a conventional linear regression model without any random effects term for patient. I would consider that fit pathological. It's seen occasionally when using Stata's generalized least squares implementation to fit a random effects linear regression model. It seems that the patient characteristics (time-invariant covariates) account for all of the between-patient variation for this outcome variable's intercept.

Based upon that, if it were me and I had to choose between those two, then I'd opt for the second, the ANCOVA model.

On the other hand, I would not have initially gone to xtreg , re to fit the hierarchical model; rather, I would have attempted to fit either a fixed-effects model—what you're conducting is an exploratory analysis of an observational study after all—perhaps something like the following

Code:

xtreg a18 c.Months_between_MRI, i(id) fe

or else if random effects, then something like the following.

Code:

mixed a18 c.Age i.Gender c.Months_between_MRI ///
    i.(Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment) || id: , ///
    reml dfmethod(kroger) nolog

By the way, in your toy example above in #9, your four predictors basically explain all of the variance of the six observations: the ANCOVA's R² is 1.
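
One way to see what the fixed-effects suggestion estimates when there are only two measurements per patient: the within-patient slope equals the slope from regressing the change in outcome on the change in time with no constant. A sketch on the toy data from #9 (d_outcome, d_days, b_fd and b_fe are names assumed for illustration):

```stata
* Wide toy data, copied from #9
clear
input byte patientid float outcome0 byte days0 float outcome1 byte(days1 sex age)
1 2.5 0 2.7  4 0 24
2 1.7 0 1.9  3 1 45
3  .1 0  .9  3 1 16
4  .4 0  .6 10 0 41
5  .9 0 1.5  2 1 22
6 2.1 0 3.4  4 1 36
end

* First-difference regression through the origin
generate double d_outcome = outcome1 - outcome0
generate double d_days    = days1 - days0
regress d_outcome d_days, noconstant
scalar b_fd = _b[d_days]

* Same slope from the fixed-effects (within) estimator on the long layout
reshape long outcome days, i(patientid) j(timepoint)
xtset patientid
xtreg outcome days, fe
scalar b_fe = _b[days]

display "first-difference slope: " %8.6f b_fd
display "fixed-effects slope:    " %8.6f b_fe
```

Time-invariant covariates such as sex and age drop out of a fixed-effects model, which is why the fixed-effects command above lists only the time variable.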
