  • better analysis for repeated measures (delta vs random effect)

    Dear all,
    I have a question about an observational pre-post study.
    We measured the size of several brain areas, called a1, a2, a3, a4, etc. (OUTCOME), at two timepoints.
    Time is 0 at baseline; the time of the second measurement (follow-up) differs for each patient.
    What interests me in particular is the effect of time: is the area at follow-up reduced compared with time 0?
    Thank you in advance.

    My question is: would you do an analysis assuming id as a random effect, i.e.
    Code:
    mixed OUTCOMEVAR Age Gender month Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment icv || id:
    Or would you calculate the delta and do a simple linear regression?
    Code:
    regress deltaOUTCOME month baselineoutcome
    Which of the two options do you think describes the effect of time adjusting for baseline?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id float month byte(Age Gender) int Onset_months byte(Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment) double(a2 a3 a4 a5 a6 a7 a8 a9)
    1  0 30 0 336 0 1 0 2.406 2.544 1.987 2.917 2.667 2.301 2.678 2.173
    1 44 33 0 336 0 1 0 2.408 2.521  1.95 2.677  2.59 2.241 2.568 2.194
    2  0 36 0 252 0 1 1 1.969 2.105 1.668 1.083 1.683 2.124 1.947 1.998
    2 72 42 0 252 0 1 1 2.708 2.163  2.06 1.344 1.872 2.248  2.24 2.106
    3  0 34 0 216 0 1 1 2.637 2.274 1.796  .944   1.3 2.115 2.273 1.829
    3 38 37 0 216 0 1 1 2.516 2.092 2.457  .982 2.094 2.216 2.234 2.144
    4  0 44 0 504 0 0 0 1.936 2.137 1.374   .83 1.563 1.764 2.157   1.6
    4 36 47 0 504 0 0 0 1.805 1.993 2.314 1.433 1.628 1.432 1.961 2.052
    5  0 49 0 408 0 1 1 1.783 2.322 1.403  .945  1.45 2.037 1.952 1.625
    5 18 51 0 408 0 1 1  1.99 1.985 2.149  .942 1.599 2.005 2.107 1.973
    6  0 25 0 240 0 1 0 3.064 2.445 1.973 3.265  2.84 2.258 2.913 2.362
    6 66 31 0 240 0 1 0 2.731 1.656 2.286 1.459  1.88 1.787 2.025 2.241
    end
    Last edited by Gianfranco Di Gennaro; 14 Oct 2024, 00:46.

  • #2
    The question of whether to fit an ANCOVA versus an ANOVA of change scores versus a repeated-measures ANOVA (or mixed model) has come up on the list a few times. I think Clyde Schechter has responded to this kind of query fairly recently on the list and you might want to search for those threads.

    You've got an additional wrinkle in that you have a multivariate response—do you plan to fit a separate model to each brain region?

    Regardless, if you decide to go with your second option, I recommend against fitting a regression model of the form
    Code:
    regress delta ... c.baseline


    • #3
      Dear Joseph Coveney ,
      Thanks for your reply.
      I have run into this issue before (Lord's paradox and so on), but only in the context of randomized trials.
      This is an observational setting, and I do not want to miss something.
      I will look at the material written by Clyde.

      Why would you avoid using delta as a dependent variable? Is it a question of statistical power?
      Thanks as always.
      Gianfranco


      • #4
        Originally posted by Gianfranco Di Gennaro
        Why would you avoid using delta as a dependent variable? Is it a question of statistical power?
        No, it's a matter of cleaner interpretation: with the baseline on both sides of the equation, you'll get a negative slope even if there's no relationship between baseline and follow-up. If you're going the ANCOVA route, then I'd recommend using the follow-up volume value as the response variable and not the change score.
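
        A minimal simulation sketch of this point, using made-up data rather than the study data (the variable names baseline, followup and delta, the seed, and the sample size are assumptions for illustration). Algebraically, the slope of delta on baseline equals the slope of followup on baseline minus 1, so two unrelated measurements give a slope near -1:

        ```stata
        clear
        set seed 12345
        set obs 1000
        * two measurements that are independent by construction
        generate double baseline = rnormal()
        generate double followup = rnormal()
        generate double delta = followup - baseline
        * change score on baseline: slope close to -1 although nothing is related
        regress delta baseline
        * follow-up on baseline (the ANCOVA form): slope close to 0
        regress followup baseline
        ```

        The second regression is the form recommended above: the follow-up value is the response and the baseline enters only as a covariate.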


        • #5
          Thanks again, Joseph Coveney. I have one last question (sorry to bother you).

          I did the analysis in two ways. Here is the example for one of the outcomes, called "a18".
          The coefficient of interest is "Months_between_MRI".

          The first analysis uses xtset id followed by xtreg. The result is:

          Code:
          . xtreg a18  Age Gender Months_between_MRI Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment
          
          Random-effects GLS regression                   Number of obs     =        150
          Group variable: id                              Number of groups  =         76
          
          R-squared:                                      Obs per group:
               Within  = 0.4138                                         min =          1
               Between = 0.1139                                         avg =        2.0
               Overall = 0.2590                                         max =          2
          
                                                          Wald chi2(6)      =      49.99
          corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
          
          -------------------------------------------------------------------------------------------------
                                      a18 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          --------------------------------+----------------------------------------------------------------
                                      Age |  -.0007703   .0027236    -0.28   0.777    -.0061085    .0045679
                                   Gender |   .0787674   .0843433     0.93   0.350    -.0865424    .2440772
                       Months_between_MRI |  -.0080851   .0014173    -5.70   0.000     -.010863   -.0053072
                                Diagnosis |     .09306   .1139897     0.82   0.414    -.1303556    .3164757
          Bilateral_tonic_clonic_seizures |  -.1138652   .0841719    -1.35   0.176    -.2788392    .0511087
                    Response_to_treatment |  -.2532708   .0851716    -2.97   0.003     -.420204   -.0863377
                                    _cons |   2.759624   .1371241    20.13   0.000     2.490866    3.028382
          --------------------------------+----------------------------------------------------------------
                                  sigma_u |          0
                                  sigma_e |  .47044115
                                      rho |          0   (fraction of variance due to u_i)
          -------------------------------------------------------------------------------------------------

          The second analysis uses only the follow-up rows, with the follow-up outcome as the response and the baseline value as a covariate ("baselinea18"). The result is:

          Code:
          . regress a18 Age Gender Months_between_MRI Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment baselinea18 if time==2
          
                Source |       SS           df       MS      Number of obs   =        74
          -------------+----------------------------------   F(7, 66)        =      1.70
                 Model |  2.21838843         7  .316912632   Prob > F        =    0.1238
              Residual |   12.289674        66  .186207182   R-squared       =    0.1529
          -------------+----------------------------------   Adj R-squared   =    0.0631
                 Total |  14.5080624        73  .198740581   Root MSE        =    .43152
          
          -------------------------------------------------------------------------------------------------
                                      a18 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          --------------------------------+----------------------------------------------------------------
                                      Age |   .0025199   .0034668     0.73   0.470    -.0044018    .0094415
                                   Gender |   .0112612   .1064215     0.11   0.916    -.2012162    .2237385
                       Months_between_MRI |   .0009198   .0018946     0.49   0.629    -.0028628    .0047025
                                Diagnosis |  -.0727264   .1474503    -0.49   0.623    -.3671205    .2216676
          Bilateral_tonic_clonic_seizures |   -.098934   .1056947    -0.94   0.353    -.3099603    .1120923
                    Response_to_treatment |  -.2484847   .1111558    -2.24   0.029    -.4704144    -.026555
                              baselinea18 |   .0980757   .1190883     0.82   0.413    -.1396918    .3358432
                                    _cons |   1.861148   .3915305     4.75   0.000     1.079432    2.642864
          -------------------------------------------------------------------------------------------------

          As you can see, the coefficients of "Months_between_MRI" differ between the two models. I'm having trouble understanding the correct interpretation of the two models and which one I should choose. Could you please give me some advice? I thank you very much. Gianfranco


          • #6
            The problem is that your first model does not include baselinea18 as a predictor. This makes a big difference for all the other covariates and also makes the two models non-comparable.


            • #7
              Thank you very much Rich Goldstein
              Here's the model
              Code:
              . xtreg a18  Age Gender Months_between_MRI Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment baselinea18
              
              Random-effects GLS regression                   Number of obs     =        150
              Group variable: id                              Number of groups  =         76
              
              R-squared:                                      Obs per group:
                   Within  = 0.4154                                         min =          1
                   Between = 0.5569                                         avg =        2.0
                   Overall = 0.4493                                         max =          2
              
                                                              Wald chi2(7)      =     115.86
              corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
              
              -------------------------------------------------------------------------------------------------
                                          a18 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
              --------------------------------+----------------------------------------------------------------
                                          Age |  -.0003501    .002357    -0.15   0.882    -.0049698    .0042696
                                       Gender |   .0084377   .0736542     0.11   0.909    -.1359219    .1527973
                           Months_between_MRI |  -.0085497   .0012279    -6.96   0.000    -.0109565    -.006143
                                    Diagnosis |  -.0480849   .1006518    -0.48   0.633    -.2453587     .149189
              Bilateral_tonic_clonic_seizures |  -.0495956   .0733942    -0.68   0.499    -.1934455    .0942544
                        Response_to_treatment |  -.1092835   .0764967    -1.43   0.153    -.2592143    .0406473
                                  baselinea18 |   .5724818   .0817266     7.00   0.000     .4123007    .7326629
                                        _cons |     1.1258   .2616763     4.30   0.000     .6129242    1.638676
              --------------------------------+----------------------------------------------------------------
                                      sigma_u |          0
                                      sigma_e |  .47044115
                                          rho |          0   (fraction of variance due to u_i)
              -------------------------------------------------------------------------------------------------
              However, as Joseph Coveney suggested, don't I put the baseline on both sides of the equation this way? In the first model don't I already adjust for baseline, since the baseline appears as an outcome?

              My question remains, however, what's the difference between the three models? How do I interpret the Months_between_MRI coefficients in all three models in clinical practice?

              Thanks again Joseph Coveney and Rich Goldstein


              • #8
                Gianfranco Di Gennaro The baseline variable doesn't exist in the dataex you provided in post #1.

                Are you reshaping the data from long to wide when you run regress and vice versa?


                • #9
                  Sorry for the length of this post, but I hope it clarifies my question. Thank you in advance for your time.

                  Thanks, Erik Ruzek. No, I created a variable "baselinea19" that holds the value of the outcome "a19" at time 0 in both rows, the month=0 row and the follow-up row. My doubt is whether it makes sense to include the baseline in this way, since the baseline value already enters as an outcome when I use xtreg.

                  I'll make it simpler, with a clearer example.
                  I have this dataset with a repeated-measures outcome at two timepoints (0, 1), the patient's id, sex and age, the day (days) on which the outcome was measured, and the outcome value. The first measurement is always taken at day 0 (it's the baseline visit).
                  I would like to estimate how the outcome changes per additional day within the same patient, adjusting for age and sex.
                  Here's the dataset:
                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input byte(patientid timepoint) float outcome byte(sex days age)
                  1 0 2.5 0  0 24
                  1 1 2.7 0  4 24
                  2 0 1.7 1  0 45
                  2 1 1.9 1  3 45
                  3 0  .1 1  0 16
                  3 1  .9 1  3 16
                  4 0  .4 0  0 41
                  4 1  .6 0 10 41
                  5 0  .9 1  0 22
                  5 1 1.5 1  2 22
                  6 0 2.1 1  0 36
                  6 1 3.4 1  4 36
                  end
                  If I use
                  Code:
                  xtset patientid
                  xtreg outcome day sex age
                  do I already adjust for the baseline, that is, for the outcome value at day 0? Or do I have to create an additional baseline variable holding the outcome value at day 0, and run
                  Code:
                  xtreg outcome day sex age baseline
                  And what difference would it make if I reshaped the data to wide
                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input byte patientid float outcome0 byte days0 float outcome1 byte(days1 sex age)
                  1 2.5 0 2.7  4 0 24
                  2 1.7 0 1.9  3 1 45
                  3  .1 0  .9  3 1 16
                  4  .4 0  .6 10 0 41
                  5  .9 0 1.5  2 1 22
                  6 2.1 0 3.4  4 1 36
                  end


                  and use
                  Code:
                  regress outcome1 days1 sex age outcome0
                  where outcome0 is the baseline and days1 is the number of days at followup?

                  What's the difference in the interpretation of the two models?

                  The results of the two models are
                  Code:
                  . xtreg outcome day sex age
                  
                  Random-effects GLS regression                   Number of obs     =         12
                  Group variable: patientid                       Number of groups  =          6
                  
                  R-squared:                                      Obs per group:
                       Within  = 0.3439                                         min =          2
                       Between = 0.0018                                         avg =        2.0
                       Overall = 0.0148                                         max =          2
                  
                                                                  Wald chi2(3)      =       1.99
                  corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.5742
                  
                  ------------------------------------------------------------------------------
                       outcome | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                          days |    .072706    .053197     1.37   0.172    -.0315581    .1769702
                           sex |   .1920139   1.027715     0.19   0.852    -1.822271    2.206298
                           age |   .0124007   .0455258     0.27   0.785    -.0768282    .1016295
                         _cons |   .8925067   1.696921     0.53   0.599    -2.433398    4.218412
                  -------------+----------------------------------------------------------------
                       sigma_u |  1.0306988
                       sigma_e |  .42936078
                           rho |  .85212798   (fraction of variance due to u_i)
                  and

                  Code:
                  . regress outcome1 days1 sex age outcome0
                  
                        Source |       SS           df       MS      Number of obs   =         6
                  -------------+----------------------------------   F(4, 1)         =  77272.52
                         Model |  5.71331521         4   1.4283288   Prob > F        =    0.0027
                      Residual |  .000018484         1  .000018484   R-squared       =    1.0000
                  -------------+----------------------------------   Adj R-squared   =    1.0000
                         Total |   5.7133337         5  1.14266674   Root MSE        =     .0043
                  
                  ------------------------------------------------------------------------------
                      outcome1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                         days1 |   .3666696   .0020753   176.68   0.004     .3403005    .3930388
                           sex |   1.994462   .0091586   217.77   0.003     1.878091    2.110833
                           age |  -.0543073   .0003129  -173.55   0.004    -.0582833   -.0503313
                      outcome0 |   1.608513   .0036902   435.89   0.001     1.561625    1.655402
                         _cons |  -1.484045   .0122749  -120.90   0.005    -1.640012   -1.328078
                  ------------------------------------------------------------------------------

                  If instead, as I said above, I artificially created the "baseline" variable equal to the outcome value at day=0, the dataset would be like this:

                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input byte(patientid timepoint) float outcome byte(sex days age) float baseline
                  1 0 2.5 0  0 24 2.5
                  1 1 2.7 0  4 24 2.5
                  2 0 1.7 1  0 45 1.7
                  2 1 1.9 1  3 45 1.7
                  3 0  .1 1  0 16  .1
                  3 1  .9 1  3 16  .1
                  4 0  .4 0  0 41  .4
                  4 1  .6 0 10 41  .4
                  5 0  .9 1  0 22  .9
                  5 1 1.5 1  2 22  .9
                  6 0 2.1 1  0 36 2.1
                  6 1 3.4 1  4 36 2.1
                  end
                  and the model results would be

                  Code:
                  . xtreg outcome day sex age baseline
                  
                  Random-effects GLS regression                   Number of obs     =         12
                  Group variable: patientid                       Number of groups  =          6
                  
                  R-squared:                                      Obs per group:
                       Within  = 0.3439                                         min =          2
                       Between = 0.9830                                         avg =        2.0
                       Overall = 0.9029                                         max =          2
                  
                                                                  Wald chi2(4)      =      65.09
                  corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
                  
                  ------------------------------------------------------------------------------
                       outcome | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                          days |   .0870145   .0444425     1.96   0.050    -.0000911    .1741201
                           sex |   .4349607   .2611439     1.67   0.096    -.0768719    .9467933
                           age |  -.0095149   .0114937    -0.83   0.408    -.0320422    .0130124
                      baseline |   1.098391   .1384804     7.93   0.000     .8269742    1.369807
                         _cons |  -.0379828   .4335223    -0.09   0.930    -.8876709    .8117053
                  -------------+----------------------------------------------------------------
                       sigma_u |          0
                       sigma_e |  .42936078
                           rho |          0   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  Which of the three models answers my question, how the outcome varies for a one-day increase?
                  And what is the difference in interpretation of the three models?
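
                  For reference, a sketch of how the baseline column and the wide layout shown above can be derived from the long toy data, assuming the variable names of the first dataex listing in this post. The use of preserve/restore to switch between the two layouts is a choice made here for illustration:

                  ```stata
                  clear
                  input byte(patientid timepoint) float outcome byte(sex days age)
                  1 0 2.5 0  0 24
                  1 1 2.7 0  4 24
                  2 0 1.7 1  0 45
                  2 1 1.9 1  3 45
                  3 0  .1 1  0 16
                  3 1  .9 1  3 16
                  4 0  .4 0  0 41
                  4 1  .6 0 10 41
                  5 0  .9 1  0 22
                  5 1 1.5 1  2 22
                  6 0 2.1 1  0 36
                  6 1 3.4 1  4 36
                  end
                  * copy each patient's day-0 outcome onto both of that patient's rows
                  bysort patientid (timepoint): generate float baseline = outcome[1]
                  * wide layout: one row per patient, then the ANCOVA-style regression
                  preserve
                  drop baseline
                  reshape wide outcome days, i(patientid) j(timepoint)
                  regress outcome1 days1 sex age outcome0
                  restore
                  * long layout: random-effects model with the baseline as a covariate
                  xtset patientid
                  xtreg outcome days sex age baseline
                  ```

                  In the long layout the day-0 outcome appears twice for each patient, once as a response value and once in the baseline column, which is the situation the question above is about.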




                  • #10
                    Originally posted by Gianfranco Di Gennaro
                    . . . I did the analysis in two ways.. . . I'm having trouble understanding the correct interpretation of the two models and which one I should choose.
                    With your first model (xtreg . . .), the variance component for patients has collapsed to zero and you're basically fitting a conventional linear regression model without any random effects term for patient. I would consider that fit pathological. It's seen occasionally when using Stata's generalized least squares implementation to fit a random effects linear regression model. It seems that the patient characteristics (time-invariant covariates) account for all of the between-patient variation for this outcome variable's intercept.

                    Based upon that, if it were me and I had to choose between those two, then I'd opt for the second, the ANCOVA model.

                    On the other hand, I would not have initially gone to xtreg, re to fit the hierarchical model; rather, I would have attempted to fit either a fixed-effects model (what you're conducting is an exploratory analysis of an observational study, after all), perhaps something like the following
                    Code:
                    xtreg a18 c.Months_between_MRI, i(id) fe
                    or else if random effects, then something like the following.
                    Code:
                    mixed a18 c.Age i.Gender c.Months_between_MRI ///
                        i.(Diagnosis Bilateral_tonic_clonic_seizures Response_to_treatment) || id: , ///
                            reml dfmethod(kroger) nolog
                    By the way, in your toy example above in #9, your four predictors basically explain all of the variance of the six observations: the ANCOVA's R-squared is 1.
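
                    A hedged sketch, on the toy data from #9, of what a fixed-effects fit estimates when there are exactly two measurements per patient and the first is always at day 0. In that case the within-patient slope coincides with a regression through the origin of each patient's change on the days to follow-up, and time-invariant covariates such as sex and age drop out. The variable delta is introduced here for illustration only:

                    ```stata
                    clear
                    input byte(patientid timepoint) float outcome byte(sex days age)
                    1 0 2.5 0  0 24
                    1 1 2.7 0  4 24
                    2 0 1.7 1  0 45
                    2 1 1.9 1  3 45
                    3 0  .1 1  0 16
                    3 1  .9 1  3 16
                    4 0  .4 0  0 41
                    4 1  .6 0 10 41
                    5 0  .9 1  0 22
                    5 1 1.5 1  2 22
                    6 0 2.1 1  0 36
                    6 1 3.4 1  4 36
                    end
                    * fixed-effects (within) estimate of the per-day change
                    xtset patientid
                    xtreg outcome days, fe
                    * same slope from the change scores: follow-up minus baseline per patient
                    bysort patientid (timepoint): generate double delta = outcome[2] - outcome[1]
                    regress delta days if timepoint == 1, noconstant
                    * both slopes equal sum(days*delta)/sum(days^2) = 12.2/154, about 0.079
                    ```

                    Seen this way, the fixed-effects model already uses each patient's baseline as that patient's own reference, so no separate baseline covariate is needed for the within-patient slope.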
