
  • Questions about the mechanism and application of -mvtest-

    Hello everyone,

    I am attempting to compare the means of a variable across three different periods (e.g., var1_pretest, var1_posttest1, var1_posttest2).

    I first stacked the data from the three periods into a single variable and ran an ANOVA. Then I discovered the -mvtest means- command, which compares the means of one, two, or more variables, and I would like to try it as well. However, I have some confusion regarding the -mvtest- command. Let me start with an example:
    Code:
    mvtest m srspre srspost srsdpost
    The result is:
    Test that all means are the same

    Hotelling T2 = 17.59
    Hotelling F(2,120) = 8.72
    Prob > F = 0.0003
    First, based on the results, we understand that -mvtest- is built on Hotelling's T², as described in the paper Implementation of a New Solution to the Multivariate Behrens–Fisher Problem (Ivan Zezula, 2009). My question is: since Hotelling's T² is used for comparing two variables, should -mvtest- adhere to the same principle? However, the manual indicates that -mvtest- allows for comparisons between two or more variables. How can this be reconciled with the principles of Hotelling's T²?
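    As a quick sanity check on the relationship between the two statistics reported above: with p contrasts and n subjects, Hotelling's T² maps to an exact F statistic via F = (n − p)/(p(n − 1)) × T². A minimal Python sketch, purely to verify the arithmetic (n = 122 and p = 2 are read off the output above):

```python
# Sketch: convert a Hotelling T-squared statistic to its exact F statistic,
# F = (n - p) / (p * (n - 1)) * T2, with p contrasts and n subjects.
# n = 122 and p = 2 match the output shown above.
def t2_to_f(t2, n, p):
    return (n - p) / (p * (n - 1)) * t2

f_stat = t2_to_f(17.59, n=122, p=2)
print(round(f_stat, 2))  # 8.72, matching the reported F(2, 120)
```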

    Second, Hotelling's T² requires two independent samples. If my data were collected from the same individuals at different time points, would -mvtest- still be applicable?

    Third, -mvtest- requires the variables to be normally distributed. We have 122 observations, and some of the variables are not normally distributed. Are there any alternative methods (e.g., non-parametric tests) that could be used in this case?

    Thank you so much!

  • #2
    Originally posted by Vincent Li
    . . .since Hotelling's T² is used for comparing two variables, should -mvtest- adhere to the same principle? However, the manual indicates that -mvtest- allows for comparisons between two or more variables. How can this be reconciled with the principles of Hotelling's T²?
    Hotelling's test allows for more than two outcome variables. It allows up to two grouping variables. The syntax for hotel allows the user to omit the grouping variable and then it tests whether the mean vector of the multiple outcome variables is equal to zero. So, if you want to use hotel instead for this, you'll want to first generate variables that contain a linearly independent set of differences between outcome variables, like what mvtest means is doing for you here behind the scenes. It's analogous to forming a ytransform() matrix for use with manovatest after manova.

    Run the following code to see all three in action.
    Code:
    version 18.0
    
    clear *
    
    set seed 1562972107
    
    quietly drawnorm out0 out1 out2, double ///
        corr(1 0.5 0.5 \ 0.5 1 0.5 \ 0.5 0.5 1) n(27)
    
    *
    * Begin here
    *
    // -mvtest means-
    mvtest means out?
    
    // -manova-
    generate byte k = 1
    quietly manova out? = i.k, noconstant
    matrix define T = (-1, 1, 0 \ -1, 0, 1)
    manovatest k, ytransform(T)
    
    // -hotel-
    generate double del01 = out0 - out1
    generate double del02 = out0 - out2
    hotel del??
    
    exit
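    As a quick numeric check of what the ytransform() matrix T above does: applied to one subject's outcome vector, it returns the linearly independent differences (out1 − out0, out2 − out0). A Python sketch, only for illustration; the vector y is made up:

```python
import numpy as np

# The transform matrix used above: rows give (out1 - out0, out2 - out0)
T = np.array([[-1, 1, 0],
              [-1, 0, 1]])
y = np.array([2.0, 5.0, 7.0])  # one subject's (out0, out1, out2); made-up values
print(T @ y)  # [3. 5.]
```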
    Second, Hotelling's T² requires two independent samples. If my data were collected from the same individuals at different time points, would -mvtest- still be applicable?
    Yes, and so would hotel on the linearly independent set of differences and without a grouping variable—see above.
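    To see why the paired case works, note that the test reduces to a one-sample Hotelling's T² on the within-subject differences. A minimal numpy sketch with simulated repeated measurements (all names and numbers are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 27
# Simulated repeated measurements on the same subjects (illustrative)
subject = rng.normal(size=(n, 1))
scores = subject + rng.normal(scale=0.8, size=(n, 3))  # three correlated periods

# Paired case: reduce to a one-sample test on within-subject differences
diffs = scores[:, 1:] - scores[:, :1]      # two linearly independent differences
dbar = diffs.mean(axis=0)                  # mean difference vector
S = np.cov(diffs, rowvar=False)            # covariance of the differences
t2 = n * dbar @ np.linalg.solve(S, dbar)   # one-sample Hotelling's T-squared

p = diffs.shape[1]
f_stat = (n - p) / (p * (n - 1)) * t2      # exact F with (p, n - p) d.f.
print(t2, f_stat)
```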

    Third, -mvtest- requires the variables to be normally distributed. We have 122 observations, and some of the variables are not normally distributed. Are there any alternative methods (e.g., non-parametric tests) that could be used in this case?
    Differencing might help move things a little closer to normality, but also check out the suggestions in this thread.



    • #3
      Seems like you're making this harder than necessary, but perhaps I'm missing something.

      Code:
      sysuse auto, clear
      
      ** Let's make rep78 be the 3 periods
      tab rep78
      recode rep78 (1 2 3 = 0) (4 = 1) (5 = 2) , generate(period)
      
      ** Test equality of means in all three periods
      reg foreign i.period
      testparm i.period
      ** Note that the F-test of the regression is a joint test of equality across periods, so testparm is not required
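      The note in the last comment can be checked numerically: the overall regression F is exactly the joint test that all period means are equal (full model with period dummies versus intercept-only). A Python sketch with simulated data (illustrative only, not the auto data):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 90
period = np.repeat([0, 1, 2], n // 3)          # three periods, 30 obs each (illustrative)
y = 1.0 * (period == 1) + rng.normal(size=n)   # mean shifts in period 1 only

# Full model: intercept + two period dummies; reduced model: intercept only
X = np.column_stack([np.ones(n), period == 1, period == 2]).astype(float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_full = ((y - X @ beta) ** 2).sum()
rss_red = ((y - y.mean()) ** 2).sum()

q, k = 2, 3   # restrictions tested; parameters in the full model
f_stat = ((rss_red - rss_full) / q) / (rss_full / (n - k))
print(f_stat)  # large, since the period-1 mean really differs
```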



      • #4
        Originally posted by Vincent Li
        We have 122 observations, and some of the variables are not normally distributed. Are there any alternative methods (e.g., non-parametric tests) that could be used in this case?
        After posting last night I began thinking more along the lines that George subsequently posted, but with a tweak to account for the fact that you've got repeated measurements and gnarly data.

        Consider something along these lines:
        Code:
        rename (srspre srspost srsdpost) (srs#), addnumber(0)
        
        generate int pid = _n
        reshape long srs, i(pid) j(tim)
        
        xtreg srs i.tim, i(pid) fe vce(robust)



        • #5
          Originally posted by George Ford
          Seems like you're making this harder than necessary, but perhaps I'm missing something.

          Code:
          sysuse auto, clear
          
          ** Let's make rep78 be the 3 periods
          tab rep78
          recode rep78 (1 2 3 = 0) (4 = 1) (5 = 2) , generate(period)
          
          ** Test equality of means in all three periods
          reg foreign i.period
          testparm i.period
          ** Note that the F-Test of the regression is a joint test of equality across periods, so testparm not required
          Thanks for the idea George!



          • #6
            Originally posted by Joseph Coveney
            After posting last night I began thinking more along the lines that George subsequently posted, but with a tweak to account for the facts that you've got repeated measurements and gnarly data.

            Consider something along these lines:
            Code:
            rename (srspre srspost srsdpost) (srs#), addnumber(0)
            
            generate int pid = _n
            reshape long srs, i(pid) j(tim)
            
            xtreg srs i.tim, i(pid) fe vce(robust)
            Hi Joseph, thank you for the detailed instructions on comparing mvtest, manova, and hotel, as well as for addressing my questions individually. Your explanations have clarified my understanding of mvtest and hotel.

            I followed your suggestion to use the fixed-effects approach to examine the data. I really appreciate your guidance. Now, I would like to further analyze the impact of time-invariant variables on the period effect. For example, I'm interested in whether the time effect varies among the three training groups, which are represented by condition = 0, 1, and 2. I have created dummy variables for these groups (condummy1, condummy2, condummy3) and am now incorporating the interaction terms into the fixed-effects model. Is this an appropriate way to get what I want?

            The code is:
            Code:
            tab condition, gen(condummy)
            xtreg srs i.period##i.condummy1 i.period##i.condummy2, i(indi_num) fe vce(robust)

            The results are:
            note: 1.condummy1 omitted because of collinearity.
            note: 1.condummy2 omitted because of collinearity.

            Fixed-effects (within) regression Number of obs = 366
            Group variable: indi_num Number of groups = 122

            R-squared: Obs per group:
            Within = 0.1207 min = 3
            Between = 0.0085 avg = 3.0
            Overall = 0.0233 max = 3

            F(6,121) = 3.47
            corr(u_i, Xb) = -0.0070 Prob > F = 0.0034

            (Std. err. adjusted for 122 clusters in indi_num)
            ----------------------------------------------------------------------------------
            | Robust
            srs | Coefficient std. err. t P>|t| [95% conf. interval]
            -----------------+----------------------------------------------------------------
            period |
            1 | -1.05 1.444759 -0.73 0.469 -3.910281 1.810281
            2 | -1.675 1.809901 -0.93 0.357 -5.258176 1.908176
            |
            1.condummy1 | 0 (omitted)
            |
            period#condummy1 |
            1 1 | -7.92561 3.122852 -2.54 0.012 -14.10812 -1.743101
            2 1 | -6.593293 2.887441 -2.28 0.024 -12.30974 -.8768409
            |
            1.condummy2 | 0 (omitted)
            |
            period#condummy2 |
            1 1 | -1.315854 2.35835 -0.56 0.578 -5.984829 3.353121
            2 1 | -3.154268 2.744708 -1.15 0.253 -8.588141 2.279604
            |
            _cons | 93.39344 .7288553 128.14 0.000 91.95048 94.8364
            -----------------+----------------------------------------------------------------
            sigma_u | 19.523581
            sigma_e | 8.8829199
            rho | .82849315 (fraction of variance due to u_i)
            ----------------------------------------------------------------------------------

            Here’s my interpretation of the results. Please let me know if anything needs correction.

            For those who are under the training condition 1, we refer to this part:
            period#condummy1 |
            1 1 | -7.92561 3.122852 -2.54 0.012 -14.10812 -1.743101
            2 1 | -6.593293 2.887441 -2.28 0.024 -12.30974 -.8768409
            |
            It indicates that in this group, the scores in period 1 decreased by about 7.93 points on average relative to period 0 (p<0.05), and the scores in period 2 decreased by about 6.59 points on average relative to period 0 (p<0.05).

            The scores in conditions 2 and 3 are interpreted in the same way. However, for condition 3 (the reference group) we should refer to this part:
            period |
            1 | -1.05 1.444759 -0.73 0.469 -3.910281 1.810281
            2 | -1.675 1.809901 -0.93 0.357 -5.258176 1.908176
            One more question: Is it necessary to compare the coefficients between conditions 1, 2, and 3?

            Thanks a lot!



            • #7
              Originally posted by Vincent Li
              Now, I would like to further analyze the impact of non-time-variant variables on the period's effect.
              In that case, you'd probably want to switch from a fixed-effects regression model to a random-effects model.

              Code:
              // tab condition, gen (condummy)
              xtreg srs i.condition##i.period, i(indi_num) re vce(robust)
              You seem to have balanced data (three nonmissing observations for each indi_num). In that case you could actually use manova if you wanted to. You can also fit a MANOVA model with mixed.

              But you probably have enough indi_num groups (122) to stick with xtreg , re vce(robust).

              One more question: Is it necessary to compare the coefficients between conditions 1, 2, and 3?
              If you have a practically significant condition × period interaction, then you wouldn't typically try to interpret the main effects of condition.



              • #8
                Originally posted by Joseph Coveney
                In that case, you'd probably want to switch from a fixed-effects regression model to a random-effects model.

                Code:
                // tab condition, gen (condummy)
                xtreg srs i.condition##i.period, i(indi_num) re vce(robust)
                You seem to have balanced data (three nonmissing observations for each indi_num). In that case you could actually use manova if you wanted to. You can also fit a MANOVA model with mixed.

                But you probably have enough indi_num groups (122) to stick with xtreg , re vce(robust).

                If you have a practically significant condition × period interaction, then you wouldn't typically try to interpret the main effects of condition.

                Thanks, Joseph! I’ll start with the random effects model for preliminary results, and then compare it with the MANOVA.



                • #9
                  Originally posted by Joseph Coveney
                  In that case, you'd probably want to switch from a fixed-effects regression model to a random-effects model.

                  Code:
                  // tab condition, gen (condummy)
                  xtreg srs i.condition##i.period, i(indi_num) re vce(robust)
                  You seem to have balanced data (three nonmissing observations for each indi_num). In that case you could actually use manova if you wanted to. You can also fit a MANOVA model with mixed.

                  But you probably have enough indi_num groups (122) to stick with xtreg , re vce(robust).

                  If you have a practically significant condition × period interaction, then you wouldn't typically try to interpret the main effects of condition.
                  Hi Joseph, I'm sticking with the fixed-effects (and random-effects) approach, and I have two more questions:

                  First, there are still 3 time points (pre, po1, and po2). The pre scores are measured before the intervention, while the po1 and po2 scores are taken after interventions that fall into three different conditions. Now I'd like to estimate whether the po1/po2 score changes differ across the 3 interventions. The data are as follows:
                  period condition score
                  0 0 150
                  0 1 100
                  0 1 130
                  1 1 81
                  1 2 74
                  2 1 88
                  2 1 63
                  1 0 69
                  1 2 89
                  0 1 88
                  0 1 113
                  In this case, should I transform the data to
                  pre po1 po2 condition
                  65 45 78 0
                  45 87 98 0
                  46 56 56 1
                  98 86 109 3
                  134 123 78 2
                  167 45 90 4
                  and use regression to estimate?
                  Code:
                  reg po1 pre condition
                  reg po2 pre condition
                  However, the time-invariant factors cannot be controlled in this way.

                  The second question is: for the time effect, the sample needs to be restricted to 40 or 60 observations (still across the three conditions, with about 20 observations in each condition) out of the total 122. Does the fixed-effects (robust) estimator still work for such a small sample size, or should I consider a substitute method?

                  Thank you!



                  • #10
                    Originally posted by Vincent Li
                    . . . pre scores are measured before the intervention, while po1 and po2 scores are taken after the intervention that can be categorized into three different conditions. Now I'd like to estimate whether there are differences among the po1/po2 score changes after 3 different interventions. . . . However, the time-invariant factors cannot be controlled in this way.
                    Well, if the outcome of interest is the difference between the two posttreatment scores, then why not just compute it and use it in the model as the outcome? Using your second data-listing snippet (which seems to have more than three treatment conditions):
                    Code:
                    generate int delta = po2 - po1
                    regress delta c.pre i.condition
                    or something similar, adding other time-invariant covariates as desired to the two (baseline score and treatment-condition assignment) already present.

                    The second question is: regarding the time effect, the observations need to be restricted to 40 and 60 (still across the three conditions, with 20 observations in each condition) out of the total 122 observations. Does the fixed effect (robust) still work for such small sample size, or should I consider a substitute method?
                    xtreg . . ., . . . fe vce(robust) yields t-statistics in its regression-table output. So, as long as you've taken power into consideration, sample sizes of 40 or 60 shouldn't be problematic.



                    • #11
                      Originally posted by Joseph Coveney
                      Well, if the outcome of interest is the difference between the two posttreatment scores, then why not just compute it and use it in the model as the outcome? Using your second data-listing snippet (which seems to have more than three treatment conditions):
                      Code:
                      generate int delta = po2 - po1
                      regress delta c.pre i.condition
                      or something similar, adding other time-invariant covariates as desired to the two (baseline score and treatment-condition assignment) already present.

                      xtreg . . ., . . . fe vce(robust) yields t-statistics in its regression-table output. So, as long as you've taken power into consideration, sample sizes of 40 or 60 shouldn't be problematic.

                      For the first question, the outcomes of interest are twofold: the difference between po1 and pre, and the difference between po2 and pre. I hesitated to compute these differences because I spent time worrying about how to interpret the coefficient when the dependent variable includes both positive and negative values, disregarding the intercept. Stupid me! However, I also have concerns about the normality of the differences between po1/po2 and pre.

                      I ran the models for two dependent variables: po1, and the difference between po1 and pre.

                      Code:
                      reg srspost c.srspre ib2.condition c.wsl c.tom c.age
                      The result is as follows:

                      Source | SS df MS Number of obs = 122
                      -------------+---------------------------------- F(6, 115) = 37.98
                      Model | 37889.3418 6 6314.89029 Prob > F = 0.0000
                      Residual | 19120.7648 115 166.26752 R-squared = 0.6646
                      -------------+---------------------------------- Adj R-squared = 0.6471
                      Total | 57010.1066 121 471.157905 Root MSE = 12.894

                      ------------------------------------------------------------------------------
                      srspost | Coefficient Std. err. t P>|t| [95% conf. interval]
                      -------------+----------------------------------------------------------------
                      srspre | .8802146 .0606535 14.51 0.000 .7600717 1.000358
                      |
                      condition |
                      0 | -8.631837 2.887065 -2.99 0.003 -14.35056 -2.913117
                      1 | -1.842946 2.873328 -0.64 0.523 -7.534456 3.848565
                      |
                      wsl | -.157649 .0765809 -2.06 0.042 -.3093409 -.0059571
                      tom | -.8889637 .8959931 -0.99 0.323 -2.663754 .8858262
                      age | -.8961679 1.091415 -0.82 0.413 -3.058051 1.265716
                      _cons | 35.36209 11.84376 2.99 0.003 11.90188 58.82229
                      ------------------------------------------------------------------------------


                      Code:
                      reg srspo1predif ib2.condition c.wsl c.tom c.age
                      The result is:

                      Source | SS df MS Number of obs = 122
                      -------------+---------------------------------- F(5, 116) = 3.78
                      Model | 3224.78602 5 644.957204 Prob > F = 0.0033
                      Residual | 19769.255 116 170.424612 R-squared = 0.1402
                      -------------+---------------------------------- Adj R-squared = 0.1032
                      Total | 22994.041 121 190.033397 Root MSE = 13.055

                      ------------------------------------------------------------------------------
                      srspo1predif | Coefficient Std. err. t P>|t| [95% conf. interval]
                      -------------+----------------------------------------------------------------
                      condition |
                      0 | -8.582322 2.922824 -2.94 0.004 -14.37134 -2.793301
                      1 | -1.485669 2.903255 -0.51 0.610 -7.235931 4.264593
                      |
                      wsl | -.1817746 .0765395 -2.37 0.019 -.3333707 -.0301785
                      tom | -.8288866 .906602 -0.91 0.362 -2.624526 .9667528
                      age | -1.404122 1.073854 -1.31 0.194 -3.531025 .7227806
                      _cons | 29.60951 11.62261 2.55 0.012 6.589463 52.62956
                      ------------------------------------------------------------------------------
                      The F value (and Prob > F) and R-squared in the first model look better, while the coefficients on condition are similar.

                      However, for some of the dependent variables, the two models give different results for the condition coefficients, p-values, and CIs.

                      Code:
                      reg sdqispost c.sdqispre ib2.condition c.wsl c.tom c.age
                      Source | SS df MS Number of obs = 122
                      -------------+---------------------------------- F(6, 115) = 20.60
                      Model | 668.896605 6 111.482767 Prob > F = 0.0000
                      Residual | 622.226346 115 5.41066388 R-squared = 0.5181
                      -------------+---------------------------------- Adj R-squared = 0.4929
                      Total | 1291.12295 121 10.6704376 Root MSE = 2.3261

                      ------------------------------------------------------------------------------
                      sdqispost | Coefficient Std. err. t P>|t| [95% conf. interval]
                      -------------+----------------------------------------------------------------
                      sdqispre | .6495078 .0620629 10.47 0.000 .526573 .7724425
                      |
                      condition |
                      0 | -1.298285 .5218639 -2.49 0.014 -2.331997 -.2645736
                      1 | .0556832 .5179003 0.11 0.915 -.9701775 1.081544
                      |
                      wsl | -.0127479 .0137624 -0.93 0.356 -.0400087 .0145128
                      tom | .1100009 .1615385 0.68 0.497 -.2099758 .4299776
                      age | -.0168836 .1913646 -0.09 0.930 -.3959401 .3621729
                      _cons | 4.422794 2.111437 2.09 0.038 .2404438 8.605145
                      ------------------------------------------------------------------------------

                      Code:
                      reg sdqispo1predif ib2.condition c.wsl c.tom c.age
                      Source | SS df MS Number of obs = 122
                      -------------+---------------------------------- F(5, 116) = 1.65
                      Model | 56.4339912 5 11.2867982 Prob > F = 0.1530
                      Residual | 794.78732 116 6.85161483 R-squared = 0.0663
                      -------------+---------------------------------- Adj R-squared = 0.0261
                      Total | 851.221311 121 7.03488687 Root MSE = 2.6176

                      ------------------------------------------------------------------------------
                      sdqispo1pr~f | Coefficient Std. err. t P>|t| [95% conf. interval]
                      -------------+----------------------------------------------------------------
                      condition |
                      0 | -1.109221 .5860477 -1.89 0.061 -2.269962 .0515205
                      1 | .1962028 .582124 0.34 0.737 -.9567671 1.349173
                      |
                      wsl | -.0231838 .0153467 -1.51 0.134 -.05358 .0072123
                      tom | .1111372 .1817804 0.61 0.542 -.2489017 .4711761
                      age | -.034449 .2153156 -0.16 0.873 -.4609087 .3920107
                      _cons | 2.097989 2.330419 0.90 0.370 -2.5177 6.713678
                      ------------------------------------------------------------------------------



                      Regarding the second question, I often worry about the estimating power when working with a small sample size. I'm uncertain about the efficiency of a model as the sample size decreases, likely due to my limited understanding of the mechanisms behind those methods. Some may refer to the "rule" that a regression model is valid only if the number of observations exceeds 10 times the number of variables. However, I don't believe this applies to every situation.
                      Last edited by Vincent Li; 24 Feb 2025, 02:27.



                      • #12
                        Originally posted by Vincent Li
                        . . . the outcomes of interest are twofold: the difference between the po1 and pre, and the difference between the po2 and pre.
                        A few observations:

                        1. It will be better to state the outcomes more precisely as "the difference between treatment conditions in the difference between po1 and pre scores"; likewise, "the difference between treatment conditions in the difference between po2 and pre scores".

                        2. Your ANCOVA-like model (the first one whose output you show) doesn't really address your two outcomes of interest. Rather, it estimates a baseline-adjusted score at the first period for each treatment condition, not the difference from baseline (pretreatment) score, and so it doesn't actually do what you want.
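                        One way to see point 2: the change-score model is the ANCOVA model with the baseline slope constrained to 1, so the two only coincide when that slope is near 1. A simulated Python sketch (all names and numbers are illustrative, not from the thread's data):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
pre = rng.normal(size=n)
cond = rng.integers(0, 2, size=n)                   # two conditions for simplicity
post = 0.5 * pre + 0.7 * cond + rng.normal(size=n)  # baseline slope well below 1

X = np.column_stack([np.ones(n), pre, cond]).astype(float)

# ANCOVA: post on baseline and condition; baseline slope estimated freely
b_ancova, *_ = np.linalg.lstsq(X, post, rcond=None)

# Change score: (post - pre) on condition, i.e. baseline slope forced to 1
b_change, *_ = np.linalg.lstsq(X[:, [0, 2]], post - pre, rcond=None)

print(b_ancova[1])  # near the true slope of 0.5, not 1
```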

                        3. You might be making things harder for yourself than you need to. The following code should succinctly evaluate each of your two outcomes of interest, as described in the respective comments in the code. (You keep changing the names of your outcome variables, and so I'll use the variable names of your first post above for the scores at pretreatment and the two posttreatment observation time points.)
                        Code:
                        rename (srspre srspost srsdpost) (srs#), addnumber(0)
                        reshape long srs, i(indi_num) j(period)
                        
                        xtreg srs i.condition##i.period, i(indi_num) re vce(robust)
                        
                        /* Does the difference between the first posttreatment score and the
                           pretreatment score differ between treatment conditions? */
                        testparm 1.condition#1.period 2.condition#1.period
                        
                        /* Does the difference between the second posttreatment score and the
                           pretreatment score differ between treatment conditions? */
                        testparm 1.condition#2.period 2.condition#2.period
                        I strongly recommend that you also take a look at things with marginsplot.

                        . . . I often worry about the estimating power when working with a small sample size. I'm uncertain about the efficiency of a model as the sample size decreases . . .
                        Efficiency is a property of the method and not of the sample size. Nevertheless, you don't have to be uncertain about how power is affected by sample size—you can determine it directly, either via official commands such as power, or by simulation (FAQ here and illustrative blog entries beginning here).
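                        For the simulation route, a minimal Monte Carlo power sketch might look like this (Python rather than Stata's power command; the effect size, group size, and the 1.96 critical value are illustrative assumptions, not figures from the thread):

```python
import numpy as np

rng = np.random.default_rng(12345)

def sim_power(n_per_group, effect, crit=1.96, reps=2000):
    """Approximate power of a two-sample mean comparison by simulation.
    crit = 1.96 approximates a 5% two-sided test for moderate n."""
    hits = 0
    for _ in range(reps):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect, 1.0, n_per_group)
        se = np.sqrt(a.var(ddof=1) / n_per_group + b.var(ddof=1) / n_per_group)
        hits += abs(b.mean() - a.mean()) / se > crit
    return hits / reps

power = sim_power(20, 0.8)   # 20 per condition, standardized effect 0.8
print(power)
```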



                        • #13
                          Originally posted by Joseph Coveney
                          A few observations:

                          1. It will be better to state the outcomes more precisely as "the difference between treatment conditions in the difference between po1 and pre scores"; likewise, "the difference between treatment conditions in the difference between po2 and pre scores".

                          2. Your ANCOVA-like model (the first one whose output you show) doesn't really address your two outcomes of interest. Rather, it estimates a baseline-adjusted score at the first period for each treatment condition, not the difference from baseline (pretreatment) score, and so it doesn't actually do what you want.

                          3. You might be making things harder for yourself than you need to. The following code should succinctly evaluate each of your two outcomes of interest, as described in the respective comments in the code. (You keep changing the names of your outcome variables, and so I'll use the variable names of your first post above for the scores at pretreatment and the two posttreatment observation time points.)
                          Code:
                          rename (srspre srspost srsdpost) (srs#), addnumber(0)
                          reshape long srs, i(indi_num) j(period)
                          
                          xtreg srs i.condition##i.period, i(indi_num) re vce(robust)
                          
                          /* Does the difference between the first posttreatment score and the
                          pretreatment score differ between treatment conditions? */
                          testparm 1.condition#1.period 2.condition#1.period
                          
                          /* Does the difference between the second posttreatment score and the
                          pretreatment score differ between treatment conditions? */
                          testparm 1.condition#2.period 2.condition#2.period
                          I strongly recommend that you also take a look at things with marginsplot.

                          Brilliant, Joseph! Thanks! I was unsure how to compare the score differences between conditions in the random-effects model. Apparently, what I did with the code:
                          Code:
                          reg srspost c.srspre ib2.condition c.wsl c.tom c.age
                          wasn't doing the same thing. But testing the coefficients does!

                          I tried testparm after the estimation, and another question came up. The code is:
                          Code:
                          xtreg srs i.condition##i.period, i(indi_num) re vce(robust)
                          The result says:
                          Random-effects GLS regression Number of obs = 366
                          Group variable: indi_num Number of groups = 122

                          R-squared: Obs per group:
                          Within = 0.1207 min = 3
                          Between = 0.0129 avg = 3.0
                          Overall = 0.0273 max = 3

                          Wald chi2(8) = 22.56
                          corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0040

                          (Std. err. adjusted for 122 clusters in indi_num)
                          ----------------------------------------------------------------------------------
                          | Robust
                          srs | Coefficient std. err. z P>|z| [95% conf. interval]
                          -----------------+----------------------------------------------------------------
                          condition |
                          1 | -2.707317 4.5169 -0.60 0.549 -11.56028 6.145645
                          2 | .1810976 4.571382 0.04 0.968 -8.778647 9.140842
                          |
                          period |
                          1 | -8.97561 2.776295 -3.23 0.001 -14.41705 -3.534171
                          2 | -8.268293 2.256088 -3.66 0.000 -12.69014 -3.846442
                          |
                          condition#period |
                          1 1 | 6.609756 3.346904 1.97 0.048 .0499453 13.16957
                          1 2 | 3.439024 3.061285 1.12 0.261 -2.560983 9.439032
                          2 1 | 7.92561 3.131587 2.53 0.011 1.787812 14.06341
                          2 2 | 6.593293 2.895518 2.28 0.023 .9181814 12.2684
                          |
                          _cons | 94.2439 3.354506 28.09 0.000 87.66919 100.8186
                          -----------------+----------------------------------------------------------------
                          sigma_u | 18.959945
                          sigma_e | 8.8829199
                          rho | .82000724 (fraction of variance due to u_i)
                          ----------------------------------------------------------------------------------
                          Then I ran:
                          Code:
                          testparm i.condition#i.period
                          The result is:

                          ( 1) 1.condition#1.period = 0
                          ( 2) 1.condition#2.period = 0
                          ( 3) 2.condition#1.period = 0
                          ( 4) 2.condition#2.period = 0

                          chi2( 4) = 7.59
                          Prob > chi2 = 0.1078
                          Does this imply that there is no "interaction effect" from the interaction term? But we did observe time differences within each condition in the random-effects model. For example:
                          2 1 | 7.92561 3.131587 2.53 0.011 1.787812 14.06341
                          2 2 | 6.593293 2.895518 2.28 0.023 .9181814 12.2684
                          Now, let me try to address my own questions first. The results of -testparm- do not convey the same information as the coefficients in the random-effects model. The interaction coefficients indicate the average score changes from pre to po1/po2 under each condition (e.g. 1.condition#1.period, 1.condition#2.period), relative to the base condition. But -testparm- jointly tests those interaction coefficients, i.e. whether the pre-to-po1/po2 score changes differ across conditions. Did I interpret the outcomes correctly?
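                          If my reading is right, the individual coefficients and the joint test could be reconciled with -margins- and -contrast-. A sketch, assuming the same xtreg fit as above (I have not run this on my data yet):
                          Code:
                          * refit the random-effects model
                          xtreg srs i.condition##i.period, i(indi_num) re vce(robust)
                          * estimated cell means for every condition-by-period combination
                          margins condition#period
                          * joint test of the interaction (should match testparm's chi2)
                          contrast condition#period
                          * simple effects: period changes within each condition
                          contrast period@condition, effects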

                          Furthermore, is the test comparable to what repeated anova does?
                          Code:
                          anova srs i.condition##i.period, rep(period) bseunit(indi_num)

                          Number of obs = 366 R-squared = 0.0273
                          Root MSE = 20.9377 Adj R-squared = 0.0055

                          Source | Partial SS df MS F Prob>F
                          -----------------+----------------------------------------------------
                          Model | 4384.3889 8 548.04861 1.25 0.2688
                          |
                          condition | 1806.747 2 903.37349 2.06 0.1289
                          period | 1705.2682 2 852.63408 1.94 0.1445
                          condition#period | 852.84952 4 213.21238 0.49 0.7458
                          |
                          Residual | 156503.72 357 438.38578
                          -----------------+----------------------------------------------------
                          Total | 160888.11 365 440.78935


                          Between-subjects error term: condition
                          Levels: 3 (2 df)
                          Lowest b.s.e. variable: indi_num
                          Covariance pooled over: condition (for repeated variable)

                          Repeated variable: period
                          Huynh-Feldt epsilon = 17.6494
                          *Huynh-Feldt epsilon reset to 1.0000
                          Greenhouse-Geisser epsilon = 0.9478
                          Box's conservative epsilon = 0.5000

                          ------------ Prob > F ------------
                          Source | df F Regular H-F G-G Box
                          -----------------+----------------------------------------------------
                          period | 2 1.94 0.1445 0.1445 0.1470 0.1649
                          condition#period | 4 0.49 0.7458 0.7458 0.7358 0.6157
                          Residual | 357
                          ----------------------------------------------------------------------
                          I was asked why I didn't perform a repeated ANOVA first to examine whether the condition#period interaction term shows differences. I didn't conduct an ANOVA before fitting the random-effects model, and I don't believe it should be necessary. Should I expect similar results from the repeated ANOVA and the random-effects model?
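                          One way I could check this myself is to fit the analogous mixed model, in which a random intercept per individual plays the role that the subject stratum plays in the repeated ANOVA. A sketch under that assumption (not something I have run yet):
                          Code:
                          * mixed-model analogue: random intercept per subject
                          * (cf. sigma_u in the xtreg, re output)
                          mixed srs i.condition##i.period || indi_num:, reml
                          * joint Wald test of the interaction
                          testparm i.condition#i.period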


                          Additionally, thank you for providing the information on power and related guidance.
                          Last edited by Vincent Li; 27 Feb 2025, 07:57.



                          • #14
                            Why can't I edit my post twice?

                            Furthermore, is the test comparable to what repeated anova does?
                            I think they won't be similar; that was a silly question on my part. -test- performs a Wald test on coefficients, while anova examines differences in the dependent variable across groups.

                            Furthermore, is the test comparable to what repeated anova does?
                            In the repeated anova, does the outcome of the interaction term indicate that the mean srs scores are not statistically significantly different across these four cells: 1.condition#1.period, 1.condition#2.period, 2.condition#1.period, and 2.condition#2.period?
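                            If that reading is right, the cell means themselves could be compared directly after the anova fit. A hedged sketch (same anova specification as above, not verified output):
                            Code:
                            * repeated-measures anova as before
                            anova srs i.condition##i.period, rep(period) bseunit(indi_num)
                            * estimated cell means with all pairwise comparisons
                            margins condition#period, pwcompare(effects)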

                            Last edited by Vincent Li; 27 Feb 2025, 09:16.



                            • #15
                              To figure out the differences between using a random-effects model and a repeated ANOVA, I ran the following commands and got the corresponding outcomes:

                              Regarding random effect:
                              Code:
                              xtreg srs i.condition##i.period, i(indi_num) re vce(robust)
                              the outcome:

                              Random-effects GLS regression Number of obs = 366
                              Group variable: indi_num Number of groups = 122

                              R-squared: Obs per group:
                              Within = 0.1207 min = 3
                              Between = 0.0129 avg = 3.0
                              Overall = 0.0273 max = 3

                              Wald chi2(8) = 22.56
                              corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0040

                              (Std. err. adjusted for 122 clusters in indi_num)
                              ----------------------------------------------------------------------------------
                              | Robust
                              srs | Coefficient std. err. z P>|z| [95% conf. interval]
                              -----------------+----------------------------------------------------------------
                              condition |
                              1 | -2.707317 4.5169 -0.60 0.549 -11.56028 6.145645
                              2 | .1810976 4.571382 0.04 0.968 -8.778647 9.140842
                              |
                              period |
                              1 | -8.97561 2.776295 -3.23 0.001 -14.41705 -3.534171
                              2 | -8.268293 2.256088 -3.66 0.000 -12.69014 -3.846442
                              |
                              condition#period |
                              1 1 | 6.609756 3.346904 1.97 0.048 .0499453 13.16957
                              1 2 | 3.439024 3.061285 1.12 0.261 -2.560983 9.439032
                              2 1 | 7.92561 3.131587 2.53 0.011 1.787812 14.06341
                              2 2 | 6.593293 2.895518 2.28 0.023 .9181814 12.2684
                              |
                              _cons | 94.2439 3.354506 28.09 0.000 87.66919 100.8186
                              -----------------+----------------------------------------------------------------
                              sigma_u | 18.959945
                              sigma_e | 8.8829199
                              rho | .82000724 (fraction of variance due to u_i)
                              ---------------------------------------------------------------------------------

                              Regarding repeated anova:
                              Code:
                              anova srs i.condition##i.period, rep(period) bseunit(indi_num)
                              margins i.condition##i.period

                              Number of obs = 366 R-squared = 0.0273
                              Root MSE = 20.9377 Adj R-squared = 0.0055

                              Source | Partial SS df MS F Prob>F
                              -----------------+----------------------------------------------------
                              Model | 4384.3889 8 548.04861 1.25 0.2688
                              |
                              condition | 1806.747 2 903.37349 2.06 0.1289
                              period | 1705.2682 2 852.63408 1.94 0.1445
                              condition#period | 852.84952 4 213.21238 0.49 0.7458
                              |
                              Residual | 156503.72 357 438.38578
                              -----------------+----------------------------------------------------
                              Total | 160888.11 365 440.78935


                              Between-subjects error term: condition
                              Levels: 3 (2 df)
                              Lowest b.s.e. variable: indi_num
                              Covariance pooled over: condition (for repeated variable)

                              Repeated variable: period
                              Huynh-Feldt epsilon = 17.6494
                              *Huynh-Feldt epsilon reset to 1.0000
                              Greenhouse-Geisser epsilon = 0.9478
                              Box's conservative epsilon = 0.5000

                              ------------ Prob > F ------------
                              Source | df F Regular H-F G-G Box
                              -----------------+----------------------------------------------------
                              period | 2 1.94 0.1445 0.1445 0.1470 0.1649
                              condition#period | 4 0.49 0.7458 0.7458 0.7358 0.6157
                              Residual | 357
                              ----------------------------------------------------------------------

                              Predictive margins Number of obs = 366

                              Expression: Linear prediction, predict()

                              ----------------------------------------------------------------------------------
                              | Delta-method
                              | Margin std. err. t P>|t| [95% conf. interval]
                              -----------------+----------------------------------------------------------------
                              condition |
                              0 | 88.49593 1.887886 46.88 0.000 84.78316 92.20871
                              1 | 89.13821 1.887886 47.22 0.000 85.42544 92.85099
                              2 | 93.51667 1.911338 48.93 0.000 89.75777 97.27556
                              |
                              period |
                              0 | 93.39344 1.895607 49.27 0.000 89.66548 97.1214
                              1 | 89.2377 1.895607 47.08 0.000 85.50974 92.96566
                              2 | 88.44262 1.895607 46.66 0.000 84.71466 92.17058
                              |
                              condition#period |
                              0 0 | 94.2439 3.269914 28.82 0.000 87.81319 100.6746
                              0 1 | 85.26829 3.269914 26.08 0.000 78.83758 91.69901
                              0 2 | 85.97561 3.269914 26.29 0.000 79.5449 92.40632
                              1 0 | 91.53659 3.269914 27.99 0.000 85.10587 97.9673
                              1 1 | 89.17073 3.269914 27.27 0.000 82.74002 95.60145
                              1 2 | 86.70732 3.269914 26.52 0.000 80.2766 93.13803
                              2 0 | 94.425 3.310535 28.52 0.000 87.9144 100.9356
                              2 1 | 93.375 3.310535 28.21 0.000 86.8644 99.8856
                              2 2 | 92.75 3.310535 28.02 0.000 86.2394 99.2606
                              ----------------------------------------------------------------------------------
                              Code:
                              egen cell=group(condition period)
                              robvar srs,by(cell)

                              group(condi |
                              tion | Summary of SRS
                              period) | Mean Std. dev. Freq.
                              ------------+------------------------------------
                              1 | 94.243902 21.418194 41
                              2 | 85.268293 20.802673 41
                              3 | 85.97561 19.875221 41
                              4 | 91.536585 19.313334 41
                              5 | 89.170732 22.386271 41
                              6 | 86.707317 22.106836 41
                              7 | 94.425 19.591845 40
                              8 | 93.375 21.675565 40
                              9 | 92.75 21.022271 40
                              ------------+------------------------------------
                              Total | 90.357923 20.994984 366

                              W0 = 0.19969635 df(8, 357) Pr > F = 0.99077821

                              W50 = 0.18388672 df(8, 357) Pr > F = 0.99302896

                              W10 = 0.19350631 df(8, 357) Pr > F = 0.99170859


                              According to the repeated anova, there are no significant differences in srs scores across the condition#period cells. However, this is not consistent with the random-effects outcomes. How should I interpret this inconsistency?
                              Thanks!


                              Here is the revised repeated anova code (with the between-subjects error term changed):
                              Code:
                              anova srs i.condition##i.period indi_num, rep(period) bse(indi_num)
                              Here are the outcomes:


                              Number of obs = 366 R-squared = 0.8833
                              Root MSE = 8.88292 Adj R-squared = 0.8210

                              Source | Partial SS df MS F Prob>F
                              -----------------+----------------------------------------------------
                              Model | 142108.42 127 1118.9639 14.18 0.0000
                              |
                              condition | 7472.8889 2 3736.4444 47.35 0.0000
                              period | 1705.2682 2 852.63408 10.81 0.0000
                              condition#period | 852.84952 4 213.21238 2.70 0.0313
                              indi_num | 137724.03 119 1157.3448 14.67 0.0000
                              |
                              Residual | 18779.691 238 78.906267
                              -----------------+----------------------------------------------------
                              Total | 160888.11 365 440.78935


                              Between-subjects error term: indi_num
                              Levels: 122 (119 df)
                              Lowest b.s.e. variable: indi_num

                              Repeated variable: period
                              Huynh-Feldt epsilon = 0.9672
                              Greenhouse-Geisser epsilon = 0.9367
                              Box's conservative epsilon = 0.5000

                              ------------ Prob > F ------------
                              Source | df F Regular H-F G-G Box
                              -----------------+----------------------------------------------------
                              period | 2 10.81 0.0000 0.0000 0.0001 0.0013
                              condition#period | 4 2.70 0.0313 0.0330 0.0346 0.0712
                              Residual | 238
                              ----------------------------------------------------------------------
                              Is this more comparable to what the random-effects model did?
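                              A closer match still might be the classic split-plot specification, in which subjects nested within condition serve as the error term for the between-subjects factor. This is a sketch of that idea, not something I have verified on these data:
                              Code:
                              * split-plot (mixed-design) repeated-measures ANOVA:
                              * indi_num nested in condition is the between-subjects error term
                              anova srs condition / indi_num|condition period condition#period, repeated(period)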
                              Last edited by Vincent Li; 27 Feb 2025, 20:01.

