  • MARGINS after ANOVA producing different means from the means by SUM

    Hi, I am running a repeated-measures ANOVA, and the MARGINS command after ANOVA produces slightly different means from the means obtained with the SUM command. I am trying to figure out why.

    Here is the data. There are two time points (time=0/1) and two groups (Z and E).
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id float(time group) double outcome
     1 0 1    5
    14 0 1  5.5
    11 0 0  7.5
    16 0 1    4
     9 0 1    3
     8 0 1  2.5
     5 0 0    4
     4 0 1    6
     7 0 0    4
    10 0 0    6
     2 0 1    5
     6 0 0    6
    13 0 1  5.5
     3 0 0  6.5
    12 0 0    4
    15 0 1    5
    17 0 0    7
     7 1 0    4
    12 1 0    5
    11 1 0 7.66
     6 1 0  8.5
    10 1 0    7
    14 1 1  6.5
    17 1 0    8
     8 1 1    4
     4 1 1  6.5
    16 1 1    5
     5 1 0  7.5
     2 1 1  4.5
     1 1 1    .
     9 1 1    8
    13 1 1    5
     3 1 0    .
    15 1 1  7.5
    end
    label values group group_l
    label def group_l 0 "Z", modify
    label def group_l 1 "E", modify

    The interaction between time and group is my primary interest. Here is the code for ANOVA-MARGINS and SUM.
    Code:
    anova outcome group /id|group time group##time, repeated(time)
    margins group#time, noestimcheck
    by group, sort: tabstat sd_tst_t, by(time2) statistics(n mean sd min max)

    But when I use the code below, I get identical numbers. As far as I understand, I need to use the code above to get the correct error terms for the tests, so I am trying to figure out how to explain the differences and why I get different results.
    Code:
    anova sd_tst_t group group##time
    margins group#time
    by group, sort: tabstat sd_tst_t, by(time2) statistics(n mean sd min max)

    Below is my primary interest just in case.
    Code:
    contrast rb0.time2@group, effect level(95)
    contrast rb0.time2#r.group, effect level(95)

    I would really appreciate any of your comments.
    Thank you.





  • #2
    Originally posted by SeungYong Han
    . . . MARGINS command after ANOVA produces slightly different means from the means by using SUM command. I am trying to figure out why.

    I would really appreciate any of your comments.
    1. I don't see SUM command used anywhere.

    2. What is time2?

    3. What is sd_tst_t?

    4. Two observations have missing outcomes (the second observation each for participant IDs 1 and 3). Repeated-measures ANOVA requires balanced data in the repeated measurements. You don't have that. Use -mixed-, instead.
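
    A quick way to see which participants break the balance is to count the non-missing outcomes per id. The following is only a minimal sketch on a made-up three-participant excerpt; the toy rows and the variable name n_obs are assumptions for illustration, not the full dataset from #1.

    ```stata
    clear
    * Toy excerpt (hypothetical): ids 1 and 3 lack the time==1 outcome
    input byte id byte time double outcome
    1 0 5
    1 1 .
    2 0 5
    2 1 4.5
    3 0 6.5
    3 1 .
    end

    * Number of non-missing outcomes per participant
    bysort id: egen n_obs = count(outcome)

    * Participants with fewer than two measurements make the design unbalanced
    list id time outcome if n_obs < 2, noobs sepby(id)
    ```

    Any id listed here would have to be dropped for a balanced repeated-measures ANOVA, or retained by switching to -mixed-.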



    • #3
      Sorry. I just noticed that.
      • time2 ==> time
      • sd_tst_t ==> outcome
      Yes, I know that rm-anova requires balanced data, but it looks like the model uses all cases anyway.
      And even when I run the same model with the balanced data (n=30), I still get different numbers. @@

      Code:
      tabdisp id time group, cellvar(outcome)
      
      gen exclude=0
      replace exclude=1 if inlist(id, 1, 3)
      
      anova outcome group /id|group time group##time if exclude==0, repeated(time)
      margins group#time, noestimcheck
      by group, sort: tabstat outcome if exclude==0, by(time) statistics(n mean sd min max)



      • #4
        Ok, so when I use the balanced data, I should use "over" within margins command. Those two commands produce different numbers! It seems like it's the combination of balanced data status and the command.

        Code:
        margins group#time, noestimcheck
        margins, over(group time)
        Last edited by SeungYong Han; 28 Jan 2022, 19:26.



        • #5
          In nested designs such as the repeated-measures ANOVA, you have empty cells because ids are not in both groups. -margins- tells you so with its "(not estimable)" message, which you have been forcibly overriding with the -noestimcheck- option.

          But you can tell -margins- to fill in the empty cells in the design matrix with the combination of -asbalanced- and -emptycells(reweight)- options. So, in your case you'd use the following syntax.
          Code:
          margins group#time, asbalanced emptycells(reweight)
          I've illustrated it below with your dataset--see the output below at the "Here" comment. (I abbreviated your variable names to three characters for brevity.)

          .
          . version 17.0

          .
          . clear *

          .
          . quietly input byte id float(time group) double outcome

          .
          . quietly compress

          .
          . label define Groups 0 Z 1 E

          . label values group Groups

          .
          . rename id pid

          . rename group grp

          . rename time tim

          . rename outcome out

          .
          . sort pid tim

          .
          . list if inlist(pid, 1, 3), noobs sepby(pid)

            +-----------------------+
            | pid   tim   grp   out |
            |-----------------------|
            |   1     0     E     5 |
            |   1     1     E     . |
            |-----------------------|
            |   3     0     Z   6.5 |
            |   3     1     Z     . |
            +-----------------------+

          .
          . quietly anova out grp / pid|grp tim grp#tim if !inlist(pid, 1, 3)

          . margins , over(grp tim)

          Predictive margins                                          Number of obs = 30

          Expression: Linear prediction, predict()
          Over:       grp tim

          ------------------------------------------------------------------------------
                       |            Delta-method
                       |     Margin   std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
               grp#tim |
                  Z#0  |        5.5   .4193653    13.12   0.000     4.594016    6.405984
                  Z#1  |   6.808571   .4193653    16.24   0.000     5.902588    7.714555
                  E#0  |     4.5625   .3922803    11.63   0.000      3.71503     5.40997
                  E#1  |      5.875   .3922803    14.98   0.000      5.02753     6.72247
          ------------------------------------------------------------------------------

          .
          . *
          . * Here
          . *
          . margins grp#tim, asbalanced emptycells(reweight)

          Adjusted predictions                                        Number of obs = 30

          Expression:  Linear prediction, predict()
          Empty cells: reweight
          At: grp   (asbalanced)
              pid   (asbalanced)
              tim   (asbalanced)

          ------------------------------------------------------------------------------
                       |            Delta-method
                       |     Margin   std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
               grp#tim |
                  Z#0  |        5.5   .4193653    13.12   0.000     4.594016    6.405984
                  Z#1  |   6.808571   .4193653    16.24   0.000     5.902588    7.714555
                  E#0  |     4.5625   .3922803    11.63   0.000      3.71503     5.40997
                  E#1  |      5.875   .3922803    14.98   0.000      5.02753     6.72247
          ------------------------------------------------------------------------------

          .
          . version 16.1: table grp tim if !inlist(pid, 1, 3), contents(mean out)

          --------------------------------
                    |         tim
                grp |         0          1
          ----------+---------------------
                  Z |       5.5  6.8085714
                  E |    4.5625      5.875
          --------------------------------

          .
          . exit

          end of do-file


          .


          For more information, take a look at the user's manual entry for -margins- and scroll down to the header "Obtaining margins with nested designs" and then to the subheader "Margins with nested designs as though the data were balanced".



          • #6
            Thank you so much, #Joseph Coveney. This is exactly what I was looking for! More Qs
            • So, does this mean that I have to use balanced data for repeated-measures ANOVA? What do the results mean if I don't exclude those cases (ID=1 and 3) with missing values on the outcome variable? I certainly get different numbers for margins and hence contrast, but I am not sure if that is ok since it's model-estimated, or they are just incorrect because it's from unbalanced data.
            • Is there a way to get the same means (margins and sum) when I use the full sample (including ID=1 and 3)? What matters at the end is getting the same numbers for contrast, but that depends on margins, I believe.
            • And it seems like everything discussed here for margins applies to mixed (multilevel modeling) as well. Please let me know if I am wrong.



            • #7
              If you want to include those two participants' data, then you have no choice but to use -mixed-. -margins- works after that estimation command, too.
              Code:
              mixed out i.grp##i.tim || pid: , reml dfmethod(kroger) nolrtest nolog
              margins grp#tim, df(`e(df_max)')



              • #8
                I see. More questions! (sorry to bother you again and again).

                1) I ran the commands with and without "df(`e(df_max)')", and I get the same results. Could you please explain what it is for?
                2) When I used mixed to use all cases, I don't get the same numbers either, and I am not sure why.
                Code:
                by tim, sort: tabstat out, by(grp) statistics(mean sd count min max)
                
                mixed out i.grp##i.tim || pid: , reml dfmethod(kroger) nolrtest nolog
                margins grp#tim, df(`e(df_max)')
                #2 is actually directly related to my question on the previous posting: https://www.statalist.org/forums/for...-values-by-sum
                No one answered, and I wonder if you can take a look. I believe it's about the same issue.



                • #9
                  Originally posted by SeungYong Han
                  1) I ran the commands with and without "df(`e(df_max)')", and I get the same results.
                  No you don't. One is a Z statistic and the other is a T statistic.
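
                  The practical consequence shows up in the critical values behind the confidence limits. A minimal sketch, assuming a hypothetical 15 denominator degrees of freedom (the actual value is whatever -mixed- stores in e(df_max) after dfmethod(kroger)):

                  ```stata
                  * Critical value for 95% limits when -margins- reports z statistics
                  display invnormal(0.975)

                  * Critical value when -margins- is given df(); 15 is a made-up df for illustration
                  display invttail(15, 0.025)
                  ```

                  The t critical value is the larger of the two, so with df() the confidence intervals are wider and the p-values larger, while the margins and their standard errors stay the same.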

                  2) When I used mixed to use all cases, I don't get the same numbers either, and I am not sure why.
                  Why would you expect it to?
                  Code:
                  predict double xbu, fitted
                  list if inlist(pid, 1, 3), noobs sepby(pid)



                  • #10
                    1) I didn't notice that. Thank you! Is there a reason to use a t test instead of a z test? Sample size?

                    2) I expect the same results because I usually want to test the interaction between time and group without any covariates (in the first model, at least) by using mixed in order to use the full sample. But I get different numbers from sum and mixed, which leads to different numbers for the interaction. I've been searching for the reason so that I can at least explain it in manuscripts for publication. As far as I understand your last comment and code, the reason I get slightly different numbers is that I also get predicted values at time==1 for those cases (pid=1, 3). Please correct me if I am wrong. It is making more and more sense now, but I just want to make sure I am understanding this correctly. And I get the same numbers when I exclude those two cases because nothing is estimated at time==1 for those cases.

                    Thank you so much for sharing your insight and knowledge about this. I really appreciate it.



                    • #11
                      Yes, sample size: you have only 17 participants. Yes, -margins- basically is showing the model's predictions.
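
                      One way to see this side by side is sketched below, using the data and original variable names from #1 (xb and cell are names introduced here for illustration). It fits the model, predicts the fixed portion for every row, including the two rows with missing outcomes, and tabulates raw and predicted means per group-by-time cell.

                      ```stata
                      clear
                      * Data from #1 (group: 0 = Z, 1 = E)
                      input byte id float(time group) double outcome
                       1 0 1    5
                      14 0 1  5.5
                      11 0 0  7.5
                      16 0 1    4
                       9 0 1    3
                       8 0 1  2.5
                       5 0 0    4
                       4 0 1    6
                       7 0 0    4
                      10 0 0    6
                       2 0 1    5
                       6 0 0    6
                      13 0 1  5.5
                       3 0 0  6.5
                      12 0 0    4
                      15 0 1    5
                      17 0 0    7
                       7 1 0    4
                      12 1 0    5
                      11 1 0 7.66
                       6 1 0  8.5
                      10 1 0    7
                      14 1 1  6.5
                      17 1 0    8
                       8 1 1    4
                       4 1 1  6.5
                      16 1 1    5
                       5 1 0  7.5
                       2 1 1  4.5
                       1 1 1    .
                       9 1 1    8
                       3 1 0    .
                      13 1 1    5
                      15 1 1  7.5
                      end

                      * Random-intercept model on all available observations
                      mixed outcome i.group##i.time || id: , reml dfmethod(kroger) nolrtest nolog

                      * Fixed-portion prediction, also computed for rows with a missing outcome
                      predict double xb, xb

                      * One identifier per group-by-time cell
                      egen cell = group(group time), label

                      * Raw means (outcome) next to model-based means (xb)
                      tabstat outcome xb, by(cell) statistics(n mean) nototal
                      ```

                      Within each cell xb is constant and corresponds to what -margins group#time- reports. Where it departs from the raw mean, the model is drawing on the baseline values of participants whose follow-up is missing, which the raw mean ignores.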
