MARGINS after ANOVA producing different means from the means by SUM

SeungYong Han

Join Date: Jul 2015
Posts: 53

28 Jan 2022, 11:42

Hi, I am running a repeated-measures ANOVA, and the MARGINS command after ANOVA produces slightly different means from those obtained with the SUM command. I am trying to figure out why.

Here is the data. There are two time points (time=0/1) and two groups (Z and E).

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id float(time group) double outcome
 1 0 1    5
14 0 1  5.5
11 0 0  7.5
16 0 1    4
 9 0 1    3
 8 0 1  2.5
 5 0 0    4
 4 0 1    6
 7 0 0    4
10 0 0    6
 2 0 1    5
 6 0 0    6
13 0 1  5.5
 3 0 0  6.5
12 0 0    4
15 0 1    5
17 0 0    7
 7 1 0    4
12 1 0    5
11 1 0 7.66
 6 1 0  8.5
10 1 0    7
14 1 1  6.5
17 1 0    8
 8 1 1    4
 4 1 1  6.5
16 1 1    5
 5 1 0  7.5
 2 1 1  4.5
 1 1 1    .
 9 1 1    8
13 1 1    5
 3 1 0    .
15 1 1  7.5
end
label values group group_l
label def group_l 0 "Z", modify
label def group_l 1 "E", modify

The interaction between time and group is my primary interest. Here is the code for ANOVA-MARGINS and SUM.

Code:

anova outcome group /id|group time group##time, repeated(time)
margins group#time, noestimcheck
by group, sort: tabstat sd_tst_t, by(time2) statistics(n mean sd min max)

But when I use the code below, I get identical numbers. As far as I understand, I need to use the code above to use the correct error terms for tests, so I am trying to figure out how to explain the differences and why I get different results.

Code:

anova sd_tst_t group group##time
margins group#time
by group, sort: tabstat sd_tst_t, by(time2) statistics(n mean sd min max)

Below is my primary interest just in case.

Code:

contrast rb0.time2@group, effect level(95)
contrast rb0.time2#r.group, effect level(95)

I would really appreciate any of your comments.
Thank you.

Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#2

28 Jan 2022, 17:12

Originally posted by SeungYong Han

. . . MARGINS command after ANOVA produces slightly different means from the means by using SUM command. I am trying to figure out why.

I would really appreciate any of your comments.

1. I don't see SUM command used anywhere.

2. What is time2?

3. What is sd_tst_t?

4. Two observations have missing outcomes (the second observation each for participant IDs 1 and 3). Repeated-measures ANOVA requires balanced data in the repeated measurements. You don't have that. Use -mixed-, instead.
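The balance problem raised in point 4 can be checked directly before fitting anything. The lines below are a minimal sketch, assuming the variable names from the -dataex- listing (id, time, group, outcome); n_obs is a hypothetical helper variable introduced only for illustration.

```stata
* Sketch: count the nonmissing outcomes available for each participant
* (n_obs is a hypothetical name; id, time, group and outcome come from the -dataex- listing)
bysort id: egen n_obs = count(outcome)

* Participants with fewer than two nonmissing outcomes break the balance
* that repeated-measures ANOVA relies on
list id time group outcome if n_obs < 2, noobs sepby(id)
```

If the listing is empty, every participant has an outcome at both time points.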
SeungYong Han

Join Date: Jul 2015

Posts: 53
#3

28 Jan 2022, 17:24

Sorry. I just noticed that.
time2 ==> time

sd_tst_t ==> outcome

Yes, I know that rm-anova requires balanced data, but it looks like the model uses all cases anyway.
And even when I run the same model with the balanced data (n=30), I still get different numbers. @@

Code:

tabdisp id time group, cellvar(outcome)
gen exclude=0
replace exclude=1 if inlist(id, 1, 3)
anova outcome group /id|group time group##time if exclude==0, repeated(time)
margins group#time, noestimcheck
by group, sort: tabstat outcome if exclude==0, by(time) statistics(n mean sd min max)
SeungYong Han

Join Date: Jul 2015

Posts: 53
#4

28 Jan 2022, 17:30

Ok, so when I use the balanced data, I should use "over" within margins command. Those two commands produce different numbers! It seems like it's the combination of balanced data status and the command.

Code:

margins group#time, noestimcheck
margins, over(group time)

Last edited by SeungYong Han; 28 Jan 2022, 18:26.
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#5

29 Jan 2022, 04:00

In nested designs such as the repeated measures ANOVA, you have empty cells because ids are not in both groups. -margins- is telling you that with its "(not estimable)" message that you have to forcibly override with the -noestimcheck- option.

But you can tell -margins- to fill in the empty cells in the design matrix with the combination of -asbalanced- and -emptycells(reweight)- options. So, in your case you'd use the following syntax.

Code:

margins group#time, asbalanced emptycells(reweight)

I've illustrated it below with your dataset--see the output below at the "Here" comment. (I abbreviated your variable names to three characters for brevity.)

.
. version 17.0

.
. clear *

.
. quietly input byte id float(time group) double outcome

.
. quietly compress

.
. label define Groups 0 Z 1 E

. label values group Groups

.
. rename id pid

. rename group grp

. rename time tim

. rename outcome out

.
. sort pid tim

.
. list if inlist(pid, 1, 3), noobs sepby(pid)

  +-----------------------+
  | pid   tim   grp   out |
  |-----------------------|
  |   1     0     E     5 |
  |   1     1     E     . |
  |-----------------------|
  |   3     0     Z   6.5 |
  |   3     1     Z     . |
  +-----------------------+

.
. quietly anova out grp / pid|grp tim grp#tim if !inlist(pid, 1, 3)

. margins , over(grp tim)

Predictive margins                                          Number of obs = 30

Expression: Linear prediction, predict()
Over:       grp tim

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     grp#tim |
        Z#0  |        5.5   .4193653    13.12   0.000     4.594016    6.405984
        Z#1  |   6.808571   .4193653    16.24   0.000     5.902588    7.714555
        E#0  |     4.5625   .3922803    11.63   0.000      3.71503     5.40997
        E#1  |      5.875   .3922803    14.98   0.000      5.02753     6.72247
------------------------------------------------------------------------------

.
. *
. * Here
. *
. margins grp#tim, asbalanced emptycells(reweight)

Adjusted predictions                                        Number of obs = 30

Expression:  Linear prediction, predict()
Empty cells: reweight
At: grp   (asbalanced)
    pid   (asbalanced)
    tim   (asbalanced)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     grp#tim |
        Z#0  |        5.5   .4193653    13.12   0.000     4.594016    6.405984
        Z#1  |   6.808571   .4193653    16.24   0.000     5.902588    7.714555
        E#0  |     4.5625   .3922803    11.63   0.000      3.71503     5.40997
        E#1  |      5.875   .3922803    14.98   0.000      5.02753     6.72247
------------------------------------------------------------------------------

.
. version 16.1: table grp tim if !inlist(pid, 1, 3), contents(mean out)

--------------------------------
          |         tim
      grp |         0          1
----------+---------------------
        Z |       5.5  6.8085714
        E |    4.5625      5.875
--------------------------------

.
. exit

end of do-file

.

For more information, take a look at the user's manual entry for -margins- and scroll down to the header "Obtaining margins with nested designs" and then to the subheader "Margins with nested designs as though the data were balanced".
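The empty cells described at the top of this reply can be made visible with a simple cross-tabulation. This is only an illustrative sketch, assuming the abbreviated variable names used in the log (pid, grp); it is not part of the original do-file.

```stata
* Sketch: cross-tabulate participant by group (names pid and grp as in the log)
* Each pid has observations in only one grp, so one of the two cells in every
* row is zero. Those are the empty cells of the nested pid|grp term.
tabulate pid grp
```

Because pid is nested in grp, those zero cells are a feature of the design rather than a data problem, which is why -margins- needs asbalanced together with emptycells(reweight) to average over only the cells that exist.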
SeungYong Han

Join Date: Jul 2015

Posts: 53
#6

29 Jan 2022, 09:37

Thank you so much, #Joseph Coveney. This is exactly what I was looking for! More Qs
So, does this mean that I have to use balanced data for repeated-measures ANOVA? What do the results mean if I don't exclude those cases (ID=1 and 3) with missing values on the outcome variable? I certainly get different numbers for margins and hence contrast, but I am not sure if that is ok since it's model-estimated, or they are just incorrect because it's from unbalanced data.

Is there a way to get the same means (margins and sum) when I use the full sample (including ID=1 and 3)? What matters at the end is getting the same numbers for contrast, but that depends on margins, I believe.

And it seems like everything discussed here for margins applies to mixed (multilevel modeling) as well. Please let me know if I am wrong.
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#7

29 Jan 2022, 17:47

If you want to include those two participants' data, then you have no choice but to use -mixed-. -margins- works after that estimation command, too.

Code:

mixed out i.grp##i.tim || pid: , reml dfmethod(kroger) nolrtest nolog
margins grp#tim, df(`e(df_max)')
SeungYong Han

Join Date: Jul 2015

Posts: 53
#8

29 Jan 2022, 19:23

I see. More questions! (sorry to bother you again and again).

1) I ran the commands with and without "df(`e(df_max)')", and I get the same results. Could you please explain what it is for?
2) When I used mixed to use all cases, I don't get the same numbers either, and I am not sure why.

Code:

by tim, sort: tabstat out, by(grp) statistics(mean sd count min max)
mixed out i.grp##i.tim || id: , reml dfmethod(kroger) nolrtest nolog
margins grp#tim, df(`e(df_max)')

#2 is actually directly related to my question on the previous posting: https://www.statalist.org/forums/for...-values-by-sum
No one answered, and I wonder if you can take a look. I believe it's about the same issue.
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#9

29 Jan 2022, 19:58

Originally posted by SeungYong Han

1) I ran the commands with and without "df(`e(df_max)')", and I get the same results.

No you don't. One is a Z statistic and the other is a T statistic.

2) When I used mixed to use all cases, I don't get the same numbers either, and I am not sure why.

Why would you expect it to?

Code:

predict double xbu, fitted
list if inlist(pid, 1, 3), noobs sepby(pid)
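The z-versus-t distinction in point 1 is easiest to see by running -margins- both ways after the same fit. This is a minimal sketch, assuming the -mixed- model from #7 has just been estimated with dfmethod(kroger), so that e(df_max) is available, and that the abbreviated variable names are in use.

```stata
* Assumes this model was just fit:
*   mixed out i.grp##i.tim || pid: , reml dfmethod(kroger) nolrtest nolog

* Without df(): the test-statistic column is labeled z (normal-based inference)
margins grp#tim

* With df(): the column is labeled t, and p-values and confidence intervals
* use e(df_max) degrees of freedom
margins grp#tim, df(`e(df_max)')
```

The margins and standard errors are the same in both tables. Only the reference distribution changes, and with it the p-values and the width of the confidence intervals.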
SeungYong Han

Join Date: Jul 2015

Posts: 53
#10

30 Jan 2022, 08:16

1) I didn't notice that. Thank you! Any reason for a t test instead of a z test? Sample size?

2) I expect the same results because I usually want to test the interaction between time and group without any covariates (in the first model, at least), using mixed in order to use the full sample. But I get different numbers from sum and mixed, which leads to different numbers for the interaction. I've been searching for the reason so that I can at least explain it in manuscripts for publication. As far as I understand your last comment and code, the reason I get slightly different numbers is that I also get predicted values at time==1 for those cases (pid=1, 3). Please correct me if I am wrong. I think it is making more and more sense now, but I just want to make sure I am understanding this correctly. And I get the same numbers when I exclude those two cases because nothing is estimated at time==1 for those cases.

Thank you so much for sharing your insight and knowledge about this. I really appreciate it.
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#11

30 Jan 2022, 15:34

Yes, sample size: you have only 17 participants. Yes, -margins- basically is showing the model's predictions.
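One way to see that -margins- reports model predictions rather than raw means is to put the two side by side. The lines below are a sketch only, assuming the -mixed- model from #7 was just fit and that the data carry the abbreviated names pid, grp, tim and out; xb_hat and cell are hypothetical names introduced for illustration.

```stata
* Sketch only: xb_hat and cell are hypothetical helper variables
predict double xb_hat, xb              // fixed-portion linear prediction from -mixed-
egen cell = group(grp tim), label      // one category per grp-by-tim combination

* Column xb_hat: model-based cell means, the quantities -margins grp#tim- reports
* Column out:    observed cell means, computed from the nonmissing outcomes only
tabstat xb_hat out, by(cell) statistics(mean) nototal
```

With complete data for every participant the two columns would be expected to agree. With the two missing time-1 outcomes they can differ, because the model still uses those participants' time-0 observations.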