How does the 'mixed' command process the group without variation in an indicator when estimating the fixed effect of this indicator?

Hong Yan

Join Date: Mar 2023

Posts: 13
#1

29 Oct 2024, 08:28

Hi all,

I have a question regarding the coefficient estimation for multilevel linear models.

Consider a nested dataset in which all observations in group A have the same value of the indicator X, while X varies within every other group. When estimating a multilevel model (MLM) with the 'mixed' command, how is the fixed effect of X calculated? Are the observations in group A, which all share one value of X, used in calculating the coefficient of X?

I checked the Stata manual but did not find an answer. I would appreciate any responses or references!

Best,
Hong
Tags: mixed effect, multilevel
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

29 Oct 2024, 10:38

Yes, those observations are included. -mixed- does not estimate within-group effects. So the absence of variation of X within a group of observations (or several groups of observations, as long as there is some variation in X somewhere in the data) calls for no special treatment. The presence of groups where X does not vary does diminish the effective sample size that powers the estimation of X, so your standard error is likely to be larger than it would be without such groups in the data. But that's just the natural consequence of the usual estimation. This X is not treated differently from any other.
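
The point that such observations stay in the estimation sample can be checked directly. The following is only a sketch: the nlswork example dataset and the variables ln_wage, union, and idcode are stand-ins chosen for illustration and do not come from this thread. It flags the groups in which the predictor never varies and then compares the number of observations used by -mixed- with the number in the data.

```stata
// Illustrative stand-ins (not from the thread): nlswork, ln_wage, union, idcode
webuse nlswork, clear
keep if !missing(ln_wage, union)

// Flag groups (persons) whose value of union never changes
bysort idcode (union): generate byte novar = union[1] == union[_N]
quietly count if novar
display "observations in groups with no variation in union: " r(N)

// Total number of complete observations
quietly count
local n_all = r(N)

// Random-intercept model; nothing is dropped for lack of within-group variation
mixed ln_wage union || idcode:
display "observations used by mixed: " e(N) " of " `n_all'
```

If the two counts in the last line agree, the non-varying groups were included in the estimation.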
Erik Ruzek

Join Date: Oct 2017

Posts: 429
#3

29 Oct 2024, 13:53

I agree with Clyde's response here. If you want mixed to give you a purely within-group estimate of X, you can simply include the group mean of X as a second predictor. The coefficient for X is then the within-group estimate of the effect of X on Y, adjusting for covariates. The coefficient for the group mean of X is the between-group effect of X on Y minus the within-group effect.
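
A minimal sketch of this group-mean approach follows. The dataset (nlswork) and the variables ln_wage, tenure, and idcode are assumptions made for illustration, and tenure_mean is a hypothetical variable name. The group mean is computed on the complete cases so that it matches the estimation sample.

```stata
// Illustrative stand-ins (not from the thread): nlswork, ln_wage, tenure, idcode
webuse nlswork, clear
keep if !missing(ln_wage, tenure)

// Group (person) mean of the predictor; tenure_mean is a hypothetical name
bysort idcode: egen double tenure_mean = mean(tenure)

// Coefficient on tenure      -> within-group effect
// Coefficient on tenure_mean -> between-group effect minus within-group effect
mixed ln_wage tenure tenure_mean || idcode:

// Sum of the two coefficients -> between-group effect
lincom tenure + tenure_mean
```

Testing whether the coefficient on the group mean is zero is then a test of whether the within-group and between-group effects differ.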
Hong Yan

Join Date: Mar 2023

Posts: 13
#4

29 Oct 2024, 14:27

Hi Clyde,

Many thanks for your quick answer.

I am still a bit confused. Observations within groups with unvaried X are ineffective samples and result in a reduced effective sample size. Does this mean that these observations do not contribute to the calculation of the coefficient of X? Or are they treated the same as other observations when calculating the coefficient, but not counted toward the effective sample size, which makes the standard error larger (and so only affects significance)?

Best,
Hong
Hong Yan

Join Date: Mar 2023

Posts: 13
#5

29 Oct 2024, 14:40

Thank you Erik!

If I understand correctly, without the group mean of X as a second predictor, the coefficient of X is a combination of the within-group and between-group effects. Although observations in groups where X does not vary contribute nothing to the within-group effect, they are still needed for the between-group effect and for the overall effect (the coefficient) of X.

Best,
Hong
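
One way to see this combination is to put the between-group, within-group, and -mixed- estimates side by side. This is a rough sketch under assumptions: the nlswork dataset and the variables ln_wage, tenure, and idcode are illustrative choices that do not appear in this thread, and the scalar names are hypothetical.

```stata
// Illustrative stand-ins (not from the thread): nlswork, ln_wage, tenure, idcode
webuse nlswork, clear
keep if !missing(ln_wage, tenure)
xtset idcode

// Between-group estimate (regression on group means)
quietly xtreg ln_wage tenure, be
scalar b_between = _b[tenure]

// Within-group (fixed-effects) estimate
quietly xtreg ln_wage tenure, fe
scalar b_within = _b[tenure]

// Random-intercept model without the group mean as a predictor
quietly mixed ln_wage tenure || idcode:
scalar b_mixed = _b[ln_wage:tenure]

display "between: " b_between "   within: " b_within "   mixed: " b_mixed
```

Groups in which the predictor never varies enter the between-group regression through their group mean, even though they carry no within-group information.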
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

29 Oct 2024, 15:04

Originally posted by Hong Yan

Observations within groups with unvaried X are ineffective samples and result in a reduced effective sample size. Does this mean that these observations do not contribute to the calculation of the coefficient of X? Or are they treated the same as other observations when calculating the coefficient, but not counted toward the effective sample size, which makes the standard error larger (and so only affects significance)?

Sorry for not being clearer, and especially for using that language about "diminishing the effective sample size"--that's really not the right way of thinking about it. The observations in non-X-varying groups are absolutely the same as other observations for calculating the coefficient. And there is no actual "counting" for effective sample size: it's just that if you have many observations with the same value of X (whether they are in identified groups or not), that will make the variance of X smaller than if there were many different values, and the smaller variance of X, all else being equal, leads to higher standard errors and lower power.

This would be true even in one-level models where there is no explicit grouping. Here's an illustration:

Code:

clear*
sysuse auto

// USING THE ORIGINAL DATA
summ mpg
regress price mpg

// PICK HALF OF THE OBSERVATIONS AT RANDOM AND REPLACE THEM ALL BY THEIR
// MEAN VALUE
set seed 1234
gen double shuffle = runiform()
sort shuffle
summ mpg if _n*2 <= _N, meanonly
replace mpg = r(mean) if _n*2 <= _N

// REPEAT THE CALCULATIONS WITH THE MODIFIED DATA
summ mpg
regress price mpg

You can see that by replacing half of the values of mpg by the mean of those values, the mean of mpg remains unchanged, but the standard deviation has decreased considerably. In the regression of price on mpg, we see that while the coefficient has changed slightly, a decrease in magnitude of about 9%, the standard error has gone up by about 60%.
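
The link between the spread of X and the standard error can be made explicit for the one-predictor case. In a simple linear regression the standard error of the slope is the root mean squared error divided by sd(X) * sqrt(n - 1), so shrinking sd(X) inflates the standard error when everything else stays the same. The sketch below checks that textbook formula against the value reported by regress, using the same auto data; it is meant as an illustration of the one-level case only, not of what -mixed- computes internally.

```stata
sysuse auto, clear
quietly regress price mpg

// Standard deviation and count of mpg in the estimation sample
quietly summarize mpg if e(sample)

// Textbook formula: root MSE / (sd(x) * sqrt(n - 1))
display "formula SE  = " e(rmse) / (r(sd) * sqrt(r(N) - 1))
display "reported SE = " _se[mpg]
```

The two displayed values should agree up to rounding, which shows that sd(X) sits in the denominator of the standard error.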
Hong Yan

Join Date: Mar 2023

Posts: 13
#7

30 Oct 2024, 02:16

It is very clear, thank you very much!