
  • Mixed effects (multilevel) model vs. cluster command

    Hi all
I want to test the association between childhood behaviour at age 6 and earnings at age 36 (n = 1000). The behavioural assessments were obtained from teachers when the children were aged 6. I want to control for clustering of the behavioural assessments at the school and classroom levels, although I'm not testing predictors at those levels. From what I understand, the mixed model is better, although I would only report the fixed-effects estimates, not the random effects, which seems to amount to a regular regression (with adjusted SE estimates). What are the pros and cons of using a mixed effects model vs. the cluster command? The advantage of the cluster command is simplicity, and that I can still report standardised betas. Any suggestions welcome. See the examples below – they produce similar results.
    Many thanks

    mixed OUTCOME behav1 behav2 etc || school: || Class:, vce(robust)

egen double_cluster = group(school class)
regress OUTCOME behav1 behav2 etc, vce(cluster double_cluster)

  • Dino F Vitale
    replied
Thank you both, Clyde and Joseph;
you helped me resolve the doubts I had about how to present a data analysis to my colleagues (and friends).
Your arguments will also be of help in the future.

Thank you again.



  • Joseph Coveney
    replied
Originally posted by Dino F Vitale:
The simplest approach (for me) would be to run a regression analysis including the vce(cluster id) option, where id refers to the label of each subject. . . . Consider: . . . It would be easier for a non-statistician to understand regression models rather than the mixed model.
    But then consider, too: would it be easier for you to explain cluster-robust standard errors to your audience?

    Your model does not have a predictor to distinguish the patient's eyes, for either anatomical side or dominance. So it seems that for this outcome measure you consider them exchangeable, and you appear to have outcome measurements for both eyes in all patients. In that case, why not have your cake and eat it too, just by taking the arithmetic average of the two measurements for each patient and using that as the outcome variable in a simple linear regression model?

In this balanced case, it gives the same regression coefficients and standard errors (and hence p-values, confidence intervals, etc.) as the mixed-model approach.

    Run the code below to see for yourself.
    Code:
    version 18.0
    
    clear *
    
    // seedem
    set seed 1687955160
    
    drawnorm e0 e1, double sd(15 15) corr(1 0.5 \ 0.5 1) n(75)
    generate `c(obs_t)' pid = _n
    
    generate byte dis = mod(_n, 2)
    generate byte age = runiformint(18, 90)
    
    generate double xb = 100 + dis * 5 + age / 10 + 0 * dis * age
    quietly reshape long e, i(pid) j(sid)
    
    generate double mea = xb + e
    
    *
    * Begin here
    *
    bysort pid (sid): egen double avg = mean(mea)
    regress avg i.dis c.age if !sid
    
    mixed mea i.dis c.age || pid: , reml dfmethod(kroger) nolrtest nolog
    
    exit
    (Complete do-file and log file are attached for convenience.)

    Given the statistical equivalence in your case, taking the average would seem to be easiest to do and easiest to explain to your audience.



  • Clyde Schechter
    replied
It depends on the intraclass correlation. If the intraclass correlation were 0, -regress- and -mixed- would give the same results. (Or, at least, they would if you also included -vce(cluster id)- in your -mixed- model, which, by the way, you should unless the number of patients is too small for that.) If the intraclass correlation is large, the error from relying on -regress- will be correspondingly large. The error mostly takes the form of standard errors that are too small, which inflates the z-statistics and causes the p-values to be underestimated.
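To put a rough number on the effect described above, here is a small simulation sketch, in Python rather than Stata and purely for illustration (the cluster size, correlation, and coefficient values are all made up): it generates two correlated observations per cluster with a cluster-level regressor, then compares naive OLS standard errors with cluster-robust (sandwich) ones. The naive standard errors come out too small.

```python
import numpy as np

# Simulate G clusters of m = 2 observations each (e.g. two eyes per patient)
# with a cluster-level regressor and within-cluster error correlation rho.
rng = np.random.default_rng(12345)
G, m, rho = 2000, 2, 0.8
cl = np.arange(G).repeat(m)                    # cluster id for each row
x = rng.normal(size=G).repeat(m)               # cluster-level covariate
u = rng.normal(0, np.sqrt(rho), G).repeat(m)   # shared cluster effect
e = rng.normal(0, np.sqrt(1 - rho), G * m)     # idiosyncratic error
y = 1.0 + 0.5 * x + u + e                      # total error variance = 1

# OLS fit
X = np.column_stack([np.ones(G * m), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Naive (iid) variance estimate: s^2 (X'X)^-1
naive_var = (resid @ resid / (G * m - 2)) * XtX_inv

# Cluster-robust sandwich: (X'X)^-1 [sum_g X_g' e_g e_g' X_g] (X'X)^-1
meat = np.zeros((2, 2))
for g in range(G):
    s = X[cl == g].T @ resid[cl == g]
    meat += np.outer(s, s)
robust_var = XtX_inv @ meat @ XtX_inv

naive_se = np.sqrt(np.diag(naive_var))
robust_se = np.sqrt(np.diag(robust_var))
print("naive SEs: ", naive_se)
print("robust SEs:", robust_se)   # robust SEs are noticeably larger
```

With a within-cluster correlation this large, the cluster-robust standard errors come out well above the naive ones, which is exactly the underestimation Clyde describes.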

While it is true that a clinical audience would feel more comfortable with an ordinary linear regression than with a mixed model, you don't need to torture them with the details of how mixed models work. In this situation, I would just say that the mixed model is a fancier version of linear regression that better accounts for the fact that the measurements of the two eyes of the same person are not independent observations. You don't have to explain anything more than that.

    If your audience is at a very low level of sophistication so that even that is beyond them, you could offer an analogy. Political opinions tend to be (somewhat) correlated within households. An election poll that surveyed two people in each of 500 households would not be as informative as one that surveyed 1,000 unconnected people. Almost every college educated person can grasp that intuitively.
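The polling analogy can even be quantified with the standard design-effect formula, DEFF = 1 + (m − 1)ρ, where m is the cluster size and ρ the intraclass correlation. A minimal sketch (the ρ = 0.5 within-household correlation is a made-up value, just for illustration):

```python
# Design effect for cluster sampling: DEFF = 1 + (m - 1) * rho,
# where m is the cluster size and rho the intraclass correlation.
def design_effect(m: int, rho: float) -> float:
    return 1 + (m - 1) * rho

# 1,000 respondents sampled as 2 per household, assuming (hypothetically)
# a within-household opinion correlation of rho = 0.5:
deff = design_effect(2, 0.5)      # 1.5
effective_n = 1000 / deff         # about 667 independent respondents
print(deff, effective_n)
```

In other words, the two-per-household poll of 1,000 people carries roughly the information of a poll of 667 unconnected people, which is the intuition behind the analogy.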



  • Dino F Vitale
    replied
Thank you all for this post.
I would like to pose a related question about another scenario.
Suppose we have a measure of function collected for each eye of a subject, and we wish to assess its association with the presence/absence of a given disease (a dichotomous variable), taking into account the confounding effect of age (a continuous variable).
A classical regression of the measure on disease and age would suffer from the obvious correlation between the two eyes of each subject, and therefore from a violation of the model's assumption that the y values are independent.
The simplest approach (for me) would be to run a regression analysis including the vce(cluster id) option, where id refers to the label of each subject.
    Code:

regress measure i.disease age, vce(cluster id)

However, it might be more appropriate to build a mixed model such as
Code:

    mixed measure i.disease age || id:

Both models give the same coefficient values and very similar standard errors on the data that I have at hand.

From the post I understand that the mixed approach is to be preferred.
If so, why, and how large can the error be? Specifically, how big a mistake is it to rely on R2 and its partition in the regress model when evaluating the relative weight of each independent variable in the association?

    Consider:
1- A recent (2018) inquiry into the eye-related medical literature found that “Among studies with data available from both eyes, 50 (89%) of 56 papers in 2017 did not analyse data from both eyes or ignored the intereye correlation, as compared with in 60 (90%) of 67 papers in 1995 (P=0.96).”, documenting a persistent lack of appropriate statistical analysis of ocular data.
2- It would be easier for a non-statistician to understand regression models rather than the mixed model.


Thanks a lot in advance



  • Ishana Balan
    replied
    Thanks Clyde! That was very helpful!



  • Clyde Schechter
    replied
    No, it will not eliminate the systematic differences between hospitals. Those will be captured by the hospital level variables (as well as differences in the distributions of patient level variables across hospitals) to the extent they are measured. The unmeasured differences between hospitals will be picked up by the random intercepts. If you also believe that the effects of some of the patient level variables differ across hospitals, then you can add random slopes for those variables at the hospital level of the model to capture that.



  • Ishana Balan
    replied
Thank you all. The above thread was very useful. I just have one more question. I want to run a regression where the dependent variable is at the patient level. The independent variables are at both the patient level and the hospital level. I am particularly interested in the hospital-level variables. If I use a mixed effects model, will it eliminate the systematic differences between hospitals? What is the best approach to use here?



  • Joanna Davies
    replied
Hello Clyde and others - thank you, I found this thread very useful but have a couple of follow-up questions...

I'm using Stata v17, fitting an -nbreg- model.

The outcome is a count of hospital visits. The main exposure of interest is categorical ethnicity. There are several covariates, including local authority (based on area of residence; ~300 groups; minimum ~20 and maximum ~10,000 cases per group). The overall sample size is large (>500,000).

I think it's reasonable to assume that there may be some intragroup correlation at the local authority level, as well as heteroskedasticity. So I'm trying to decide (a) how to test for both intragroup correlation and heteroskedasticity, and (b) how to handle them if they exist. But the go-to postestimation commands
    Code:
    estat icc
    and
    Code:
    estat hettest
are, I think, not available with nbreg, so I'm not sure of the best way to proceed and am hoping to get some advice, please.

So far I have run a mixed model and obtained the ICC using:
    Code:
    mixed y i.x1 x2 || group:
    estat icc
The ICC was <0.02.

Question 1: based on this low ICC, is it reasonable to assume that I don't need a multilevel analysis and that I don't need to adjust for clustering via vce(cluster)? Or is there a better way to evaluate the ICC for my overdispersed count outcome?

Question 2: how can I evaluate heteroskedasticity after nbreg (or am I completely on the wrong track here?)

Assuming a mixed model isn't necessary, I can see 3 different options for my model and have tried them out with the data - I found very little difference between the models in terms of coefficients and standard errors.

    option 1
    Code:
    nbreg y i.x1 x2 i.group
    option 2
    Code:
    nbreg y i.x1 x2 i.group, vce(robust)
    option 3
    Code:
    nbreg y i.x1 x2, vce(cluster group)
Question 3: My feeling is that option 2 would be best (accounting for potential violations; and based on the low ICC above, vce(cluster) isn't needed) - but is there a good way to make a comparison between the models? And is including i.group in the model like this acceptable?

    Grateful for any advice.

    Thank you,
    Joanna



  • Clyde Schechter
    replied
    In situations where you are satisfied that heteroscedasticity and model misspecification are minimal or non-existent, and where observations are independent, or where dependence is fully accounted for with random intercepts, there is no need for vce(cluster). Also, bear in mind that cluster robust standard errors are asymptotically correct. When the number of clusters is small, these can actually be worse than the unclustered standard errors. There is no consensus about how small is small, but I myself would never use vce(cluster) with fewer than 15 clusters, and I might even require a larger number in some circumstances.



  • Jason Geng
    replied
    Thanks again Clyde. So, is there ever a situation in which you wouldn’t want to use vce(cluster whatever) — no matter if you are using reg, mixed, melogit, etc?



  • Clyde Schechter
    replied
    Yes, I would agree. Moreover, with a linear probability model for a dichotomous outcome, heteroscedasticity is virtually guaranteed.
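The "virtually guaranteed" part follows from simple arithmetic: for a binary outcome, the residual of a linear probability model takes only the values 1 − p(x) or −p(x), so its variance is p(x)(1 − p(x)), which necessarily changes with the fitted probability. A minimal sketch of that calculation (Python, purely for the arithmetic):

```python
# Residual variance of a linear probability model at fitted probability p:
# the residual is 1 - p (with probability p) or -p (with probability 1 - p),
# so Var = p * (1 - p)^2 + (1 - p) * p^2 = p * (1 - p).
def lpm_residual_variance(p: float) -> float:
    return p * (1 - p)

for p in (0.1, 0.3, 0.5):
    print(p, lpm_residual_variance(p))
# the error variance changes with p -- heteroskedasticity by construction
```

Since any non-degenerate model produces different fitted probabilities for different covariate patterns, the error variance cannot be constant, which is why robust standard errors are routinely recommended with linear probability models.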



  • Jason Geng
    replied
Originally posted by Clyde Schechter:
    The use of random intercepts in the model deals with non-independence of observations due to the nested structure, but it does not deal with model mis-specification or heteroskedasticity. So if you have no worries about the latter two, then there is no need for cluster robust standard errors. But if you have concerns about those issues, then you should still use vce(cluster whatever). (In epidemiologic work, we often are not worried about this--in finance, however, those things are almost always thought to be present.)
Thank you for the response. I'm working with a mixed model where the dependent variable is binary (whether a student goes to college or not), the two key independent variables are the number of AP courses completed and standardized test scores, and each of 500 high schools is a cluster. It sounds like the random intercept (at the school level) will take care of the non-independence of students within a school; I also have a random slope on each independent variable. But since, for example, the probability of attending college can't truly be linear in test score, it also sounds like vce(cluster school) is appropriate here to help with model mis-specification. Does that sound right?



  • Clyde Schechter
    replied
    The use of random intercepts in the model deals with non-independence of observations due to the nested structure, but it does not deal with model mis-specification or heteroskedasticity. So if you have no worries about the latter two, then there is no need for cluster robust standard errors. But if you have concerns about those issues, then you should still use vce(cluster whatever). (In epidemiologic work, we often are not worried about this--in finance, however, those things are almost always thought to be present.)



  • Jason Geng
    replied
Originally posted by Clyde Schechter:
    What you are calling "the cluster command" is not that. It is simply the use of cluster robust standard errors with -regress-. The distinction is important because Stata does, in fact, have a -cluster- command and what it does is unrelated to the problem you are working with.

    I would strongly prefer the use of the -mixed- model here. Yes it is, in a sense, a regular regression with adjustments made to the standard errors, but the adjustments are better than those provided by -vce(cluster ...)- when you really have hierarchical data. The -regress- approach, even with -vce(cluster ...)- does not adjust for potential confounding due to systematic differences among classes or schools. The -mixed- model does so.

    The only circumstance where I would take -regress- over -mixed- is if the intraclass correlations at the school and Class levels are very close to zero. In that case, -mixed- is telling you that there isn't really any systematic effect of class or school on the outcome (at least conditional on behav* etc.) and in that case -regress- would be fine, and the results would be essentially indistinguishable.

As for standardized betas, even assuming that this is one of those unusual situations where using them with -regress- would actually make sense (which I question), they make no sense at all with hierarchical data. It isn't even clear what standardization means in the context of hierarchical data. What standard deviation should be used: that of the overall estimation sample? that within class, calculated separately for each class? the pooled within-class one? that within school, calculated separately for each school? the pooled within-school one? How would you explain or justify whichever choice you made? How would anybody go about using or interpreting the results obtained with any of these choices?
    Hi Clyde,

    Thank you for the detailed post. When you go with a -mixed- model, do you additionally recommend using vce(robust)? And does it make sense to additionally cluster the standard errors when using a mixed model? Or is that redundant given the explicit multi-level modeling in -mixed-? Thanks.

    Jason

