
  • Mixed effects (multilevel) model vs. cluster command

    Hi all
    I want to test the association between childhood behaviour at age 6 years and earnings at age 36 (n=1000). The behavioural assessments were obtained from teachers when the children were aged 6. I want to control for clustering of the behavioural assessments at the school and classroom levels, although I'm not testing predictors at those levels. From what I understand, the mixed model is better, although I would only report the fixed-effects estimates, not the random effects, which seems to amount to a regular regression (with adjusted SE estimates). What are the pros and cons of using a mixed effects model vs. the cluster command? The advantage of the cluster command is simplicity, and that I can still report standardised betas. Any suggestions welcome. See the examples below; they produce similar results.
    Many thanks

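    * Three-level random-intercept model; with -mixed-, vce(robust) clusters on the top level (school)
    mixed OUTCOME behav1 behav2 /* etc. */ || school: || class:, vce(robust)

    * OLS with SEs clustered on each school-by-class combination (i.e. on classroom)
    egen double_cluster = group(school class)
    regress OUTCOME behav1 behav2 /* etc. */, vce(cluster double_cluster)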

  • #2
    What you are calling "the cluster command" is not that. It is simply the use of cluster robust standard errors with -regress-. The distinction is important because Stata does, in fact, have a -cluster- command and what it does is unrelated to the problem you are working with.

    I would strongly prefer the use of the -mixed- model here. Yes, it is, in a sense, a regular regression with adjustments made to the standard errors, but the adjustments are better than those provided by -vce(cluster ...)- when you really have hierarchical data. The -regress- approach, even with -vce(cluster ...)-, does not adjust for potential confounding due to systematic differences among classes or schools. The -mixed- model does so.

    The only circumstance where I would take -regress- over -mixed- is if the intraclass correlations at the school and class levels are very close to zero. In that case, -mixed- is telling you that there isn't really any systematic effect of class or school on the outcome (at least conditional on behav* etc.), and in that case -regress- would be fine, and the results would be essentially indistinguishable.
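For the three-level random-intercept model in the original post (random intercepts only, no random slopes, which is an assumption of this sketch), write the school-, class- and child-level variances as σ²_s, σ²_c and σ²_e. The two intraclass correlations that -estat icc- reports after -mixed- are then:

```latex
\rho_{\text{school}} = \frac{\sigma^2_s}{\sigma^2_s + \sigma^2_c + \sigma^2_e},
\qquad
\rho_{\text{class}\mid\text{school}} = \frac{\sigma^2_s + \sigma^2_c}{\sigma^2_s + \sigma^2_c + \sigma^2_e}
```

The first is the correlation between two children in the same school but different classes; the second is the correlation between two children in the same class. "Very close to zero" in the sense above means both are small.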

    As for standardized betas, even assuming that this is one of those unusual situations where using them with -regress- would actually make sense (which I question), they make no sense at all with hierarchical data. It isn't even clear what standardization means in the context of hierarchical data. What standard deviation should be used: that of the overall estimation sample? that within-class, calculated separately for each class? the pooled within-class one? that within-school, calculated separately for each school? the pooled within-school one? How would you explain or justify whichever choice you made? How would anybody go about using or interpreting the results obtained with any of these choices?
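A minimal Stata sketch of the comparison described above, on simulated data so that it runs on its own. Everything here (variable names, the 20 × 5 × 10 layout, the variance values) is a hypothetical assumption, not the original poster's data. The -regress- call clusters on school, the top level, which also allows for correlation among classrooms within the same school; clustering on the school-by-class combination, as in the original post, only allows for correlation within classrooms.

```stata
* Simulated data: 20 schools x 5 classrooms x 10 children (all hypothetical)
clear
set seed 2024
set obs 20
generate school = _n
generate u_school = rnormal(0, 0.5)     // school random intercept
expand 5
bysort school: generate classroom = _n
generate u_class = rnormal(0, 0.3)      // classroom random intercept
expand 10
generate behav1 = rnormal()
generate behav2 = rnormal()
generate outcome = 1 + 0.4*behav1 - 0.2*behav2 + u_school + u_class + rnormal()

* Three-level random-intercept model, then both intraclass correlations
mixed outcome behav1 behav2 || school: || classroom:
estat icc

* OLS with standard errors clustered on the top level (school)
regress outcome behav1 behav2, vce(cluster school)
```

If both correlations from -estat icc- are close to zero, the -mixed- and -regress- results should be close as well, which is the situation in which -regress- would be acceptable.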



    • #3
      In addition to what Clyde said, I have some minor points.

      1) The sandwich estimator of the variance (the -vce(cluster clustervar)- option) is robust to violations of independence caused by clustering. The -vce(robust)- option uses a similar, but not identical, sandwich estimator that is robust to heteroskedasticity. I believe the cluster-robust estimator relies on having a large number of clusters to achieve its goals (I hope someone will correct me if this is wrong). If you have few clusters, it won't work as well, and it might be better to model the clustering explicitly.

      2) The original post alludes to two levels of clustering - classes, which are nested in schools. That is a situation where you'd default to -mixed-.

      3) With -mixed-, you can explicitly model the proportion of variance that's attributable to within-cluster variation, and between-cluster variation. Often, this is of substantive interest.

      4) Another option to be aware of is -xtreg, fe-, which uses fixed effects for the clusters. However, it only handles one level of clustering. Economists tend to prefer fixed effects, arguing that they give consistent estimates of the coefficients even when unobserved cluster-level factors are correlated with the predictors, something random effects assume away. Other disciplines are not as concerned about this. I mention it for completeness. Since you have two levels of clustering, this won't be a perfect fit for your purposes.

      5) The correct term is the -vce(cluster ...)- option. There is a separate set of cluster analysis commands, which do something very different.

      Be aware that it can be very hard to answer a question without sample data. You can use the -dataex- command for this; type help dataex at the command line.

      When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
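A sketch of point 4 on simulated data (all names and numbers are hypothetical): a single level of clustering, with the predictor deliberately correlated with the unobserved school effect, which is the situation in which fixed and random effects can disagree. The true coefficient on behav1 in the simulation is 0.4, so the two printed estimates can be compared against it.

```stata
* Simulated data: 20 schools x 50 children (all hypothetical)
clear
set seed 7
set obs 20
generate school = _n
generate u_school = rnormal(0, 0.5)      // unobserved school effect
expand 50
* Predictor correlated with the unobserved school effect
generate behav1 = rnormal() + u_school
generate outcome = 1 + 0.4*behav1 + u_school + rnormal()

* School fixed effects, with cluster-robust SEs on school
xtset school
xtreg outcome behav1, fe vce(cluster school)
display "FE estimate: " _b[behav1]

* Random-intercept model for comparison
mixed outcome behav1 || school:
display "RE estimate: " _b[behav1]
```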



      • #4
        Thank you for these clear and detailed responses.

        One reason for my question is that I want to apply the above model to a categorical outcome with 3 levels (i.e. multinomial logistic regression, -mlogit-), but from what I've read, Stata doesn't have a dedicated multilevel command for this, and it can only be done using the -gsem- command. Also, to further complicate things, I need to do this within a multiple imputation framework, which I've read does not work for -sem- in Stata.

        The model I want to run is this: mi estimate: ?command 3_level_outcome pred1 pred2 etc || school: || class:

        What options do I have? Are there alternatives to -gsem- for this kind of problem?



        • #5
          Quoting Clyde Schechter's post #2.
          Hi Clyde,

          Thank you for the detailed post. When you go with a -mixed- model, do you additionally recommend using vce(robust)? And does it make sense to additionally cluster the standard errors when using a mixed model? Or is that redundant given the explicit multi-level modeling in -mixed-? Thanks.

          Jason



          • #6
            The use of random intercepts in the model deals with non-independence of observations due to the nested structure, but it does not deal with model mis-specification or heteroskedasticity. So if you have no worries about the latter two, then there is no need for cluster robust standard errors. But if you have concerns about those issues, then you should still use vce(cluster whatever). (In epidemiologic work, we often are not worried about this--in finance, however, those things are almost always thought to be present.)
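A sketch of that comparison on simulated data with deliberately heteroskedastic errors; all names and numbers are hypothetical. With -mixed-, the cluster variable should be the top level of the model (here school) or a coarser grouping within which the top-level units are nested; -vce(robust)- with -mixed- clusters on the top level.

```stata
* Simulated data: 20 schools x 5 classrooms x 10 children (hypothetical)
clear
set seed 99
set obs 20
generate school = _n
generate u_school = rnormal(0, 0.5)
expand 5
bysort school: generate classroom = _n
generate u_class = rnormal(0, 0.3)
expand 10
generate behav1 = rnormal()
* Error SD rises with behav1, so the residuals are heteroskedastic
generate outcome = 1 + 0.4*behav1 + u_school + u_class ///
    + rnormal(0, exp(0.5*behav1))

* Model-based SEs: nesting handled, homoskedastic errors assumed
mixed outcome behav1 || school: || classroom:
estimates store model_based

* Same model, SEs clustered on the top level (school)
mixed outcome behav1 || school: || classroom:, vce(cluster school)
estimates store cluster_robust

estimates table model_based cluster_robust, b se
```

The point estimates are the same in both columns; only the standard errors change.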



            • #7
              Quoting Clyde Schechter's post #6.
              Thank you for the response. I'm working with a mixed model where the dependent variable is binary (whether the student goes to college), the two key independent variables are the number of AP courses completed and standardized test scores, and each of 500 high schools is a cluster. It sounds like the random intercept (at the school level) will take care of the non-independence of students within a school; I also have a random slope on each independent variable. But since, for example, the probability of attending college can't truly be linear in test score, it also sounds like vce(cluster school) is appropriate here to help with model misspecification. Does that sound right?
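A sketch of the model described in #7, on simulated data in which the true relationship is logistic, so the linear probability model is misspecified by construction. Variable names and all numbers are hypothetical; the random-slope covariance is left at the -mixed- default (independent).

```stata
* Simulated stand-in for #7: 500 schools x 40 students (hypothetical)
clear
set seed 500
set obs 500
generate school = _n
generate u0 = rnormal(0, 0.3)       // school intercept shift (logit scale)
generate u1 = rnormal(0, 0.1)       // school-specific AP-course slope shift
generate u2 = rnormal(0, 0.2)       // school-specific test-score slope shift
expand 40
generate ap_courses = rpoisson(2)
generate test_score = rnormal()
generate college = runiform() < ///
    invlogit(-0.5 + (0.3 + u1)*ap_courses + (0.8 + u2)*test_score + u0)

* Linear probability model: random intercept and random slopes by school,
* standard errors clustered on school
mixed college ap_courses test_score || school: ap_courses test_score, ///
    vce(cluster school)
```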



              • #8
                Yes, I would agree. Moreover, with a linear probability model for a dichotomous outcome, heteroscedasticity is virtually guaranteed.
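The reason is that for a 0/1 outcome the conditional variance is determined by the conditional mean. Writing p(x) for the probability that Y = 1 given X = x:

```latex
\operatorname{Var}(Y \mid X = x) = p(x)\,\bigl[1 - p(x)\bigr]
```

So the variance is 0.25 where p(x) = 0.5 but only 0.09 where p(x) = 0.9. In a linear probability model p(x) changes with x, and therefore so does the error variance.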



                • #9
                  Thanks again, Clyde. So, is there ever a situation in which you wouldn't want to use vce(cluster whatever), no matter whether you are using reg, mixed, melogit, etc.?



                  • #10
                    In situations where you are satisfied that heteroscedasticity and model misspecification are minimal or non-existent, and where observations are independent, or where dependence is fully accounted for with random intercepts, there is no need for vce(cluster). Also, bear in mind that cluster robust standard errors are only asymptotically correct. When the number of clusters is small, they can actually be worse than the unclustered standard errors. There is no consensus about how small is small, but I myself would never use vce(cluster) with fewer than 15 clusters, and I might even require a larger number in some circumstances.
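A small sketch of checking the cluster count before trusting -vce(cluster)-. The data are simulated with a deliberately small number of clusters (8), and the names are hypothetical; the threshold of 15 is the personal rule of thumb from the post above, not a Stata default.

```stata
* Simulated data with only 8 clusters (hypothetical names)
clear
set seed 11
set obs 8
generate cluster_id = _n
generate u = rnormal(0, 0.5)
expand 100
generate x = rnormal()
generate y = 1 + 0.5*x + u + rnormal()

regress y x, vce(cluster cluster_id)

* e(N_clust) stores the number of clusters used for the SEs
display "Number of clusters: " e(N_clust)
if e(N_clust) < 15 {
    display as error "Only " e(N_clust) " clusters: cluster-robust SEs may be unreliable"
}
```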



                    • #11
                      Hello Clyde and others. Thank you, I found this thread very useful, but I have a couple of follow-up questions.

                      I'm using Stata 17 and fitting an nbreg model.

                      The outcome is a count of hospital visits. The main exposure of interest is categorical ethnicity. There are several covariates, including local authority (based on area of residence; ~300 groups, with between ~20 and ~10,000 cases per group). The overall sample size is large (>500,000).

                      I think it's reasonable to assume that there may be some intragroup correlation at the local authority level, and heteroskedasticity. So I'm trying to decide a) how to test for both intragroup correlation and heteroskedasticity and b) how to handle them if they exist. But the go-to postestimation commands
                      Code:
                      estat icc
                      and
                      Code:
                      estat hettest
                      I think are not available after nbreg, so I'm not sure of the best way to proceed and am hoping to get some advice, please.

                      So far I have run a mixed model and obtained the ICC using:
                      Code:
                      mixed y i.x1 x2 || group:
                      estat icc
                      The ICC was < 0.02.

                      Question 1: Based on this low ICC, is it reasonable to assume that I don't need a multilevel analysis and that I don't need to adjust for clustering via vce(cluster)? Or is there a better way to evaluate the ICC for my overdispersed count outcome?
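One caution on reading a small ICC, as a rough back-of-envelope check that uses the Kish design effect for a mean as an approximation: the variance inflation from clustering grows with the average cluster size as well as with the ICC ρ. Plugging in the approximate figures above:

```latex
\text{DEFF} \approx 1 + (\bar{m} - 1)\,\rho,
\qquad
\bar{m} \approx \frac{500\,000}{300} \approx 1667,
\quad
\rho = 0.02
\;\Rightarrow\;
\text{DEFF} \approx 1 + 1666 \times 0.02 \approx 34
```

So if ρ were as large as 0.02, variances could be inflated roughly 34-fold (standard errors roughly 5.9-fold) relative to ignoring clustering. Unequal cluster sizes, as here, tend to increase this further, and for a regression coefficient the inflation also depends on how strongly the predictor itself is clustered, so this is an order-of-magnitude guide only.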

                      Question 2: How can I evaluate heteroskedasticity after nbreg (or am I completely on the wrong track here)?

                      Assuming a mixed model isn't necessary, I can see three different options for my model and have tried them out with the data. I found very little difference between the models in terms of coefficients and SEs.

                      option 1
                      Code:
                      nbreg y i.x1 x2 i.group
                      option 2
                      Code:
                      nbreg y i.x1 x2 i.group, vce(robust)
                      option 3
                      Code:
                      nbreg y i.x1 x2, vce(cluster group)
                      Question 3: My feeling is that option 2 would be best (it accounts for potential violations, and based on the low ICC above, vce(cluster) isn't needed), but is there a good way to compare the models? And is including i.group in the model like this acceptable?
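A sketch of putting the three options side by side with -estimates store- and -estimates table-, on simulated count data (names, sizes and parameter values are hypothetical). Options 1 and 2 fit the same model, so their coefficients are identical and only the standard errors differ; option 3 drops i.group, so its coefficients can differ as well.

```stata
* Simulated stand-in for #11: 30 groups x 200 cases (hypothetical)
clear
set seed 17
set obs 30
generate group = _n
generate u = rnormal(0, 0.2)            // group shift in the log rate
expand 200
generate x1 = runiformint(1, 3)         // 3-category exposure
generate x2 = rnormal()
generate mu = exp(0.5 + 0.2*(x1 == 2) + 0.4*(x1 == 3) + 0.1*x2 + u)
* Gamma-Poisson mixture gives overdispersed (negative binomial) counts
generate y = rpoisson(rgamma(2, mu/2))

nbreg y i.x1 x2 i.group                 // option 1
estimates store opt1
nbreg y i.x1 x2 i.group, vce(robust)    // option 2
estimates store opt2
nbreg y i.x1 x2, vce(cluster group)     // option 3
estimates store opt3

estimates table opt1 opt2 opt3, keep(2.x1 3.x1 x2) b se
```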

                      Grateful for any advice.

                      Thank you,
                      Joanna







                      • #12
                        Thank you all; the above thread was very useful. I just have one more question. I want to run a regression where the dependent variable is at the patient level. The independent variables are at both the patient level and the hospital level, and I am particularly interested in the hospital-level variables. If I use a mixed effects model, will it eliminate the systematic differences between hospitals? What is the best approach to use here?



                        • #13
                          No, it will not eliminate the systematic differences between hospitals. Those will be captured by the hospital level variables (as well as differences in the distributions of patient level variables across hospitals) to the extent they are measured. The unmeasured differences between hospitals will be picked up by the random intercepts. If you also believe that the effects of some of the patient level variables differ across hospitals, then you can add random slopes for those variables at the hospital level of the model to capture that.
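A sketch of that model on simulated data (hospital count, patient count, variable names and effect sizes are all hypothetical): a measured hospital-level covariate in the fixed part, a random intercept for unmeasured hospital differences, and a random slope for one patient-level variable.

```stata
* Simulated data: 60 hospitals x 50 patients (hypothetical)
clear
set seed 31
set obs 60
generate hospital = _n
generate teaching = runiform() < 0.4    // measured hospital-level variable
generate u0 = rnormal(0, 0.5)           // unmeasured hospital differences
generate u1 = rnormal(0, 0.2)           // hospital-specific age slope shift
expand 50
generate age10 = (rnormal(60, 10) - 60) / 10   // patient age, centred, per 10 years
generate outcome = 2 + 0.5*teaching + (0.3 + u1)*age10 + u0 + rnormal()

* Fixed part: patient- and hospital-level predictors
* Random part: hospital intercept plus a random slope on age10
mixed outcome i.teaching age10 || hospital: age10
```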



                          • #14
                            Thanks Clyde! That was very helpful!



                            • #15
                              Thank you all for this thread.
                              I would like to pose a related question about another scenario.
                              Suppose we have a measure of function collected on each eye of a subject and wish to assess its association with the presence/absence of a given disease (a dichotomous variable), taking into account confounding by age (a continuous variable).
                              A classical regression of the measure on disease and age would suffer from the fact that there is an obvious correlation between the two eyes of each subject, so the model's assumption that the observations of y are independent is violated.
                              The simplest approach (for me) would be to run a regression analysis with the vce(cluster id) option, where id identifies each subject.
                              Code:

                              regress measure i.disease age, vce(cluster id)

                              However, it may be more appropriate to build a mixed model:
                              Code:

                              mixed measure i.disease age || id:

                              Both models give the same coefficient values and very close SEs on the data that I have at hand.
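A sketch on simulated two-eye data (sample size, names and values are hypothetical) that fits both models and reports the between-eye correlation with -estat icc-. Here every subject contributes both eyes and the predictors (disease, age) are the same for both eyes of a subject; with such balanced clusters and only subject-level predictors, the OLS and random-intercept (GLS) point estimates coincide, which would explain identical coefficients. The approaches can still differ in their standard errors and in how they handle eye-level predictors or subjects with only one eye.

```stata
* Simulated data: 200 subjects x 2 eyes (hypothetical)
clear
set seed 8
set obs 200
generate id = _n
generate disease = runiform() < 0.3     // subject-level, same for both eyes
generate age = runiformint(40, 80)      // subject-level
generate u = rnormal(0, 1)              // subject effect shared by both eyes
expand 2
generate measure = 20 - 2*disease - 0.05*age + u + rnormal(0, 0.7)

* OLS with SEs clustered on subject
regress measure i.disease age, vce(cluster id)

* Random intercept for subject; estat icc gives the between-eye correlation
mixed measure i.disease age || id:
estat icc
```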

                              From this thread I understand that the mixed approach is to be preferred.
                              If so, why, and how large can the error be? Specifically, how big a mistake is it to rely on R² and its partition in the -regress- model to evaluate the relative weight of the independent variables in the association?

                              Consider:
                              1- A recent (2018) survey of the ophthalmic literature reports that “Among studies with data available from both eyes, 50 (89%) of 56 papers in 2017 did not analyse data from both eyes or ignored the intereye correlation, as compared with in 60 (90%) of 67 papers in 1995 (P=0.96).” This documents a persisting lack of appropriate statistical analysis of ocular data.
                              2- It would be easier for a non-statistician to understand regression models than mixed models.


                              Thanks a lot in advance.

