  • multilevel model vs multigroup path analysis

    Hello,

    When the group-level variable is a single binary variable, which approach is more commonly used: multilevel modeling or multi-group path analysis?

    Dependent variable: academic performance
    Individual-level independent variable: SES
    Group-level independent variable: school program (0/1)

    My main interest is the effect of the school program.
    Last edited by jimin kim; 09 Apr 2024, 12:23.

  • #2
    It is almost impossible to justify doing multilevel modeling with a group-level sample size of two. You have other options. One is multi-group path analysis using structural equation modeling (see help sem_group_options). Another is single-level regression with interactions between your binary group variable and the key predictors/covariates in the model. Note that with the multi-group SEM approach, most people allow all possible paths to vary by group; you can control this behavior in Stata's sem. Single-level regression gives you tighter control over which predictors/covariates you interact with the grouping variable.
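    For concreteness, here is a minimal sketch of the two options in Stata. The variable names performance, ses, and program are placeholders for whatever is actually in your dataset:
    Code:
    * Option 1: multi-group SEM, letting every parameter vary across program groups
    sem (performance <- ses), group(program) ginvariant(none)

    * Option 2: single-level regression with an explicit SES-by-program interaction
    regress performance c.ses##i.program
    margins program, dydx(ses)    // SES slope within each program group
    In the regression version, the interaction coefficient directly tests whether the SES slope differs between the two program groups.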

    • #3
      Either approach should be reasonable, so I'd default to whichever technique you are most familiar with or whichever is most widely used in your field. Others are welcome to correct me if I am wrong, but I don't think a small number of groups should be a problem for either approach. It's much more important that you have a large number of observations within each group.

      • #4
        Erik Ruzek is probably right since he knows multilevel models quite well. Can you say a bit more about why having a small number of groups is a problem?

        • #5
          Happy to, Daniel Schaefer. A decent bit of methodological research has addressed this issue (here's a recent one). In general, the consensus is that you can get good parameter estimates for variance components with 10 or more groups, decent estimates with 5-9 groups, and a hot mess with 4 or fewer. The variance components (random intercepts) are assumed to come from a normal distribution with mean 0 and a standard deviation estimated from your data and model. With an effective N of 2, it's very hard to imagine that the standard deviation estimate from the model is anything close to reliable or generalizable. At the very least, it will be extremely noisy and you will get very wide confidence intervals for the parameter estimates. If you are willing to go Bayesian and have a strong prior for the variance(s), it is possible. See Gelman.
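          To spell out that assumption in standard random-intercept notation: y_ij = b0 + u_j + e_ij, where u_j ~ N(0, sigma_u^2) is the group-level random intercept and e_ij ~ N(0, sigma_e^2) is the individual-level residual. The variance sigma_u^2 has to be recovered from the realized values of u_j, and with two groups there are only two of them.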

          • #6
            Thanks for the cite Erik Ruzek! It is interesting that the recent citation you provide from Elff et al. argues, contrary to the (apparently) influential 2013 study by Stegmueller, that small numbers of clusters are not actually a problem, even with frequentist methods. Of course, it is only one paper and you do appear to be right that it contradicts something of a wider consensus about MLMs with a small number of clusters. From the conclusion:

            A widely read and cited article by Stegmueller has raised serious concerns about the performance of standard likelihood-based methods of estimating multilevel models when the number of clusters is small. Stegmueller claims that these methods produce biased estimates of the coefficients of contextual variables, and that inferences about contextual effects may be strongly anti-conservative, potentially leading to an unjustified rejection of the null hypothesis of no effect. Especially with respect to statistical inference, several other studies have drawn similar conclusions (for example, Bryan and Jenkins 2016; Maas and Hox 2005).

            In this article we have demonstrated that this pessimistic assessment of likelihood-based estimators of coefficients in multilevel models cannot be upheld. First, analytical results from the statistical literature indicate that ML estimates of context effects in linear multilevel models are unbiased – irrespective of the number of clusters and irrespective of whether ML or REML estimation is used. For generalized linear multilevel models such as multilevel probit, biases are possible when the sizes of the clusters are small, resulting in small lower-level samples. However, small lower-level sample sizes are rare in the country-comparative setting that motivated Stegmueller's analysis. Consistent with these assertions, our re-analysis of his Monte Carlo experiments provides no evidence of biased parameter estimates for either linear or generalized linear multilevel models.

            • #7
              Daniel Schaefer, it is worth noting that in the article I cited in #5, the focus was on point estimates and standard errors of non-varying ("fixed effect") predictors measured at the group level. That is one concern, and the results there suggest that with appropriate methods (REML, Kenward-Roger correction, etc.), you can generally trust that you are getting good estimates. What is much more problematic with a small number of unique clusters is the parameter estimates for variance components ("random effects"). These are often of interest in multilevel models, and even among econometricians, for estimating intraclass correlation coefficients (ICCs).

              • #8
                My intuition definitely tells me that if this is going to be a problem, it will be with the random-effects part of the model - not, I think, for variance reasons, but possibly for degrees-of-freedom reasons... It's a little less clear to me why it should matter for the ICCs per se, but I'll do a little more reading on the topic and I bet it will come to me. Thanks again, Erik.
                Last edited by Daniel Schaefer; 09 Apr 2024, 15:18.

                • #9
                  When you only have two clusters, even if you have 1,000 cases within each cluster, the effective sample size for estimation of random effects (whether a variance or standard deviation) is two. Here's a little simulation where I generate data for 30 clusters, each having 15 cases. I run the multilevel model and get variance parameter estimates close enough to the DGP. If you want to get closer, simulate a larger group size.
                  Code:
                  clear*
                  version 16.1
                  set seed 113054
                  set obs 30                 // N of L2 units
                  gen class = _n
                  gen u_i = rnormal(0,3)    // random intercept w/ associated variability (SD=3; variance=9)
                  
                  *Expand creates new observations (students) for each unique row
                  expand 15
                  bysort class: gen stuid = _n
                  
                  gen e_ij = rnormal(0,5)     // residual w/ mean 0, SD=5 (variance=25)
                  gen y = 70 + u_i + e_ij     // outcome = grand mean (70) + random intercept + residual
                  
                  mixed y || class:, reml 
                  estat icc
                  With the following results:
                  Code:
                  . mixed y || class: , reml 
                  
                  Mixed-effects REML regression                   Number of obs     =        450
                  Group variable: class                           Number of groups  =         30
                  
                                                                  Obs per group:
                                                                                min =         15
                                                                                avg =       15.0
                                                                                max =         15
                  
                                                                  Wald chi2(0)      =          .
                  Log restricted-likelihood = -1376.6391          Prob > chi2       =          .
                  
                  ------------------------------------------------------------------------------
                             y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                         _cons |   70.03497   .5544249   126.32   0.000     68.94831    71.12162
                  ------------------------------------------------------------------------------
                  
                  ------------------------------------------------------------------------------
                    Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                  -----------------------------+------------------------------------------------
                  class: Identity              |
                                    var(_cons) |   7.639711   2.424172      4.101857    14.22897
                  -----------------------------+------------------------------------------------
                                 var(Residual) |   23.72847    1.63742      20.72675    27.16491
                  ------------------------------------------------------------------------------
                  LR test vs. linear model: chibar2(01) = 70.78         Prob >= chibar2 = 0.0000
                  
                  
                  . estat icc
                  
                  Intraclass correlation
                  
                  ------------------------------------------------------------------------------
                                         Level |        ICC   Std. Err.     [95% Conf. Interval]
                  -----------------------------+------------------------------------------------
                                         class |   .2435497   .0603827      .1448353    .3796733
                  ------------------------------------------------------------------------------
                  Then I randomly sample two of the clusters and estimate the same model. I am using Ben Jann's gsample here:
                  Code:
                  *Sample observations from just 2 classes
                  gsample 2, wor cluster(class)
                  mixed y || class: , reml
                  estat icc
                  These sure look different! You can see that both the variance estimates and the ICC are affected. Anything that affects the variances will also affect the ICC, because the ICC is just the ratio of between-group variance to total variance:
                  Code:
                  . mixed y || class: , reml  
                  Mixed-effects REML regression                   Number of obs     =         30
                  Group variable: class                           Number of groups  =          2
                  
                                                                  Obs per group:
                                                                                min =         15
                                                                                avg =       15.0
                                                                                max =         15
                  
                                                                  Wald chi2(0)      =          .
                  Log restricted-likelihood =  -92.08708          Prob > chi2       =          .
                  
                  ------------------------------------------------------------------------------
                             y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                         _cons |   70.88952   1.548982    45.77   0.000     67.85357    73.92547
                  ------------------------------------------------------------------------------
                  
                  ------------------------------------------------------------------------------
                    Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                  -----------------------------+------------------------------------------------
                  class: Identity              |
                                    var(_cons) |   2.871296   6.805901      .0275711    299.0208
                  -----------------------------+------------------------------------------------
                                 var(Residual) |   28.91091   7.726765      17.12256    48.81519
                  ------------------------------------------------------------------------------
                  LR test vs. linear model: chibar2(01) = 0.54          Prob >= chibar2 = 0.2311
                  
                  . estat icc 
                  
                  Intraclass correlation
                  
                  ------------------------------------------------------------------------------
                                         Level |        ICC   Std. Err.     [95% Conf. Interval]
                  -----------------------------+------------------------------------------------
                                         class |   .0903429    .197675      .0008896    .9172051
                  ------------------------------------------------------------------------------
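                  Plugging the variance estimates from the two outputs into that ratio reproduces the estat icc results:
                  Code:
                  * ICC = var(_cons) / (var(_cons) + var(Residual))
                  display 7.639711 / (7.639711 + 23.72847)    // = .2435, the 30-cluster ICC
                  display 2.871296 / (2.871296 + 28.91091)    // = .0903, the 2-cluster ICC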

                  • #10
                    Okay, you are starting to lose me here.

                    When you only have two clusters, even if you have 1,000 cases within each cluster, the effective sample size for estimation of random effects (whether a variance or standard deviation) is two.
                    I'm fairly sure that is not true. Random effects are calculated using both the within and between group variances. Random effects hold if and only if the within-group size of an effect differs between groups, and that is clearly a property of both the within and between group variance. All of the observed information is used to calculate the random effects.

                    I also don't find the simulation very compelling. As you say, the ICC depends on the within, between, and total variances, and you've changed the variances by effectively dropping most of the data. I don't understand why one would expect the ICC to be the same in the first place. I also don't see how this demonstrates that the ICC or variances are incorrect for the two-group model, which is estimated on different data. I'm struggling to understand how the simulated situation is analogous to one where there are exactly two theoretically meaningful classes.

                    • #11
                      Sorry, I should have been clearer in that statement. In particular, the between-group variance is severely impacted by having just two groups. So when you think about how the model estimates the between-group variance in the outcome, the relevant sample size is the number of groups. For the within-group (residual) variance, it uses the total number of cases/observations.

                      This single simulation shows that with only 2 clusters the between-group variance estimate not only changes in point value but also becomes far more uncertain (notice the confidence intervals). It's also interesting that the LR test of whether the clustering is strong enough to warrant a mixed-effects model is no longer significant, suggesting OLS would do here. But we know the DGP was a two-level model.

                      The within-group variance also changes and is less certain, but it is in the ballpark. I agree that a better simulation would start with the two-group case and then look at how altering the within-group sample size affects these variance estimates (a sketch of that is below). I started playing with that but needed to sign off for the evening.
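                      For illustration, a minimal version of that simulation, reusing the DGP from #9 (hypothetical code, not what I actually ran), would be:
                      Code:
                      clear all
                      set seed 12345
                      set obs 2                      // only 2 level-2 units this time
                      gen class = _n
                      gen u_i = rnormal(0,3)         // random intercept, SD=3 (variance=9)
                      expand 1000                    // within-cluster n; try 15, 100, 1000, ...
                      gen e_ij = rnormal(0,5)        // residual, SD=5 (variance=25)
                      gen y = 70 + u_i + e_ij        // grand mean + random intercept + residual
                      mixed y || class:, reml
                      estat icc
                      However large the within-cluster n, the between-cluster variance is still estimated from only two realized intercepts.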

                      Thanks for engaging with me on this, Daniel Schaefer!

                      • #12
                        Thank you all for the discussion.
                        But I think I explained it in a confusing way: the level-2 sample size is 178, so that is not a problem.
                        What I'm curious about relates more to the nature of the level-2 variable. If HLM is applied when the level-2 variable is binary, wouldn't it be the same as a regression analysis of two groups? (Sorry for such a basic question.)

                        My current research topic is whether the impact of student SES on academic achievement differs between schools with and without a specific program. I could do this with multi-group path analysis, but I've been taking HLM lectures since last week, so I thought I'd try applying HLM for practice. However, I don't know whether this is an appropriate HLM model. Can someone please tell me if I can apply HLM when the level-2 independent variable is binary?

                        • #13
                          I'm not aware of any reason that you cannot use a binary level-2 predictor in an HLM. It should be fine to do that.

                          Keep in mind that HLM can be very sensitive to small within-cluster sample sizes. That's not to say people don't work with small within-cluster samples, but such models have a number of issues you should make yourself aware of if that applies to your data.
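                          For what it's worth, here is a minimal sketch of such a model in Stata, with placeholder variable names (achievement, ses, program, and schoolid stand in for whatever is in your data) and a cross-level interaction to test whether the SES slope differs between program and non-program schools:
                          Code:
                          * Random intercept and random SES slope across schools, with a binary
                          * level-2 predictor (program) and a cross-level SES-by-program interaction
                          mixed achievement c.ses##i.program || schoolid: ses, reml covariance(unstructured)
                          The coefficient on the SES-by-program interaction is the difference in the SES slope between the two types of schools, which is your effect of interest. With 178 schools at level 2, the small-cluster concerns discussed earlier in this thread do not apply.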
