
  • Saturated model for GSEM

    Using GSEM I have fitted a factor analysis model for discrete responses using ordinal logistic regression.

    With the command estat lcgof it should be possible to obtain a test for the fitted model against the saturated model.

    However, it is not clear to me how the saturated model is defined in the case of GSEM. Can anyone help?

  • #2
    I think you are in a bit of a bind here. In a fully linear (Gaussian) SEM context, most folks think of the saturated model as one that estimates all means and variances of the items and their covariances. This model has perfect fit, and you get a log likelihood from it. Twice the difference between the log likelihoods of the saturated model and your theoretically informed model can be referred to a chi-squared distribution. Stata's sem does this for you, reporting the test at the bottom of the output.
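
    That likelihood-ratio test is simple arithmetic once you have the two log likelihoods and parameter counts. A quick sketch (Python rather than Stata, with made-up numbers standing in for what e(ll) and e(rank) would return):

```python
# Likelihood-ratio test of a fitted model against the saturated model.
# The numbers below are hypothetical; in Stata they come from e(ll) and e(rank).
ll_sat, df_sat = -1200.5, 20   # saturated model: log likelihood, parameter count
ll_fit, df_fit = -1210.8, 12   # fitted model: log likelihood, parameter count

lr = 2 * (ll_sat - ll_fit)     # LR chi-squared statistic
df = df_sat - df_fit           # degrees of freedom
print(round(lr, 1), df)
```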

    In theory, you should be able to do the same thing in GSEM; however, Stata's gsem does not allow you to estimate covariances between the residuals of non-Gaussian items. What you get out of gsem is the intercepts and thresholds. If you were to use another program, such as Mplus, you could get covariances between dichotomous and ordinal items, which would allow you to get the log likelihood for a saturated model. To do that in Stata, you'd probably have to code something up using bayesmh, but that has the potential to introduce some non-trivial complexity if you are not familiar with Bayesian methods.



    • #3
      For LCA models with categorical outcomes, the saturated model is one where the outcome cell probabilities are estimated without constraint. One possible way to compute the saturated model's log-likelihood (LL) is to generate a variable that identifies the cells from the categorical outcome variables, then use gsem (or ologit or mlogit or ...) to fit the cell probabilities. Here is an example based on [SEM] Example 50g.
      Code:
      webuse gsem_lca1
      
      * saturated model
      egen double all = group(accident play insurance stock)
      gsem (all <- ), ologit
      est store sat
      local ll_sat = e(ll)
      local df_sat = e(rank)
      
      * fitted model
      gsem (accident play insurance stock <- ), ologit lclass(C 2)
      est store fit
      local ll_fit = e(ll)
      local df_fit = e(rank)
      
      estat lcgof
      
      di 2*(`ll_sat' - `ll_fit')
      di `df_sat' - `df_fit'
      As the number of outcomes and their levels increases, this method gets unwieldy very quickly. gsem actually computes the saturated LL for estat lcgof by computing the cell probabilities directly. Here is the equivalent "direct" code you can use to reproduce the values reported by estat lcgof.
      Code:
      * compute saturated LL from cell probabilities directly
      gen touse = e(sample)
      local by accident play insurance stock touse
      sort `by'
      by `by' : gen last = _n == _N if touse
      by `by' : gen double g2 = sum(1) if touse    // replace 1 with the weight variable when using [weights]
      replace g2 = g2*log(g2/e(N)) if last == 1
      sum g2 if last == 1 & touse, meanonly
      local ll_sat = r(sum)
      count if last & touse
      local df_sat = r(N) - 1
      di 2*(`ll_sat' - `ll_fit')
      di `df_sat' - `df_fit'
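
      The "direct" computation above implements the formula ll_sat = sum over observed response patterns g of n_g*log(n_g/N), with df_sat = (number of observed patterns) - 1. As a sanity check outside Stata, here is a minimal Python sketch of the same formula on made-up data (the four-tuples play the role of the accident/play/insurance/stock response patterns):

```python
import math
from collections import Counter

# Saturated LL for categorical outcomes: each observed response pattern
# ("cell") gets probability n_g / N, so ll_sat = sum_g n_g * log(n_g / N).
# Hypothetical data: rows of (accident, play, insurance, stock) responses.
rows = [(1, 1, 1, 1), (1, 1, 1, 1), (1, 2, 1, 1),
        (2, 2, 2, 2), (2, 2, 2, 2), (2, 2, 2, 1)]
N = len(rows)
counts = Counter(rows)                 # cell counts n_g

ll_sat = sum(n * math.log(n / N) for n in counts.values())
df_sat = len(counts) - 1               # free cell probabilities (one sum-to-1 constraint)
print(round(ll_sat, 4), df_sat)
```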



      • #4
        Jeff Pitblado (StataCorp): Is the saturated model for an LCA the same as the saturated model from a CFA, which the OP noted is the kind of model they are running?



        • #5
          Sorry, I saw "ordinal logistic regression" and estat lcgof and jumped straight to latent class analysis (LCA), (unintentionally) ignoring the mention of "factor analysis". Now I see how Erik's original response addresses the original question.

          I agree with Erik; Stata does not have any methods that estimate a log-likelihood corresponding to a saturated model for ordinal (categorical) outcomes in this scenario.

          I do not know if the saturated model used in LCA is applicable in the confirmatory factor analysis (CFA) framework. I suspect not.

          That said, after a little searching:

          I found a 2019 paper at the NSF website titled Assessing Fit in Ordinal Factor Analysis Models: SRMR vs. RMSEA.

          I also found the UCLA website has an example of performing factor analysis with categorical variables.

          Stata does not have an official command that estimates polychoric correlations, as mentioned in the NSF paper, but the UCLA example mentions Stas Kolenikov's polychoric command. In Stata, type
          Code:
          search polychoric
          to find and install this command if you want to experiment with fitting models using polychoric correlations.

          Here is an example based on [SEM] Example 35g that uses Stas Kolenikov's polychoric command to produce polychoric correlations that are stored as summary statistics data (SSD) for sem to fit a CFA model.
          Code:
          webuse gsem_issp93
          
          * CFA model fit using the observed ordinal outcomes
          gsem (y1 y2 y3 y4 <- SciAtt), oprobit
          est store observed
          
          * CFA model fit using polychoric correlations
          * estimate polychoric correlations
          polychoric y1 y2 y3 y4
          local N = r(N)
          matrix R = r(R)
          matrix list R
          frame create ssd
          frame ssd {
              * create SSD frame using sample size and polychoric correlations
              ssd init y1 y2 y3 y4
              ssd set obs `N'
              ssd set correlations (stata) R
              * CFA model
              sem (y1 y2 y3 y4 <- SciAtt)
              est store polychoric
              * Goodness-of-fit statistics
              estat gof, stats(chi2 rmsea residuals)
          }
          
          * compare fitted loadings
          etable, est(observed polychoric) column(estimates) equations(y1 y2 y3 y4)



          • #6
            That's awesome, Jeff Pitblado (StataCorp)! Thanks for working that out. I didn't think of using the polychorics in that way. It's how we use them to do PCA and EFA for structural validity analysis.



            • #7
              Dear Erik and Jeff,

              Many thanks for your helpful comments.

              So far I have done a simple goodness-of-fit assessment by computing the means and covariances of the responses implied by the fitted GSEM model (simply by Monte Carlo, simulating from the fitted model). These I can compare graphically with the empirical means and covariances obtained directly from the observed responses (i.e. the "saturated" means and covariances). I can also obtain a Monte Carlo p-value based on the discrepancies between the model-based and empirical response covariances.
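
              The Monte Carlo p-value step here is the usual recipe: simulate the discrepancy statistic under the fitted model and count how often it is at least as large as the observed value. A minimal sketch (Python, with hypothetical numbers in place of the actual discrepancies):

```python
import random

# Monte Carlo p-value for a discrepancy statistic: simulate the statistic
# under the fitted model and count simulations at least as extreme as observed.
random.seed(7)
observed_disc = 2.5                                       # hypothetical observed discrepancy
sim_disc = [random.expovariate(1.0) for _ in range(999)]  # stand-in for model simulations

# Include the observed value itself, as is conventional for MC p-values.
p = (1 + sum(d >= observed_disc for d in sim_disc)) / (1 + len(sim_disc))
print(round(p, 3))
```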

              I am not so familiar with the concept of polychoric correlation and will look further into that.

              Best regards,

              Rasmus



              • #8
                PS: I have looked a bit into polychoric correlation. If I understand it correctly, the polychoric correlation matrix is the correlation matrix of the latent continuous variables that are thresholded to produce the responses of an ordinal logistic (or probit) model. In that case, I guess one could specify a saturated model in GSEM as the model in which each response has its own associated latent factor. Computationally, though, I suspect this would be a nightmare (unless the number of responses is very small).
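
                This latent-variable picture is easy to illustrate with a small simulation (Python; the thresholds and correlation are arbitrary, and this is not the polychoric estimator itself): thresholding correlated standard-normal latents and taking the ordinary Pearson correlation of the resulting ordinal variables understates the latent correlation, which is exactly the attenuation the polychoric estimator corrects for.

```python
import math
import random

# Two standard-normal latents with correlation rho, cut into 3 ordinal
# categories each; the Pearson correlation of the ordinal versions is
# attenuated relative to rho (hypothetical thresholds and rho).
random.seed(12345)
rho, n = 0.7, 20000
cuts = [-0.5, 0.5]                    # thresholds -> categories 0, 1, 2

def categorize(z):
    return sum(z > c for c in cuts)

pairs = []
for _ in range(n):
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    pairs.append((categorize(z1), categorize(z2)))

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_ordinal = pearson([p[0] for p in pairs], [p[1] for p in pairs])
print(round(r_ordinal, 2))            # noticeably below rho = 0.7
```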
