
  • Saturated model for GSEM

    Using GSEM I have fitted a factor analysis model for discrete responses using ordinal logistic regression.

    With the command estat lcgof it should be possible to obtain a test for the fitted model against the saturated model.

    However, it is not clear to me how the saturated model is defined in the case of GSEM. Can anyone help?

  • #2
    I think you are in a bit of a bind here. In a fully linear (Gaussian) SEM context, most folks think of the saturated model as one that estimates all means and variances of the items and their covariances. This model has perfect fit, and you get a log likelihood from it. Twice the difference between the log likelihoods of the saturated model and your theoretically informed model can be referred to a chi-squared distribution. Stata's sem does this for you, reporting the test at the bottom of the output.
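
    That likelihood-ratio test is simple arithmetic once you have the two log likelihoods and parameter counts. A quick sketch (Python rather than Stata, with made-up numbers standing in for what e(ll) and e(rank) would return):

```python
# Likelihood-ratio test of a fitted model against the saturated model.
# The numbers below are hypothetical; in Stata they come from e(ll) and e(rank).
ll_sat, df_sat = -1200.5, 20   # saturated model: log likelihood, parameter count
ll_fit, df_fit = -1210.8, 12   # fitted model: log likelihood, parameter count

lr = 2 * (ll_sat - ll_fit)     # LR chi-squared statistic
df = df_sat - df_fit           # degrees of freedom
print(round(lr, 1), df)
```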

    In theory, you should be able to do the same thing in GSEM; however, Stata's gsem does not allow you to estimate covariances between the residuals of non-Gaussian items. What you get out of gsem is the intercepts and thresholds. If you were to use another program, such as Mplus, you could get covariances between dichotomous and ordinal items, which would allow you to get the log likelihood for a saturated model. To do that in Stata, you'd probably have to code something up using bayesmh, but that has the potential to introduce some non-trivial complexity if you are not familiar with Bayesian methods.



    • #3
      For LCA models with categorical outcomes, the saturated model is one where the outcome cell probabilities are estimated without constraint. One possible way to compute the saturated model's log-likelihood (LL) is to generate a variable that identifies the cells from the categorical outcome variables, then use gsem (or ologit or mlogit or ...) to fit the cell probabilities. Here is an example based on [SEM] Example 50g.
      Code:
      webuse gsem_lca1
      
      * saturated model
      egen double all = group(accident play insurance stock)
      gsem (all <- ), ologit
      est store sat
      local ll_sat = e(ll)
      local df_sat = e(rank)
      
      * fitted model
      gsem (accident play insurance stock <- ), ologit lclass(C 2)
      est store fit
      local ll_fit = e(ll)
      local df_fit = e(rank)
      
      estat lcgof
      
      di 2*(`ll_sat' - `ll_fit')
      di `df_sat' - `df_fit'
      As the number of outcomes and their levels increases, this method gets unwieldy very quickly. gsem actually computes the saturated LL for estat lcgof by computing the cell probabilities directly. Here is the equivalent "direct" code you can use to reproduce the values reported by estat lcgof.
      Code:
      * compute saturated LL from cell probabilities directly
      gen touse = e(sample)
      local by accident play insurance stock touse
      sort `by'
      by `by' : gen last = _n == _N if touse
      by `by' : gen double g2 = sum(1) if touse    // replace 1 with the weight variable when using [weights]
      replace g2 = g2*log(g2/e(N)) if last == 1
      sum g2 if last == 1 & touse, meanonly
      local ll_sat = r(sum)
      count if last & touse
      local df_sat = r(N) - 1
      di 2*(`ll_sat' - `ll_fit')
      di `df_sat' - `df_fit'
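
      The "direct" computation above implements the formula ll_sat = sum over observed response patterns g of n_g*log(n_g/N), with df_sat = (number of observed patterns) - 1. As a sanity check outside Stata, here is a minimal Python sketch of the same formula on made-up data (the four-tuples play the role of the accident/play/insurance/stock response patterns):

```python
import math
from collections import Counter

# Saturated LL for categorical outcomes: each observed response pattern
# ("cell") gets probability n_g / N, so ll_sat = sum_g n_g * log(n_g / N).
# Hypothetical data: rows of (accident, play, insurance, stock) responses.
rows = [(1, 1, 1, 1), (1, 1, 1, 1), (1, 2, 1, 1),
        (2, 2, 2, 2), (2, 2, 2, 2), (2, 2, 2, 1)]
N = len(rows)
counts = Counter(rows)                 # cell counts n_g

ll_sat = sum(n * math.log(n / N) for n in counts.values())
df_sat = len(counts) - 1               # free cell probabilities (one sum-to-1 constraint)
print(round(ll_sat, 4), df_sat)
```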



      • #4
        Jeff Pitblado (StataCorp): Is the saturated model for an LCA the same as the saturated model from a CFA, which the OP noted is the kind of model they are running?



        • #5
          Sorry, I saw "ordinal logistic regression" and estat lcgof and jumped straight to latent class analysis (LCA), (unintentionally) ignoring the mention of "factor analysis". Now I see how Erik's original response addresses the original question.

          I agree with Erik; Stata does not have any methods that estimate a log-likelihood corresponding to a saturated model for ordinal (categorical) outcomes in this scenario.

          I do not know if the saturated model used in LCA is applicable in the confirmatory factor analysis (CFA) framework. I suspect not.

          That said, after a little searching:

          I found a 2019 paper at the NSF website titled Assessing Fit in Ordinal Factor Analysis Models: SRMR vs. RMSEA.

          I also found the UCLA website has an example of performing factor analysis with categorical variables.

          Stata does not have an official command that estimates polychoric correlations, as mentioned in the NSF paper, but the UCLA example mentions Stas Kolenikov's polychoric command. In Stata, type
          Code:
          search polychoric
          to find and install this command if you want to experiment with fitting models using polychoric correlations.

          Here is an example based on [SEM] Example 35g that uses Stas Kolenikov's polychoric command to produce polychoric correlations that are stored as summary statistics data (SSD) for sem to fit a CFA model.
          Code:
          webuse gsem_issp93
          
          * CFA model fit using the observed ordinal outcomes
          gsem (y1 y2 y3 y4 <- SciAtt), oprobit
          est store observed
          
          * CFA model fit using polychoric correlations
          * estimate polychoric correlations
          polychoric y1 y2 y3 y4
          local N = r(N)
          matrix R = r(R)
          matrix list R
          frame create ssd
          frame ssd {
              * create SSD frame using sample size and polychoric correlations
              ssd init y1 y2 y3 y4
              ssd set obs `N'
              ssd set correlations (stata) R
              * CFA model
              sem (y1 y2 y3 y4 <- SciAtt)
              est store polychoric
              * Goodness-of-fit statistics
              estat gof, stats(chi2 rmsea residuals)
          }
          
          * compare fitted loadings
          etable, est(observed polychoric) column(estimates) equations(y1 y2 y3 y4)



          • #6
            That's awesome, Jeff Pitblado (StataCorp)! Thanks for working that out. I didn't think of using the polychorics in that way. It's how we use them to do PCA and EFA for structural validity analysis.



            • #7
              Dear Erik and Jeff,

              Many thanks for your helpful comments.

              So far I have done a simple goodness-of-fit assessment by computing the means and covariances of the responses implied by the fitted GSEM model (simply by Monte Carlo, simulating from the fitted model). These I can compare graphically with the empirical means and covariances obtained directly from the observed responses (i.e. the "saturated" means and covariances). I can also obtain a Monte Carlo p-value based on the discrepancies between the model-based and empirical response covariances.
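
              The Monte Carlo p-value step here is the usual recipe: simulate the discrepancy statistic under the fitted model and count how often it is at least as large as the observed value. A minimal sketch (Python, with hypothetical numbers in place of the actual discrepancies):

```python
import random

# Monte Carlo p-value for a discrepancy statistic: simulate the statistic
# under the fitted model and count simulations at least as extreme as observed.
random.seed(7)
observed_disc = 2.5                                       # hypothetical observed discrepancy
sim_disc = [random.expovariate(1.0) for _ in range(999)]  # stand-in for model simulations

# Include the observed value itself, as is conventional for MC p-values.
p = (1 + sum(d >= observed_disc for d in sim_disc)) / (1 + len(sim_disc))
print(round(p, 3))
```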

              I am not so familiar with the concept of polychoric correlation and will look further into that.

              Best regards,

              Rasmus



              • #8
                PS: I have looked a bit into polychoric correlation. If I understand it correctly, the polychoric correlation matrix is the correlation matrix of the latent continuous variables that are thresholded to produce the responses of an ordinal logistic (or probit) model. In that case, I guess one could specify a saturated model in GSEM as the model in which each response has its own associated latent factor. Computationally, though, I suspect this would be a nightmare (unless the number of responses is very small).
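
                This latent-variable picture is easy to illustrate with a small simulation (Python; the thresholds and correlation are arbitrary, and this is not the polychoric estimator itself): thresholding correlated standard-normal latents and taking the ordinary Pearson correlation of the resulting ordinal variables understates the latent correlation, which is exactly the attenuation the polychoric estimator corrects for.

```python
import math
import random

# Two standard-normal latents with correlation rho, cut into 3 ordinal
# categories each; the Pearson correlation of the ordinal versions is
# attenuated relative to rho (hypothetical thresholds and rho).
random.seed(12345)
rho, n = 0.7, 20000
cuts = [-0.5, 0.5]                    # thresholds -> categories 0, 1, 2

def categorize(z):
    return sum(z > c for c in cuts)

pairs = []
for _ in range(n):
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
    pairs.append((categorize(z1), categorize(z2)))

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_ordinal = pearson([p[0] for p in pairs], [p[1] for p in pairs])
print(round(r_ordinal, 2))            # noticeably below rho = 0.7
```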
