Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How estimate margins after gsem with a latent variable

    Hi,

    Using gsem in Stata 13.1, I have built a single-level generalized structural equation model to fit data on tooth health in 145 dogs. Two endogenous variables are tooth calculus (calc: ordinal 0-1-2) and tooth loss (loss: binary 0-1). The model also contains a measurement part with a latent variable, which might be a complicating factor. I have managed to caculate predicted probabilities of the different levels of these variables for all observations, using e.g.

    > predict prcalc*, pr outcome(calc)
    > predict prloss*, pr outcome(loss)

    However, I have not succeeded to estimate marginal means (margins). So far, the commands I have tried either do not work at all or produce an error message that estimates cannot be obtained. I have plotted the probabilities against exogenous predictors (categorical and continuous) to illustrate the relationships graphically, using colours to distinguish different categories of one predictor (and I have even drawn trend lines using a spline function), but I figure it should be possible for Stata to calculate margins and plot them, especially since it does provide the probabilites.

    Can somebody please advise me on what command to issue and how to formulate it to get the margins?

    Cheers,
    Jan

  • #2
    Predictions after gsem models with a latent variable can be computed
    using one of three methods. These methods control what is used in place of
    the latent variables for computing the prediction.

    1. Use empirical Bayes' (EB) mean estimates.
    2. Use empirical Bayes' (EB) mode estimates.
    3. Set the latent variables to their expected values according to the
    specified model. This usually means setting them to zero.

    Here is an example using the auto data:

    Code:
    sysuse auto
    
    gsem    (foreign <- turn trunk L, probit)       ///
            (rep78 <- turn trunk L, oprobit)        ///
            (displ gear <- turn trunk L)
    
    * option -mean- is the default, using EB means for L
    predict pr_mean, pr outcome(foreign)
    
    * option -mode- specifies EB modes for L
    predict pr_mode, pr outcome(foreign) mode
    
    * option -fixedonly- specifies zero values for L
    predict pr_fixedonly, pr outcome(foreign) fixedonly
    The EB mean and mode calculations require the original data in order to be
    computed, thus these methods cannot be used with margins except for
    computing the grand marginal mean.

    The fixedonly predictions can be used with margins, but you must
    acknowledge that these predictions are calculated by plugging in zeroes for
    the latent variable.

    A fourth option would be to precompute the EB mean or mode estimates and plug
    them into a prediction expression using margins' expression()
    option. I'm would not advocate this option because it treats the predicted
    values of the latent variable as fixed. Nonetheless here is an example of how
    this can be done.

    Code:
    predict EB_mean, latent
    
    margins , expression(normal(predict(eta outcome(foreign) fixedonly) + _b[foreign:L]*EB_mean))
    A final option would be to compute the predictions marginally with respect to
    the latent variable; that is, effectively integrating the latent variable out
    of the prediction expression. Unfortunately I am not aware of a closed form
    expression for the predicted probability of a bernoulli or ordinal outcome
    that is marginal with respect to a Gaussian latent variable. This "marginal"
    prediction would have to be computed using a numerical integration technique.

    Comment


    • #3
      Thanks a lot Jeff, that clarifies some things. However, it is a bit more complicated. I have used the Stata nlsw88 dataset, built a model which resembles what I have and tried some solutions, following your suggested code. I do not grasp the logic behind the code, so my attempts are really clumsy.

      I guess one challenge is to get margins of all levels of an ordinal outcome (specifying the outcome and the level simultaneously), another one to get margins of an outcome which is not directly linked to the latent variable.

      Further advice is most welcome!

      Cheers,
      Jan

      *** Large dataset needed
      sysuse nlsw88.dta, replace

      *** Create ordinal and binary variables, becuase there are none
      egen ohours = cut(hours), group(3)
      egen bwage = cut(wage), group(2)

      gsem (smsa married <- L2, logit) ///
      (L2 <- age) ///
      (ohours <- L2, ologit) ///
      (ohours <- age) ///
      (bwage <- age) ///
      (bwage <- 2.ohours, logit)

      *** I want to obtain margins of both ohours (each level) and bwage
      *** Margins based on expected values of L2 no good option for bwage margins,
      *** because there will be no variation at all

      *** Predictions of ohours (six new variables)
      predict prh_mean*, pr outcome(ohours)
      predict prh_fixedonly*, pr outcome(ohours) fixedonly

      *** Predictions of bwage (two new variables):
      predict prh_mean, pr outcome(bwage)
      predict prw_fixedonly, pr outcome(bwage) fixedonly
      // Why are EB mean and fixedonly means identical?

      *** EB mean estimates (one new variable):
      predict EBw_mean, latent
      // Why is there no need to specify any variable name? Which mean is this?

      *** Attempts to obtain margins of each level of ohours, by analogy
      *** Using expected (zero) L2 values
      margins, expression(normal(predict(eta outcome(ohours) fixedonly)))
      margins, at(age=(34(2)46)) ///
      expression(normal(predict(eta outcome(ohours) fixedonly)))
      // Seems to work, but ignores the route age -> L2 -> ohours?
      *** Using precalculated EB mean estimates
      margins, expression(normal(predict(eta outcome(ohours) fixedonly) + ///
      _b[ohours:L2]*EBw_mean))
      margins, at(age=(34(2)46)) ///
      expression(normal(predict(eta outcome(ohours) fixedonly) + ///
      _b[ohours:L2]*EBw_mean))
      // Seems to work? Is this any different from the expected values option?
      // Which level of ohours is estimated, only one probability is produced?
      // How write the command to get margins for each level of ohours?

      *** Attempts to obtain margins of bwage, by analogy
      *** Using expected (zero) L2 values
      margins, expression(normal(predict(eta outcome(bwage) fixedonly)))
      // Error message "expression... stochastic quantities other than e(b)"
      *** Using precalculated EB mean estimates
      margins, expression(normal(predict(eta outcome(bwage) fixedonly) + ///
      _b[bwage:L2]*EBw_mean))
      // Error message "[bwage:L2] not found" (no such path in the model)
      margins, expression(normal(predict(eta outcome(bwage) fixedonly) + ///
      _b[bwage:2.ohours]*EBw_mean))
      // Error message "expression... stochastic quantities other than e(b)"
      margins, expression(normal(predict(eta outcome(bwage) fixedonly) + ///
      _b[ohours:L2]*EBw_mean))
      // Error message "expression... stochastic quantities other than e(b)"
      // Can the command be written differently to get the margins?

      Comment


      • #4
        I'm going to trace through your example, and try to answer your questions
        along the way.

        Here is your data/model setup.

        Code:
        * data for the example
        sysuse nlsw88
        
        * some outcome variables for the example
        egen ohours = cut(hours), group(3)
        egen bwage = cut(wage), group(2)
        
        * the model
        gsem    (smsa married <- L2, logit)     ///
                (L2 <- age)                     ///
                (ohours <- L2, ologit)          ///
                (ohours <- age)                 ///
                (bwage <- age)                  ///
                (bwage <- 2.ohours, logit)
        You then comment the Stata code with (edited):

        I want to obtain margins of both ohours (each level) and bwage.

        Margins based on expected values of L2 no good option for bwage margins,
        because there will be no variation at all.
        ohours is a measure of the latent variable L2, so the predicted
        outcome probabilities differ between the default predictions using EB means
        and the fixedonly ones.

        Code:
        *** Predictions of ohours (six new variables)
        predict prh_mean*, pr outcome(ohours)
        predict prh_fixedonly*, pr outcome(ohours) fixedonly
        bwage is not a measure of L2 so the following predicted outcome
        probabilities are the same.

        Code:
        *** Predictions of bwage (two new variables):
        predict prh_mean, pr outcome(bwage)
        predict prw_fixedonly, pr outcome(bwage) fixedonly
        You then predict the EB means of L2

        Code:
        *** EB mean estimates (one new variable):
        predict EBw_mean, latent
        and ask

        Why is there no need to specify any variable name? Which mean is this?
        The predicted values are for L2, they are the EB mean estimates for
        this latent variable. The previous call to predict can be more
        explicitly specified as

        Code:
        predict EBw_mean, latent(L2) mean
        Now you

        Attempt to obtain margins of each level of ohours, by analogy
        Using expected (zero) L2 values
        Code:
        margins, expression(normal(predict(eta outcome(ohours) fixedonly)))
        margins, at(age=(34(2)46)) ///
                 expression(normal(predict(eta outcome(ohours) fixedonly)))
        but notice that

        Seems to work, but ignores the route age -> L2 -> ohours?
        It is true that the path age -> L2 -> ohours is ignored, but there are
        a few problems with your expression:
        1. You fit ohours using ologit which is a shortcut for
          family(ordinal) and link(logit), but use the cdf for the normal
          distribution in the expression. I take some responsibility for that, I
          assumed you were using the probit link when I suggested those
          expressions. I will change the expressions to use invlogit() which is
          the correct function to use with the ordinal logistic model in our case.
        2. The predicted probability for an ordinal outcome is computed by
          differencing the cummulative probabilities between adjacent cutpoints,
          recognizing that cut0 is minus infinity and cut3 is plus infinity in this
          example.

        Consider the marginal predicted probability for each level of ohours.

        Code:
        margins, predict(pr outcome(0.ohours) fixedonly)
        margins, predict(pr outcome(1.ohours) fixedonly)
        margins, predict(pr outcome(2.ohours) fixedonly)
        Here are the corresponding margins commands to reproduce the above
        using the expression() option.

        Code:
        * margins, predict(pr outcome(0.ohours) fixedonly)
        margins, expression(invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly)))
        
        * margins, predict(pr outcome(1.ohours) fixedonly)
        margins, expression( ///
                 invlogit(_b[ohours_cut2:_cons] - predict(eta outcome(ohours) fixedonly)) - ///
                 invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly)))
        
        * margins, predict(pr outcome(2.ohours) fixedonly)
        margins, expression(invlogit(- _b[ohours_cut2:_cons] + predict(eta outcome(ohours) fixedonly)))
        Now fixedonly really does treat the latent variables as zero, even if
        they are predicted by other observed variables, but we can account for
        ohours indirect dependence on age though L2 in our
        expression.

        Code:
        margins, expression(invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age))
        
        margins, expression( ///
                 invlogit(_b[ohours_cut2:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age) - ///
                 invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age))
        
        margins, expression(invlogit(- _b[ohours_cut2:_cons] + predict(eta outcome(ohours) fixedonly) + _b[ohours:L2]*_b[L2:age]*age))
        Now if you insist on using the EB mean predictions of L2 as fixed
        predictors for computing the predictive margins of the outcome probabilities
        of ohours, the expressions become

        Code:
        margins, expression(invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age - _b[ohours:L2]*EBw_mean))
        
        margins, expression( ///
                 invlogit(_b[ohours_cut2:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age - _b[ohours:L2]*EBw_mean) - ///
                 invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age - _b[ohours:L2]*EBw_mean))
        
        margins, expression(invlogit(- _b[ohours_cut2:_cons] + predict(eta outcome(ohours) fixedonly) + _b[ohours:L2]*_b[L2:age]*age + _b[ohours:L2]*EBw_mean))
        I hope this answers your following questions:

        Is this any different from the expected values option?
        Which level of ohours is estimated, only one probability is produced?
        How write the command to get margins for each level of ohours?
        You then try to obtain margins for the bwage outcome. There are two
        things to note:
        1. The expressions need to be updated to use invlogit() instead of
          normal().
        2. bwage is not predicted by L2, so I see no need to even bother
          with the expressions, just use the fixedonly probability prediction
          with margins.
        3. bwage uses 2.ohours as a predictor, thus ohours is
          effectively an endogenous predictor. margins doesn't like this and
          will report the following error:

          prediction is a function of possibly stochastic quantities other than e(b)

          You can force margins to go forward with this calculation
          by using the force option.

        Code:
        margins, predict(pr outcome(bwage) fixedonly) force

        Comment


        • #5
          Jeff, this is very helpful. I am extremely grateful and deeply impressed by your fast and complete reply. Thanks so much!

          /Jan

          Comment

          Working...
          X