How estimate margins after gsem with a latent variable

Jan Hultgren

Join Date: Jun 2014

Posts: 12
#1

How estimate margins after gsem with a latent variable

23 Jun 2014, 12:30

Hi,

Using gsem in Stata 13.1, I have built a single-level generalized structural equation model to fit data on tooth health in 145 dogs. Two endogenous variables are tooth calculus (calc: ordinal 0-1-2) and tooth loss (loss: binary 0-1). The model also contains a measurement part with a latent variable, which might be a complicating factor. I have managed to caculate predicted probabilities of the different levels of these variables for all observations, using e.g.

> predict prcalc*, pr outcome(calc)
> predict prloss*, pr outcome(loss)

However, I have not succeeded to estimate marginal means (margins). So far, the commands I have tried either do not work at all or produce an error message that estimates cannot be obtained. I have plotted the probabilities against exogenous predictors (categorical and continuous) to illustrate the relationships graphically, using colours to distinguish different categories of one predictor (and I have even drawn trend lines using a spline function), but I figure it should be possible for Stata to calculate margins and plot them, especially since it does provide the probabilites.

Can somebody please advise me on what command to issue and how to formulate it to get the margins?

Cheers,
Jan
Tags: None
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 667
#2

24 Jun 2014, 09:09

Predictions after gsem models with a latent variable can be computed
using one of three methods. These methods control what is used in place of
the latent variables for computing the prediction.

1. Use empirical Bayes' (EB) mean estimates.
2. Use empirical Bayes' (EB) mode estimates.
3. Set the latent variables to their expected values according to the
specified model. This usually means setting them to zero.

Here is an example using the auto data:

Code:

sysuse auto gsem (foreign <- turn trunk L, probit) /// (rep78 <- turn trunk L, oprobit) /// (displ gear <- turn trunk L) * option -mean- is the default, using EB means for L predict pr_mean, pr outcome(foreign) * option -mode- specifies EB modes for L predict pr_mode, pr outcome(foreign) mode * option -fixedonly- specifies zero values for L predict pr_fixedonly, pr outcome(foreign) fixedonly

The EB mean and mode calculations require the original data in order to be
computed, thus these methods cannot be used with margins except for
computing the grand marginal mean.

The fixedonly predictions can be used with margins, but you must
acknowledge that these predictions are calculated by plugging in zeroes for
the latent variable.

A fourth option would be to precompute the EB mean or mode estimates and plug
them into a prediction expression using margins' expression()
option. I'm would not advocate this option because it treats the predicted
values of the latent variable as fixed. Nonetheless here is an example of how
this can be done.

Code:

predict EB_mean, latent margins , expression(normal(predict(eta outcome(foreign) fixedonly) + _b[foreign:L]*EB_mean))

A final option would be to compute the predictions marginally with respect to
the latent variable; that is, effectively integrating the latent variable out
of the prediction expression. Unfortunately I am not aware of a closed form
expression for the predicted probability of a bernoulli or ordinal outcome
that is marginal with respect to a Gaussian latent variable. This "marginal"
prediction would have to be computed using a numerical integration technique.
1 like
Comment

�
Jan Hultgren

Join Date: Jun 2014

Posts: 12
#3

26 Jun 2014, 05:16

Thanks a lot Jeff, that clarifies some things. However, it is a bit more complicated. I have used the Stata nlsw88 dataset, built a model which resembles what I have and tried some solutions, following your suggested code. I do not grasp the logic behind the code, so my attempts are really clumsy.

I guess one challenge is to get margins of all levels of an ordinal outcome (specifying the outcome and the level simultaneously), another one to get margins of an outcome which is not directly linked to the latent variable.

Further advice is most welcome!

Cheers,
Jan

*** Large dataset needed
sysuse nlsw88.dta, replace

*** Create ordinal and binary variables, becuase there are none
egen ohours = cut(hours), group(3)
egen bwage = cut(wage), group(2)

gsem (smsa married <- L2, logit) ///
(L2 <- age) ///
(ohours <- L2, ologit) ///
(ohours <- age) ///
(bwage <- age) ///
(bwage <- 2.ohours, logit)

*** I want to obtain margins of both ohours (each level) and bwage
*** Margins based on expected values of L2 no good option for bwage margins,
*** because there will be no variation at all

*** Predictions of ohours (six new variables)
predict prh_mean*, pr outcome(ohours)
predict prh_fixedonly*, pr outcome(ohours) fixedonly

*** Predictions of bwage (two new variables):
predict prh_mean, pr outcome(bwage)
predict prw_fixedonly, pr outcome(bwage) fixedonly
// Why are EB mean and fixedonly means identical?

*** EB mean estimates (one new variable):
predict EBw_mean, latent
// Why is there no need to specify any variable name? Which mean is this?

*** Attempts to obtain margins of each level of ohours, by analogy
*** Using expected (zero) L2 values
margins, expression(normal(predict(eta outcome(ohours) fixedonly)))
margins, at(age=(34(2)46)) ///
expression(normal(predict(eta outcome(ohours) fixedonly)))
// Seems to work, but ignores the route age -> L2 -> ohours?
*** Using precalculated EB mean estimates
margins, expression(normal(predict(eta outcome(ohours) fixedonly) + ///
_b[ohours:L2]*EBw_mean))
margins, at(age=(34(2)46)) ///
expression(normal(predict(eta outcome(ohours) fixedonly) + ///
_b[ohours:L2]*EBw_mean))
// Seems to work? Is this any different from the expected values option?
// Which level of ohours is estimated, only one probability is produced?
// How write the command to get margins for each level of ohours?

*** Attempts to obtain margins of bwage, by analogy
*** Using expected (zero) L2 values
margins, expression(normal(predict(eta outcome(bwage) fixedonly)))
// Error message "expression... stochastic quantities other than e(b)"
*** Using precalculated EB mean estimates
margins, expression(normal(predict(eta outcome(bwage) fixedonly) + ///
_b[bwage:L2]*EBw_mean))
// Error message "[bwage:L2] not found" (no such path in the model)
margins, expression(normal(predict(eta outcome(bwage) fixedonly) + ///
_b[bwage:2.ohours]*EBw_mean))
// Error message "expression... stochastic quantities other than e(b)"
margins, expression(normal(predict(eta outcome(bwage) fixedonly) + ///
_b[ohours:L2]*EBw_mean))
// Error message "expression... stochastic quantities other than e(b)"
// Can the command be written differently to get the margins?
Comment

�
Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 667
#4

26 Jun 2014, 13:33

I'm going to trace through your example, and try to answer your questions
along the way.

Here is your data/model setup.

Code:

* data for the example sysuse nlsw88 * some outcome variables for the example egen ohours = cut(hours), group(3) egen bwage = cut(wage), group(2) * the model gsem (smsa married <- L2, logit) /// (L2 <- age) /// (ohours <- L2, ologit) /// (ohours <- age) /// (bwage <- age) /// (bwage <- 2.ohours, logit)

You then comment the Stata code with (edited):

I want to obtain margins of both ohours (each level) and bwage.

Margins based on expected values of L2 no good option for bwage margins,
because there will be no variation at all.

ohours is a measure of the latent variable L2, so the predicted
outcome probabilities differ between the default predictions using EB means
and the fixedonly ones.

Code:

*** Predictions of ohours (six new variables) predict prh_mean*, pr outcome(ohours) predict prh_fixedonly*, pr outcome(ohours) fixedonly

bwage is not a measure of L2 so the following predicted outcome
probabilities are the same.

Code:

*** Predictions of bwage (two new variables): predict prh_mean, pr outcome(bwage) predict prw_fixedonly, pr outcome(bwage) fixedonly

You then predict the EB means of L2

Code:

*** EB mean estimates (one new variable): predict EBw_mean, latent

and ask

Why is there no need to specify any variable name? Which mean is this?

The predicted values are for L2, they are the EB mean estimates for
this latent variable. The previous call to predict can be more
explicitly specified as

Code:

predict EBw_mean, latent(L2) mean

Now you

Attempt to obtain margins of each level of ohours, by analogy
Using expected (zero) L2 values

Code:

margins, expression(normal(predict(eta outcome(ohours) fixedonly))) margins, at(age=(34(2)46)) /// expression(normal(predict(eta outcome(ohours) fixedonly)))

but notice that

Seems to work, but ignores the route age -> L2 -> ohours?

It is true that the path age -> L2 -> ohours is ignored, but there are
a few problems with your expression:

You fit ohours using ologit which is a shortcut for
family(ordinal) and link(logit), but use the cdf for the normal
distribution in the expression. I take some responsibility for that, I
assumed you were using the probit link when I suggested those
expressions. I will change the expressions to use invlogit() which is
the correct function to use with the ordinal logistic model in our case.

The predicted probability for an ordinal outcome is computed by
differencing the cummulative probabilities between adjacent cutpoints,
recognizing that cut0 is minus infinity and cut3 is plus infinity in this
example.

Consider the marginal predicted probability for each level of ohours.

Code:

margins, predict(pr outcome(0.ohours) fixedonly) margins, predict(pr outcome(1.ohours) fixedonly) margins, predict(pr outcome(2.ohours) fixedonly)

Here are the corresponding margins commands to reproduce the above
using the expression() option.

Code:

* margins, predict(pr outcome(0.ohours) fixedonly) margins, expression(invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly))) * margins, predict(pr outcome(1.ohours) fixedonly) margins, expression( /// invlogit(_b[ohours_cut2:_cons] - predict(eta outcome(ohours) fixedonly)) - /// invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly))) * margins, predict(pr outcome(2.ohours) fixedonly) margins, expression(invlogit(- _b[ohours_cut2:_cons] + predict(eta outcome(ohours) fixedonly)))

Now fixedonly really does treat the latent variables as zero, even if
they are predicted by other observed variables, but we can account for
ohours indirect dependence on age though L2 in our
expression.

Code:

margins, expression(invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age)) margins, expression( /// invlogit(_b[ohours_cut2:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age) - /// invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age)) margins, expression(invlogit(- _b[ohours_cut2:_cons] + predict(eta outcome(ohours) fixedonly) + _b[ohours:L2]*_b[L2:age]*age))

Now if you insist on using the EB mean predictions of L2 as fixed
predictors for computing the predictive margins of the outcome probabilities
of ohours, the expressions become

Code:

margins, expression(invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age - _b[ohours:L2]*EBw_mean)) margins, expression( /// invlogit(_b[ohours_cut2:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age - _b[ohours:L2]*EBw_mean) - /// invlogit(_b[ohours_cut1:_cons] - predict(eta outcome(ohours) fixedonly) - _b[ohours:L2]*_b[L2:age]*age - _b[ohours:L2]*EBw_mean)) margins, expression(invlogit(- _b[ohours_cut2:_cons] + predict(eta outcome(ohours) fixedonly) + _b[ohours:L2]*_b[L2:age]*age + _b[ohours:L2]*EBw_mean))

I hope this answers your following questions:

Is this any different from the expected values option?
Which level of ohours is estimated, only one probability is produced?
How write the command to get margins for each level of ohours?

You then try to obtain margins for the bwage outcome. There are two
things to note:

The expressions need to be updated to use invlogit() instead of
normal().

bwage is not predicted by L2, so I see no need to even bother
with the expressions, just use the fixedonly probability prediction
with margins.

bwage uses 2.ohours as a predictor, thus ohours is
effectively an endogenous predictor. margins doesn't like this and
will report the following error:

prediction is a function of possibly stochastic quantities other than e(b)

You can force margins to go forward with this calculation
by using the force option.

Code:

margins, predict(pr outcome(bwage) fixedonly) force
1 like
Comment

�
Jan Hultgren

Join Date: Jun 2014

Posts: 12
#5

27 Jun 2014, 14:54

Jeff, this is very helpful. I am extremely grateful and deeply impressed by your fast and complete reply. Thanks so much!

/Jan
Comment

�

Announcement

How estimate margins after gsem with a latent variable

Comment

Comment

Comment

Comment