How do you change the reference category in latent class regression?

Weiwen Ng

Join Date: Jun 2015
Posts: 1241

20 May 2020, 13:32

As many of us know, Stata has supported latent class models since version 15. In a latent class or latent profile model, you tell Stata that there are k latent classes and ask it to estimate the class-specific means of a vector of indicator variables Y. We might also have covariates X that we think influence the probability of membership in each latent class, i.e. P(C = k | X). Latent class regression handles this: because the latent class is a categorical variable, the natural approach is to model class membership with a multinomial logistic regression on the covariates.

By default, as the output below shows, Stata treats the lowest-valued class (class 1) as the base outcome in the multinomial part of the model. My question is, in latent class regression, how do we change the base or reference category?
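
For reference, the class-membership part of the model can be written as a multinomial logit with a base class b. The symbols \gamma_k, \beta_k, and b are notation introduced here for illustration, not taken from Stata's documentation; x stands for a single covariate (relwgt in the example that follows) and K for the number of classes.

```latex
P(C = k \mid x) = \frac{\exp(\gamma_k + \beta_k x)}{\sum_{j=1}^{K} \exp(\gamma_j + \beta_j x)},
\qquad \gamma_b = \beta_b = 0,
\qquad \log \frac{P(C = k \mid x)}{P(C = b \mid x)} = \gamma_k + \beta_k x .
```

Each reported coefficient is a log-odds contrast against class b, so changing the base changes which contrasts are reported, not the fitted class probabilities.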

For example, here's just the multinomial part of the model from a latent class regression using example data:

Code:

use http://www.stata-press.com/data/r15/gsem_lca2
gsem (glucose insulin sspg <- _cons) (C <- relwgt), lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) nolog
Generalized structural equation model           Number of obs     =        145
Log likelihood = -1519.7738

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.C          |  (base outcome)
-------------+----------------------------------------------------------------
2.C          |
      relwgt |   14.03413   2.819101     4.98   0.000     8.508794    19.55947
       _cons |  -14.50264   2.864154    -5.06   0.000    -20.11628   -8.889005
-------------+----------------------------------------------------------------
3.C          |
      relwgt |   5.186345   2.045551     2.54   0.011     1.177138    9.195552
       _cons |  -5.329615   1.930139    -2.76   0.006    -9.112617   -1.546613
------------------------------------------------------------------------------

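The usual factor-variable approach, specifying ib2.C to set the base, has no effect: the output is identical and class 1 remains the base outcome.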

Code:

gsem (glucose insulin sspg <- _cons) (ib2.C <- relwgt), lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) nolog
Generalized structural equation model           Number of obs     =        145
Log likelihood = -1519.7738

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.C          |  (base outcome)
-------------+----------------------------------------------------------------
2.C          |
      relwgt |   14.03413   2.819101     4.98   0.000     8.508794    19.55947
       _cons |  -14.50264   2.864154    -5.06   0.000    -20.11628   -8.889005
-------------+----------------------------------------------------------------
3.C          |
      relwgt |   5.186345   2.045551     2.54   0.011     1.177138    9.195552
       _cons |  -5.329615   1.930139    -2.76   0.006    -9.112617   -1.546613
------------------------------------------------------------------------------

Specifying the levels 1.C, 2.C, etc. explicitly in the multinomial equation doesn't work either (output omitted).

Code:

gsem (glucose insulin sspg <- _cons) (1b.C 2.C 3.C <- relwgt), lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) nolog

Some software packages allow you to re-order the latent classes. You can kind of pull this off manually in Stata: fit your model, predict the posterior probability of membership in each latent class, rename those probability variables so that they follow the order you want, and refit the model using them as starting values. Here I make class 2 the first class and the original class 1 the second class.

Code:

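* Posterior probability of membership in each class (creates class1, class2, class3)
predict class*, classposteriorpr
* Swap the labels of classes 1 and 2, using class0 as a temporary name
rename class2 class0
rename class1 class2
rename class0 class1
* Refit with the relabeled probabilities as starting values
gsem (glucose insulin sspg <- _cons) (1b.C 2.C 3.C <- relwgt), lclass(C 3) startvalues(classpr class1 class2 class3) lcinvariant(none) covstructure(e._OEn, unstructured)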

If you inspect the actual results, perhaps comparing the output of estat lcprob or estat lcmean before and after, you'll see that this worked. So, there's that method (and by the way, this would be nice convenience functionality to add, particularly as we get to many latent classes). However, is there a simpler way to do this?
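
One way to run that comparison, offered as a sketch rather than a definitive recipe: store each fit and print the class probabilities and class-specific means after each one. The names original and relabeled are illustrative, and the sketch assumes the refit converges to the same solution as the first fit.

```stata
* Sketch: check that relabeling swapped classes 1 and 2 and left the fit unchanged.
* "original" and "relabeled" are illustrative names for the stored estimates.
use http://www.stata-press.com/data/r15/gsem_lca2, clear

* Original fit, with class 1 as the default base
gsem (glucose insulin sspg <- _cons) (C <- relwgt), lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) nolog
estimates store original
estat lcprob
estat lcmean

* Relabel classes 1 and 2 via the posterior probabilities, then refit
predict class*, classposteriorpr
rename class2 class0
rename class1 class2
rename class0 class1
gsem (glucose insulin sspg <- _cons) (1b.C 2.C 3.C <- relwgt), lclass(C 3) startvalues(classpr class1 class2 class3) lcinvariant(none) covstructure(e._OEn, unstructured)
estimates store relabeled
estat lcprob
estat lcmean
```

If the relabeling worked, the two log likelihoods should agree and the rows for classes 1 and 2 should trade places between the two sets of estat output.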

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

Tags: latent class analysis

Weiwen Ng
#2

21 May 2020, 16:30

It turns out that the suboption for this, base(#), is described right there in the help file:

lclass(lcname # [, base(#)]) specifies that the model be fit as described above.

lcname specifies the name of a categorical latent variable, and # specifies the number of latent classes. The latent classes are the contiguous integers starting with 1 and ending with #.

Adding base(2) to lclass() makes class 2 the base outcome:

Code:

gsem (glucose insulin sspg <- _cons) (1b.C 2.C 3.C <- relwgt), lclass(C 3, base(2)) lcinvariant(none) covstructure(e._OEn, unstructured) nolog

Thanks to Bingsheng from StataCorp for pointing this out!
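
As a rough check on what to expect from base(2): coefficients under a new base are differences of the coefficients under the old base. Assuming the refit converges to the same solution with the same class labels, and writing \beta_k^{(b)} and \gamma_k^{(b)} for the relwgt coefficient and constant of class k when class b is the base (notation introduced here, not Stata's), the point estimates from the first post imply:

```latex
\begin{aligned}
\beta_k^{(2)} &= \beta_k^{(1)} - \beta_2^{(1)}, &
\gamma_k^{(2)} &= \gamma_k^{(1)} - \gamma_2^{(1)}, \\
\beta_1^{(2)} &= 0 - 14.03413 = -14.03413, &
\gamma_1^{(2)} &= 0 - (-14.50264) = 14.50264, \\
\beta_3^{(2)} &= 5.186345 - 14.03413 = -8.847785, &
\gamma_3^{(2)} &= -5.329615 - (-14.50264) = 9.173025 .
\end{aligned}
```

The standard errors for the class 3 contrasts cannot be read off the original table in the same way, since they depend on the covariance between the class 2 and class 3 estimates.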

If you nonetheless wish to re-order the latent classes themselves for heuristic reasons, the solution in my first post still applies. To explain: you predict the posterior class probabilities, rename those variables in the order you want the classes to appear, and re-fit the model with the re-ordered probabilities as start values. Because you've supplied start values that you know lead to a good solution, the model should converge faster than it initially did.

In the past, I had used the user-written command coefplot to manually rename coefficients. This was a major pain in the rear end. You could probably do it using marginsplot as well.

Last edited by Weiwen Ng; 21 May 2020, 16:33.
