
  • How do you change the reference category in latent class regression?

    As many of us know, Stata implemented latent class models starting in version 15. In latent class or latent profile models, you basically tell Stata that there are k latent classes and ask it to estimate the mean of a vector of indicator variables Y within each class. Now, we might have some other variables X that we think influence the probability of membership in each latent class, i.e. P(C = k | X), where C is the latent class variable. We use latent class regression for this. The latent class is a categorical latent variable anyway, so naturally you can fit a multinomial logistic regression of class membership on the covariates.

    By default, gsem takes the lowest-valued category as the base outcome of a multinomial logit equation (mlogit itself defaults to the most frequent outcome). My question is: in latent class regression, how do we change the base or reference category?

    For example, here's just the multinomial part of the model from a latent class regression using example data:

    Code:
    use http://www.stata-press.com/data/r15/gsem_lca2
    gsem (glucose insulin sspg <- _cons) (C <- relwgt), lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) nolog
    Generalized structural equation model           Number of obs     =        145
    Log likelihood = -1519.7738
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.C          |  (base outcome)
    -------------+----------------------------------------------------------------
    2.C          |
          relwgt |   14.03413   2.819101     4.98   0.000     8.508794    19.55947
           _cons |  -14.50264   2.864154    -5.06   0.000    -20.11628   -8.889005
    -------------+----------------------------------------------------------------
    3.C          |
          relwgt |   5.186345   2.045551     2.54   0.011     1.177138    9.195552
           _cons |  -5.329615   1.930139    -2.76   0.006    -9.112617   -1.546613
    ------------------------------------------------------------------------------
    The usual factor-variable method (the ib#. prefix) fails: class 1 remains the base outcome.

    Code:
    gsem (glucose insulin sspg <- _cons) (ib2.C <- relwgt), lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) nolog
    Generalized structural equation model           Number of obs     =        145
    Log likelihood = -1519.7738
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.C          |  (base outcome)
    -------------+----------------------------------------------------------------
    2.C          |
          relwgt |   14.03413   2.819101     4.98   0.000     8.508794    19.55947
           _cons |  -14.50264   2.864154    -5.06   0.000    -20.11628   -8.889005
    -------------+----------------------------------------------------------------
    3.C          |
          relwgt |   5.186345   2.045551     2.54   0.011     1.177138    9.195552
           _cons |  -5.329615   1.930139    -2.76   0.006    -9.112617   -1.546613
    ------------------------------------------------------------------------------
    Specifying 1.C, 2.C, etc. in the multinomial equation doesn't work either (output omitted):

    Code:
    gsem (glucose insulin sspg <- _cons) (1b.C 2.C 3.C <- relwgt), lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) nolog
    Some software packages allow you to re-order the latent classes. You can more or less pull this off manually in Stata: fit your model, predict the probability of membership in each latent class, then rename the predicted probabilities so that they are in the order you want. Here I make class 2 the first class and the original class 1 the second class.

    Code:
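    * posterior probabilities of membership in each class: class1, class2, class3
    predict class*, classposteriorpr
    * swap the names class1 and class2, using class0 as a temporary name
    rename class2 class0
    rename class1 class2
    rename class0 class1
    * re-fit, using the re-ordered probabilities as starting values
    gsem (glucose insulin sspg <- _cons) (1b.C 2.C 3.C <- relwgt), lclass(C 3) startvalues(classpr class1 class2 class3) lcinvariant(none) covstructure(e._OEn, unstructured)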
    If you inspect the actual results, perhaps comparing the results of estat lcprob or estat lcmean, you'll see that this worked. So, there's that method (and by the way, this would be nice convenience functionality to add, particularly as we get to many latent classes). However, is there a simpler way to do this?
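    A minimal sketch of that comparison, assuming the first fit is stored before running predict. The names original and reordered are made up for illustration.

    Code:
    * right after the first gsem fit, before predict
    estimates store original
    * ... predict, rename, and re-fit as above, then:
    estimates store reordered
    * if only the class labels moved, the two log likelihoods should match
    estimates stats original reordered
    * the class 1 and class 2 means should be swapped between the two fits
    estimates restore original
    estat lcmean
    estimates restore reordered
    estat lcmean
    The same check can be repeated with estat lcprob in place of estat lcmean.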
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

  • #2
    It turns out that the suboption to do this was described right there in the help file.

    lclass(lcname # [, base(#)]) specifies that the model be fit as described
    above.

    lcname specifies the name of a categorical latent variable, and #
    specifies the number of latent classes. The latent classes are the
    contiguous integers starting with 1 and ending with #.
    Code:
    gsem (glucose insulin sspg <- _cons) (1b.C 2.C 3.C <- relwgt), lclass(C 3, base(2)) lcinvariant(none) covstructure(e._OEn, unstructured) nolog
    Thanks to Bingsheng from StataCorp for pointing this out!
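    Changing the base does not change the fitted model, only which contrasts are reported: each coefficient relative to class 2 is the class 1-based coefficient minus the class 1-based coefficient for class 2. As a rough cross-check, the same contrasts can be recovered from the original fit with lincom. The coefficient names below are an assumption about how gsem labels them, so confirm them with the coeflegend replay first.

    Code:
    * fit with the default base (class 1), as in post #1
    use http://www.stata-press.com/data/r15/gsem_lca2, clear
    gsem (glucose insulin sspg <- _cons) (C <- relwgt), lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured) nolog
    * replay with coefficient names; check that they match the names assumed below
    gsem, coeflegend
    * class 3 relative to class 2
    lincom _b[3.C:relwgt] - _b[2.C:relwgt]
    lincom _b[3.C:_cons] - _b[2.C:_cons]
    From the output in post #1, these differences are 5.186345 - 14.03413 = -8.847785 for relwgt and -5.329615 - (-14.50264) = 9.173025 for _cons. Class 1 relative to class 2 is simply the 2.C row with the signs flipped.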

    If you still wish to re-order the latent classes themselves for heuristic reasons, the solution in post #1 works. To explain: you predict the class probabilities. Then, you rename the class probabilities in the order you want the classes to appear. Finally, you re-fit the model, giving those re-ordered probabilities as start values. Because you've supplied start values that you know lead to a good solution, the model should converge faster than it initially did.
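    As a possible shortcut for the renaming step, rename's group syntax can swap two names in one command, so no temporary name such as class0 is needed. This is a sketch that assumes the three-class model has just been fit and that the stub class is unused.

    Code:
    * posterior class probabilities: class1, class2, class3
    predict class*, classposteriorpr
    * swap the names of class1 and class2; class3 is left alone
    rename (class1 class2) (class2 class1)
    * re-fit with the re-ordered probabilities as starting values
    gsem (glucose insulin sspg <- _cons) (C <- relwgt), lclass(C 3) startvalues(classpr class1 class2 class3) lcinvariant(none) covstructure(e._OEn, unstructured)
    With more classes, the two lists inside the parentheses can spell out any ordering you like, as long as each old name maps to exactly one new name.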

    In the past, I had used the user-written command coefplot to manually rename coefficients. This was a major pain in the rear end. You could probably do it using marginsplot as well.
    Last edited by Weiwen Ng; 21 May 2020, 17:33.