  • Latent class analysis-- estat mean CIs

    I have been using gsem to conduct a multi-indicator latent class analysis. The best fitting model has two classes. My code for it is basically:

    Code:
    gsem (x1 x2 x3 x4 x5 <-, regress) (x6 <-, ologit) (x7 x8 x9 x10 x11 x12 x13 <-, logit), lclass(c 2)
    estat lcprob
    estat lcmean
    My question is: after running the last line above, what do the 95% CIs in the output indicate? For 7 of the 13 variables, the CIs overlap. Does this mean those indicators are not useful in distinguishing the classes? If so, what test(s) is Stata doing to determine this? Is there another command I could run to produce a p-value that would show up in this table?

  • #2
    This is late, but maybe this will help someone.

    When you fit a latent class analysis, you tell the program that there are k groups which have different mean levels of x1, x2, etc. For an ordered logistic indicator, "mean level" means that each class has different proportions of each level of the ordered variable (e.g. different proportions of high, medium, and low). estat lcmean gives you the mean level of each indicator by latent class.

    In fact, if you have only Gaussian or binary indicators, you can derive the class-specific means directly from the gsem output table. For a logit indicator, take the inverse logit of the class-specific intercept to get the class-specific probability. For a Gaussian indicator, the intercept is the mean. It's as simple as that. Verify this using SEM manual examples 50g to 52g if you like. With ordered logit you could also do this, but the math is clunkier because you have to work from the cutpoints.
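    As a rough sketch of that back-transformation, the code below fits a two-class model to the example dataset I believe ships with SEM example 50g (gsem_lca1, with binary indicators accident, play, insurance and stock) and compares estat lcmean with a hand calculation. The coefficient name _b[accident:1.C] is my assumption about how gsem labels the class 1 intercept; check it against the legend before relying on it.

```stata
* Example LCA data (assumed to be the dataset used in SEM example 50g)
webuse gsem_lca1, clear

* Two-class model with four binary (logit) indicators
gsem (accident play insurance stock <- ), logit lclass(C 2)

* Class-specific means (here, probabilities) as reported by Stata
estat lcmean

* Show the coefficient names so the one used below can be confirmed
gsem, coeflegend

* Back-transform the class 1 intercept for accident by hand.
* _b[accident:1.C] is an assumed name; replace it with whatever
* the legend above shows if it differs.
display invlogit(_b[accident:1.C])
```

    The displayed value should match the class 1 row for accident in the estat lcmean table, apart from rounding.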

    NB: in the code below:

    Code:
    gsem (x1 x2 x3 x4 x5 <-, regress)
    I'm pretty sure Stata interprets this as treating each indicator as Gaussian (i.e. estimating a mean and an error variance for it). You could have left the options blank, since Gaussian is the default. The regress option may just be ignored; it would make more sense if you were fitting a finite mixture model, i.e. a regression model that assumes there are heterogeneous groups in the sample.
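    One way to check this yourself, sketched here on the example data I believe goes with SEM example 52g (gsem_lca2, with continuous indicators glucose, insulin and sspg), is to fit the same model with and without the regress option and compare the log likelihoods. The dataset and variable names are assumptions on my part; substitute your own indicators.

```stata
* Example latent profile data (assumed to be the SEM example 52g dataset)
webuse gsem_lca2, clear

* Version 1: indicators explicitly declared with the regress option
gsem (glucose insulin sspg <- _cons, regress), lclass(C 2)
scalar ll_regress = e(ll)

* Version 2: no family/link option, so gsem uses its default (Gaussian)
gsem (glucose insulin sspg <- _cons), lclass(C 2)
scalar ll_default = e(ll)

* If the two specifications are the same model, this difference
* should be zero up to numerical tolerance
display ll_regress - ll_default
```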

    As I said, your output table contains the mean level of each indicator for each class. The 95% CIs in the output table from estat lcmean represent our uncertainty about those means. It's just like a regression model: we are 95% confident, whatever that means, that the mean lies in this interval. That's all it means. Stata does not conduct a formal test to determine which indicators separate the classes well.

    There is one descriptive statistic, variable-specific entropy, that Mplus will calculate for the model. It summarizes how well each variable separates the latent classes. Values near 1 mean high separation. Values under about 0.7 mean not so good separation. This is different from the overall model entropy, which is a global report of how certain you are about each observation's latent class assignment. High certainty usually means high separation, so in that case you probably had multiple indicators that distinguished the latent classes from each other.

    Anyway, back to variable-specific entropy. Can anybody decode Asparouhov and Muthén's algebra? This seems like a potentially useful thing, and I can't figure out what is being summed over what. Seriously, the formula looks simple, but I can't figure it out.
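    For the overall model entropy (not the variable-specific version, which I can't reproduce), here is a sketch of the usual normalized entropy calculation from the posterior class probabilities: 1 minus the summed entropy of each observation's posterior probabilities, divided by N times ln(K). This is my own hand calculation, not a built-in Stata command, and it again assumes the gsem_lca1 example data.

```stata
* Fit a two-class model on the assumed example data
webuse gsem_lca1, clear
quietly gsem (accident play insurance stock <- ), logit lclass(C 2)

* Posterior probability of membership in each class, per observation
predict cpost*, classposteriorpr

* Per-observation entropy: -sum over classes of p * ln(p)
generate double h = 0
forvalues k = 1/2 {
    replace h = h - cpost`k' * ln(cpost`k') if cpost`k' > 0 & !missing(cpost`k')
}

* Normalized entropy: 1 - sum(h) / (N * ln(K)), with K = 2 classes here
quietly summarize h
scalar norm_entropy = 1 - r(sum) / (r(N) * ln(2))
display norm_entropy
```

    Values closer to 1 indicate more certain class assignment. Change the 2 in both the loop and ln(2) if you fit more classes.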

    You don't necessarily need any fancy math to determine which variables are good at identifying which classes. You can just plot your variables, in Excel or using marginsplot after estat lcmean. This question has more discussion.
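    A minimal sketch of the plotting route, again on the assumed gsem_lca1 example data. I am relying on marginsplot picking up the results that estat lcmean leaves behind, and the noci option just drops the CI bars to keep the graph readable.

```stata
* Fit a two-class model on the assumed example data
webuse gsem_lca1, clear
quietly gsem (accident play insurance stock <- ), logit lclass(C 2)

* Class-specific means for every indicator
estat lcmean

* Plot those means; indicators whose class means sit far apart
* are the ones doing most of the work in separating the classes
marginsplot, noci
```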

    And now, I don't mean to nitpick. However, you treated some indicators as Gaussian. You probably weren't aware of this, but you basically told the program to assume that the variance of each Gaussian indicator is identical across all latent classes. You also told Stata to assume that, within each latent class, the Gaussian indicators are uncorrelated with one another. I think these assumptions are unrealistically restrictive. I discussed this issue in this post, which has graphics and a worked example. I am not saying this to pick on you. You probably did not realize you did this, because the manual isn't as clear as it could be, and because the default set of options essentially nudges less experienced users into doing this. This is a complex technique, and it's hard to visualize what you're doing as a newer user.
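    To make the two assumptions concrete, here is a sketch that fits the default model and then a relaxed one on the assumed gsem_lca2 example data. As far as I know, lcinvariant(none) frees the error variances across classes and covstructure(e._OEn, unstructured) lets the Gaussian errors correlate within class. The relaxed model has many more parameters and may not converge on your own data.

```stata
* Example latent profile data (assumed to be the SEM example 52g dataset)
webuse gsem_lca2, clear

* Default: error variances equal across classes, no within-class covariances
gsem (glucose insulin sspg <- _cons), lclass(C 2)
estimates store restricted

* Relaxed: class-specific variances and within-class error covariances
gsem (glucose insulin sspg <- _cons), lclass(C 2) ///
    lcinvariant(none) covstructure(e._OEn, unstructured)
estimates store relaxed

* Compare fit (AIC and BIC) across the two specifications
estimates stats restricted relaxed
```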

    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
