
  • Chi-square distribution and p-values in LCA

    Hello everyone,

    I am trying to compute latent class goodness-of-fit statistics using the estat lcgof command in Stata version 15.1 SE, but the output seems incomplete. It gives me the AIC and BIC statistics but not the G2 goodness-of-fit statistic (it shows neither the chi-square statistic nor the p-value). The output is displayed below. The same syntax works fine when I use the dataset from the Stata website, but not with mine. I used the same coding, Yes/No (1 or 0). Would you mind advising me why this happened, and what measures I should take?




    Many thanks,
    Liyu

  • #2
    Liyu,

    This was discussed in an older post. If some observations have missing data on some of the indicators, that particular test statistic can't be calculated. Most estimation commands use case-wise (listwise) deletion: an observation with missing data on any variable in the model is thrown out entirely. LCA (actually, gsem in general) uses equation-wise deletion instead. You are effectively fitting multiple equations, one per indicator, and an observation is dropped only from the equations in which it has missing values.
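
    To see how much missing data the indicators carry before fitting, something like the following may help. It is a minimal sketch that assumes the four binary indicators (accident, play, insurance, stock) from the Stata 15 manual's LCA example dataset; substitute your own indicator names. The variable name nmiss is made up for illustration.

    ```stata
    * Example LCA data from the Stata 15 manual (assumed here for illustration)
    use http://www.stata-press.com/data/r15/gsem_lca1, clear

    * Count missing values per indicator, then list the missing-data patterns
    misstable summarize accident play insurance stock
    misstable patterns accident play insurance stock

    * Number of indicators missing for each observation (nmiss is a hypothetical name)
    egen nmiss = rowmiss(accident play insurance stock)
    tabulate nmiss
    ```

    Any observation with nmiss greater than 0 is kept under equation-wise deletion but would be dropped under listwise deletion.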

    In all the papers in health services research that use LCA, I have not seen anyone show that test statistic. I would argue you can do without it. I think it is something like a test of exact fit, and because it's based on a chi-square statistic, it has the usual weaknesses of chi-square tests (too likely to reject in large samples). BIC alone should suffice.

    I remember some time ago we discussed the LMR likelihood ratio test. I would reiterate that with a complex class structure (e.g. classes differ widely in size, class separation not large, some indicators are high in multiple classes), simulation studies found that the LMR test had high false positive rates. Again, BIC alone is OK.
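
    Comparing models by BIC can be sketched as below. This assumes the manual's example dataset and its four binary indicators rather than your data, and the stored names class2 and class3 are arbitrary.

    ```stata
    use http://www.stata-press.com/data/r15/gsem_lca1, clear

    * Fit 2- and 3-class models and store each set of estimates
    forvalues k = 2/3 {
        quietly gsem (accident play insurance stock <- ), logit lclass(C `k')
        estimates store class`k'
    }

    * Table of log likelihood, AIC, and BIC for the stored models
    estimates stats class2 class3
    ```

    The model with the lower BIC is the one this criterion prefers.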

    I am preparing to submit one paper that relies on BIC alone, and if I end up having to eat my words, I will let everyone know.
    Last edited by Weiwen Ng; 30 Aug 2018, 11:27.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


    • #3
      Thank you Weiwen, that was helpful.

      Yes, I have a few missing values for two of the nine indicator variables, but not many: less than 4%. However, I am not sure where I can specify the listwise option in LCA. Would you mind advising me on this? The other point I would like to ask about is very large standard errors. I have found very large SEs in some of the classes, and one of the indicator variables has missing values (.) for the SE, p-value, and CI. Can anyone tell me why this happened? I appreciate your help.

      Thanks in advance.
      Last edited by Liyuwork Dana; 31 Aug 2018, 01:47.



      • #4
        The syntax should look something like this:

        Code:
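        * Load an example dataset and check the variables for missing values
        use http://www.stata-press.com/data/r15/gsem_sysdsn1
        summarize
        * Two-class model; listwise drops any observation with a missing indicator
        gsem (site insure <-, mlogit) (male nonwhite <-, logit), lclass(C 2) startvalues(randompr, draws(20)) listwise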
        As to the missing SE, in my experience, this has usually occurred to me when I've used the -nonrtolerance- option and one of the logit intercepts is greater than 15 or less than -15. This means that the class-specific probability of response is nearly 1 or nearly 0. When that happens, the model won't converge without -nonrtolerance-, because the likelihood function is not concave. (If you want an explanation as to why, you should probably consult a real statistician.)
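
        To see why an intercept of that size is a problem, you can convert it to a probability yourself. This is only a quick check using Stata's built-in invlogit() function.

        ```stata
        * Probability of response implied by a logit intercept of +15 and of -15
        display invlogit(15)
        display invlogit(-15)
        ```

        After fitting the model, estat lcmean reports the class-specific response probabilities directly, which makes it easier to spot items sitting at the boundary.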

        In this case, it's usually justifiable to constrain the intercept at + or -15 if you believe that near 0 or near certain probability of endorsement is substantively justifiable. See this thread. If the class involved was small, I would be sure to highlight this to readers, because the estimated probability could change considerably in a different sample.

        I would strongly recommend that if you fit a model with the -nonrtolerance- option enabled, you save the parameter estimates, issue whatever constraints are justifiable, then re-fit the model without that option (i.e. let Stata use its default tolerances for the likelihood function).
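
        That workflow might look roughly like the sketch below. It assumes the manual's example dataset, and the constraint on the accident intercept in class 1 is purely hypothetical: nothing in those data calls for it, so constrain only the intercept that actually ran off to the boundary in your own model. The line continuations (///) require running this from a do-file.

        ```stata
        use http://www.stata-press.com/data/r15/gsem_lca1, clear

        * Step 1: fit with the relaxed convergence criterion and save the estimates
        gsem (accident play insurance stock <- ), logit lclass(C 2) nonrtolerance
        matrix b = e(b)

        * Step 2: refit with default tolerances, starting from the saved estimates
        * and fixing the (hypothetical) problem intercept at -15 in class 1
        gsem (1: accident <- _cons@-15) (1: play insurance stock <- _cons) ///
             (2: accident play insurance stock <- _cons),                   ///
             logit lclass(C 2) from(b)
        ```

        If the second fit converges, its standard errors for the remaining parameters are the ones to report.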


        • #5
          Thanks Weiwen, that was really helpful and the syntax did work. Now I am able to calculate the chi-square statistic and the corresponding p-value. I appreciate your help.



          • #6
            Hi Weiwen,

            I tried the listwise option, but was still unable to generate the p-value and log likelihood ratio. The code I used was:

            gsem (wsctever wsecigever alcolife pdlife vmarlife wscmever <- _cons) (A<- gen ag ret), family(bernoulli) link(logit) lclass(A 2) lcinvariant(none) level(95) listwise

            Please advise.
            Shivani
