  • #16
    Hi all,

    This has been a very helpful thread for me but I have a quick question RE plausible entropy values. Clyde's code in post #7 works but generates a negative entropy value of -.93447178 after my LCA model. I thought entropy values had to be between 0 and 1! Anyone know what may be causing this, and whether I can just take the absolute value?


    Many thanks,

    Laura



    • #17
      That code should not be generating negative values. Nor do I see anything in the code that is clearly the problem here. Please post an example data set that reproduces this result and I will try to troubleshoot it. Be sure to use the -dataex- command to post the example data.

      If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      When asking for help with code, always show example data. When showing example data, always use -dataex-.
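
      For instance, to post an extract of a few variables for the first 20 observations (illustrated here with Stata's shipped auto dataset rather than any real posted data):

      Code:
      sysuse auto, clear
      dataex make price mpg rep78 in 1/20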



      • #18
        Originally posted by Laura Brown:
        Hi all,

        This has been a very helpful thread for me but I have a quick question RE plausible entropy values. Clyde's code in post #7 works but generates a negative entropy value of -.93447178 after my LCA model. I thought entropy values had to be between 0 and 1! Anyone know what may be causing this, and whether I can just take the absolute value?


        Many thanks,

        Laura
        Laura, entropy definitely has to be between 0 and 1. I'd obviously check the code for typos. The first one that springs to mind is this: the denominator for the calculated entropy involves the natural log of k, the number of latent classes. The example code pertained to 2 latent classes, and you have to update k to match the number of classes every time you recalculate entropy. I don't think this was explicitly stated in the thread, so I'm stating it to remove ambiguity.
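        For example, for a 3-class model the final line of the calculation would become (a minimal sketch; everything before it is unchanged):

        Code:
        scalar E = 1+`r(sum)'/(e(N)*ln(3))    // ln(3) because k = 3 classes, not ln(2)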

        Also, I think the code in post #7 might not have been robust to situations where some people had predicted class membership probabilities of 0 (or at least 0 within floating point precision), which I addressed in a later post in the thread. This should be a rare scenario, but it did happen to me with real data. In that situation, I considered a class membership probability of 0 to be plausible.

        If neither of these situations apply to you, can you post your code?
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



        • #19
          Thanks Clyde and Weiwen for your speedy replies.

          This is the code I used:

          Code:
          *1) 2 class latent class model with all indicators included:
              set more off
              gsem (bfd menarche afb mwtkg <- ) ///
              (everbf    activities affection bwgst parcat rels ghq_75 regsmk alco <-, logit) ///
              (read vaxcat <-, ologit) ///
              /*if ethnic==1*/, ///
              lclass (C 2) startvalues(randompr, draws(5) seed(10))
              estat lcgof
              estimates store bib_c2_w
              
              *Entropy:
              quietly predict classpost*, classposteriorpr
              forvalues k = 1/2 {        
              gen sum_p_lnp_`k' = classpost`k'*ln(classpost`k')
              }
              egen sum_p_lnp = rowtotal(sum_p_lnp_*)
              summ sum_p_lnp, meanonly
              scalar E = 1+`r(sum)'/(e(N)*ln(2))
              drop classpost?    sum_p_lnp*
              di E
          
              *Sample size adjusted BIC:
              scalar SSBIC_bib_c2_w = -2 * e(ll) + e(rank) * ln((e(N)+2) / 24)
              di SSBIC_bib_c2_w
          It is a 2 class model so k is correct. I have already adjusted the code as per Weiwen's suggestion for the 0 probability scenario as previous preliminary analyses suggested that the sample splits with more than 95% in one class.

          I post a data example below, but -dataex- would only let me create it from 100 observations. My actual dataset has 12,801 observations, and the analysis above focuses on a subset of 3,938 White British mothers. I tried running the LCA and entropy code on several dataex datasets, but 100 cases is too small: either some categories of the variables aren't present or the model just doesn't converge. Unfortunately I cannot share the full dataset due to access restrictions.


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(bfd menarche afb mwtkg everbf activities affection bwgst parcat rels ghq_75 regsmk alco read vaxcat)
            .4602735        15 30  58 1 . 1 0 1 0 0 0 1 0 .
            .9205469        13 36  77 1 . 1 1 1 0 0 0 1 0 .
                   .        15  .  65 0 . 1 1 1 0 0 0 1 0 .
            1.841095        15 24  65 1 . 1 1 1 0 0 0 0 0 .
           4.6027346        15  . 100 1 . 1 1 1 0 1 1 1 0 .
          .032876678 14.666667  .  54 1 0 1 1 1 1 1 1 1 0 1
                   .        15  .  80 0 . 1 1 1 1 1 1 1 0 .
                  12     15.25 28  62 1 . 1 1 1 0 0 0 1 0 .
                   .        12 21  52 0 . 1 1 1 0 0 0 1 0 .
           20.021896        12 36  58 1 . 1 1 1 0 0 1 1 0 .
                   .      12.5  .  65 0 . 1 0 1 1 0 1 0 0 .
                   .        13  . 110 0 . 1 1 1 0 0 1 1 . .
          .032876678        12 31  64 1 . 1 1 1 1 0 1 1 . .
          .032876678        16  .   . 1 . 1 1 1 0 0 0 0 0 .
           18.016418        10 39  95 1 . 1 1 1 0 0 0 1 0 .
                   .        13  .  75 0 . 1 1 1 0 0 1 1 1 .
            .1643834        13 25  52 1 . 1 1 1 0 0 0 0 0 .
                   .        11 28  98 0 . 1 1 1 0 0 1 1 . .
                   .        13  .  64 0 1 1 1 1 0 0 0 1 0 1
            .2301369        13  .  89 1 0 1 1 1 0 0 0 1 0 1
                  24         .  .  60 1 . 1 1 1 1 1 1 1 2 .
                  24      11.5  .  54 1 . 1 1 1 0 0 1 0 0 .
           .13150671 12.916667  .  55 1 1 0 1 1 0 1 0 1 0 1
           4.6027346        13 30  73 1 . 1 1 1 0 0 0 0 0 .
          .032876678        11 26  67 1 . 1 1 1 0 1 1 1 0 .
            7.002732        14 18  52 1 0 1 1 1 1 1 1 1 0 1
            .4602735     12.75 20   . 1 . 1 1 1 0 0 1 0 0 .
           11.013686        13  .  52 1 0 1 1 1 0 1 0 0 0 1
                   .        12 18  86 0 . 1 1 1 0 1 1 1 0 .
            .4602735        15 22  57 1 . . 1 1 0 0 1 1 0 .
                   .        13 19  50 0 . . 1 1 0 1 1 1 . .
                   .        14  .  69 0 . 1 1 1 0 1 1 1 2 .
                   .        14  .  60 0 . 0 1 1 1 0 0 1 0 .
                   . 12.583333  .  65 0 . 1 1 1 0 0 1 0 . .
                   .        13  .  82 0 . 0 1 1 0 0 1 0 . .
           1.1506836        13  .  60 1 . 1 0 1 0 0 0 0 1 .
                   6        11  .  75 1 . 1 1 1 0 0 1 1 . .
                   6        14 28  56 1 . . 1 1 0 0 0 1 . .
            .9205475        13 30  75 1 1 1 1 1 1 1 0 1 0 1
                   .        13  .  59 0 . 1 0 1 1 0 1 0 2 .
                   6        15  .  95 1 . 1 1 0 0 0 1 1 . .
            .6904102 13.083333 24  55 1 . 1 1 1 0 1 1 0 0 .
                   .        14 21  58 0 . 1 1 1 0 0 1 1 1 .
                   .        11  .  66 0 . . 1 1 0 0 0 1 . .
            6.016432        10 20 107 1 . 1 1 1 0 1 0 1 0 .
                   .        15  .  55 0 1 1 1 1 0 0 0 0 0 1
            15.02464        14  .  65 1 1 1 0 1 0 1 0 1 0 1
          .065753356        12  .  55 1 . 1 1 0 0 0 1 1 0 .
                   .        13 15  46 0 . . 1 1 1 1 0 0 0 .
                   .        12 15  69 0 . 1 1 1 1 0 1 1 . .
                   .        14  .  48 0 . 1 1 1 0 0 0 1 1 .
           .23013674        13  .  88 1 1 1 1 1 1 1 1 1 0 1
           1.3808213        17 25  82 1 . 1 1 1 0 1 0 1 . .
                   . 14.916667  .  53 0 . 1 1 1 1 0 1 1 . .
                  24        13 24  96 1 . 1 1 1 1 0 1 1 2 .
                   .        14 16  60 0 0 . 1 1 0 0 1 1 . 1
                   .        12  .  75 0 . 1 1 1 0 1 1 0 0 .
                   .         9 17  57 0 . . 1 1 0 0 1 1 0 .
          .032876678        13  .  55 1 1 1 1 1 1 0 1 0 0 1
            .9205469        13  .  63 1 . 1 1 0 0 1 1 1 0 .
            1.841095 15.416667 29  85 1 0 1 1 1 0 0 0 1 0 1
          .032876678        12  .  50 1 . 1 1 1 0 1 0 0 0 .
                   .        13 21  90 0 . 1 1 1 0 0 0 1 0 .
            .9205469        12 17  39 1 . 1 1 1 0 0 0 0 . .
            1.841094     14.25 23  99 1 . 1 1 1 0 1 1 1 0 .
            9.008209        12 18  56 1 1 1 1 1 1 1 1 1 0 2
                   .        13 22  69 0 . 1 1 1 0 0 1 1 . .
            9.008209        13 20 110 1 1 1 1 1 0 0 1 1 0 1
            8.021909        11  .  81 1 . 1 1 1 0 0 1 1 0 .
          .065753356 15.666667 28  97 1 . 1 1 1 0 0 1 1 0 .
            .4602735 13.833333 21  55 1 . 1 1 1 1 0 0 0 0 .
           1.3808204        12 23 126 1 . 1 1 1 0 0 1 1 0 .
                   .        14  . 107 0 . 1 1 1 0 1 1 1 0 .
          .065753356      10.5 23  95 1 0 1 1 1 0 1 0 0 0 1
                   .        11  .  75 0 . 1 1 1 1 1 1 1 0 .
                  24        14  .  65 1 . 0 1 1 1 0 0 0 2 .
                   .        14 20  46 0 . 1 1 1 1 1 1 0 2 .
          .032876678        11 19  47 1 . 1 1 1 1 0 1 1 0 .
                   .        12 18  67 0 1 0 1 1 0 0 0 0 0 2
                   .        12  . 106 0 . . 1 0 1 1 1 0 . .
                   .        16  .  78 0 . 1 1 1 0 1 1 0 0 .
           1.6109582        13 38 101 1 0 1 1 1 1 0 1 1 0 1
            .9205469        14  .   . 1 . 1 1 1 0 0 1 1 0 .
            15.02464        10  .  76 1 . 1 1 1 1 1 0 1 0 .
                   .      12.5  . 120 0 1 1 1 1 0 1 1 1 0 1
           4.0109544        10  .  78 1 . 1 1 0 0 0 0 0 0 .
           .23013674        13 17  63 1 . 1 1 1 0 0 1 1 1 .
           1.3808213        11  . 108 1 1 1 1 1 0 0 0 1 0 1
                   .         9  .  75 0 1 1 1 0 0 0 0 1 0 1
                   .        15  .  82 0 0 1 1 1 0 0 1 1 0 0
                   .     12.25 20  53 0 . 1 1 1 0 0 0 1 0 .
                   .        13  .  45 0 . 1 1 1 1 0 1 1 . .
          .032876678        12  .  55 1 1 1 0 1 1 0 1 0 . 1
            .3287668        13 22  62 1 0 1 1 1 1 1 0 1 0 1
            8.021909        13  .  68 1 . 1 1 1 0 1 1 0 0 .
                   .        13  .  57 0 . 1 1 1 0 0 0 0 0 .
           2.0054772        11  .  78 1 1 1 1 1 1 1 1 0 0 1
                   .        14  . 106 0 . 1 1 1 0 1 0 0 0 .
           1.1506836         9 20 104 1 . 1 1 1 1 0 1 0 0 .
                   . 11.583333 17  74 0 . 1 1 1 1 1 1 1 2 .
          end
          label values everbf everbf
          label def everbf 0 "No", modify
          label def everbf 1 "Yes", modify
          label values activities activities
          label def activities 0 "No", modify
          label def activities 1 "Yes", modify
          label values affection affection
          label def affection 0 "No", modify
          label def affection 1 "Yes", modify
          label values bwgst bwgst
          label def bwgst 0 "LBW and/or premature", modify
          label def bwgst 1 "Normal weight and term", modify
          label values parcat parcat
          label def parcat 0 "3+ other children", modify
          label def parcat 1 "1 or 2 other children", modify
          label values rels rels
          label def rels 0 "Living with baby's father", modify
          label def rels 1 "Not living with baby's father", modify
          label values ghq_75 ghq_75
          label def ghq_75 0 "<75th centile", modify
          label def ghq_75 1 ">=75th centile", modify
          label values regsmk regsmk
          label def regsmk 0 "No", modify
          label def regsmk 1 "Yes", modify
          label values alco alco
          label def alco 0 "No", modify
          label def alco 1 "Yes", modify
          label values read read
          label def read 0 "Once a week or less", modify
          label def read 1 "2-4 days per week", modify
          label def read 2 "5-7 days per week", modify
          label values vaxcat vaxcat
          label def vaxcat 0 "None", modify
          label def vaxcat 1 "1-9", modify
          label def vaxcat 2 "All 10", modify


          I'm baffled! Could it have anything to do with some variables having high levels of missingness? Several of my variables have more than 50% missingness because the questions were only asked of some women. I had understood that missingness on some variables wasn't a problem for LCA/LPA, but maybe it has an influence on the entropy calculation somehow?

          Thanks for your help,

          Laura

          Edit: So the entropy code seems to work fine when I run the same 2 class model with Pakistani origin mothers, giving me an entropy value of 0.5481156. However, a 3 class model with White British mothers gives another bizarre entropy value of -1.6307782.
          Last edited by Laura Brown; 02 Sep 2018, 14:56. Reason: Trying entropy code on different models



          • #20
            Originally posted by Laura Brown:
            ...Could it have anything to do with some variables having high levels of missingness? Several of my variables have more than 50% missingness because the questions were only asked of some women. I had understood that missingness on some variables wasn't a problem for LCA/LPA, but maybe it has an influence on the entropy calculation somehow?

            Thanks for your help,

            Laura

            Edit: So the entropy code seems to work fine when I run the same 2 class model with Pakistani origin mothers, giving me an entropy value of 0.5481156. However, a 3 class model with White British mothers gives another bizarre entropy value of -1.6307782.
            When you estimate an LCA model, item-level missingness (i.e. some people don't respond on some indicators; contrast with casewise missingness, where a person is missing on everything) is OK. Per my understanding, -gsem- will use all the information present in each case.

            However, I'm not sure what happens when you use -predict- after LCA estimation and one of the indicators is missing. If it produces missing predicted probabilities, then, consistent with your intuition, I think this would explain your unusual results. You can try predicting class membership probabilities and checking whether they're present in the cases with any missing indicator.
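            For example, something along these lines (a sketch; chk* and n_missind are scratch names I made up, and the varlist is the indicator list from your model):

            Code:
            quietly predict chk*, classposteriorpr
            egen n_missind = rowmiss(bfd menarche afb mwtkg everbf activities affection ///
                bwgst parcat rels ghq_75 regsmk alco read vaxcat)
            count if missing(chk1) & n_missind > 0    // probabilities missing for these cases?
            count if !missing(chk1) & n_missind > 0   // or predicted despite missing items?
            drop chk* n_missind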

            If it's true that this is the problem, then the solution I'd suggest is simply to modify the denominator to account for the number of cases with non-missing predicted probabilities. You can use the -count- command to count those observations; it returns the scalar r(N). Some suggested modified code:

            Code:
            *Entropy:
            quietly predict classpost*, classposteriorpr
            forvalues k = 1/2 {        
            gen sum_p_lnp_`k' = classpost`k'*ln(classpost`k')
            }
            egen sum_p_lnp = rowtotal(sum_p_lnp_*)
            summ sum_p_lnp, meanonly
            local sum = r(sum)
            quietly count sum_p_lnp
            scalar E = 1+`sum'/(r(N)*ln(2))
            drop classpost? sum_p_lnp*
            di E
            Does that resolve the issue?
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



            • #21
              I think Weiwen has the key to the problem. It is probably the missing values causing the e(N) in the original code to be the wrong number.

              But there is an error in the code in #20. -quietly count sum_p_lnp- will produce a syntax error ("varlist not allowed"). I think he means
              Code:
              quietly count if !missing(sum_p_lnp)



              • #22
                Entropy is not necessarily bounded by 0 and 1. It's not a probability. See e.g. https://stats.stackexchange.com/ques.../207093#207093

                For k equally probable categories (probability 1/k each), the entropy is maximal at k (1/k) ln [1/(1/k)] = ln k. For k > 2 that is more than 1. The details differ for other logarithm bases.
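                A quick check in Stata:

                Code:
                * maximum entropy for k equally probable categories is ln(k), which exceeds 1 once k > 2
                forvalues k = 2/5 {
                    display "k = `k': ln(`k') = " %6.4f ln(`k')
                }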

                FWIW, I find it most congenial to define entropy as the (weighted) average of ln(1/p), i.e. the sum of p ln(1/p). Most texts rewrite that before you ever see it: first as the sum of p (-ln p), and then as minus the sum of p ln p. It's immediate from the definition of logarithms that the two are the same, but that doesn't make me like the usual expression. (And, occasionally, the minus sign gets lost, to some minor bewilderment.)



                • #23
                  Originally posted by Weiwen Ng:

                  You can try predicting class membership probabilities and checking whether they're present in the cases with any missing indicator.
                  I checked, and the estimated probabilities are present for all cases regardless of missingness, so it looks like missingness might not be the issue.

                  I tried Weiwen's and Clyde's combined code modification anyway:

                  Code:
                  *Entropy:
                  drop classpost* sum_p*
                  quietly predict classpost*, classposteriorpr
                  forvalues k = 1/2 {
                      gen sum_p_lnp_`k' = classpost`k'*ln(classpost`k')
                  }
                  egen sum_p_lnp = rowtotal(sum_p_lnp_*)
                  summ sum_p_lnp, meanonly
                  local sum = r(sum)
                  quietly count if !missing(sum_p_lnp)
                  scalar E = 1+`sum'/(r(N)*ln(2))
                  drop classpost? sum_p_lnp*
                  di E
                  This gave me an entropy value of 0.36942721 (in contrast to the previous estimate of -0.93447178).

                  Running the three class model with the modified code above (also changing k to 3) yields an entropy value of 0.14245471 (in contrast to the previous estimate of -1.6307782).

                  I am trying to wrap my head around Nick's post and what that means for my results. I am now wondering which of the entropy values is to be trusted. Or Nick, are you suggesting another edit to the code above is needed?

                  Even if entropy values don't have to range from 0 to 1, do scores closer to 1 still indicate clearer classifications (as suggested by Silverwood et al., 2011, p. 1409, for example)? That is, how does one interpret entropy values outside the 0 to 1 bound? Is it just a case of ignoring the sign and assessing distance from 1? In my case, -0.93447178 is 1.93447178 away from 1 and -1.6307782 is 2.6307782 away from 1, so would the 2 class model be considered to indicate clearer classifications than the 3 class one? The missingness-adjusted entropy estimates (0.36942721 vs 0.14245471) would also point to the 2 class model under this closeness-to-1 interpretation. I apologise if I have completely missed the mark.

                  My brain hurts and it's 11pm here so I will check back tomorrow morning. Thanks for all of your input thus far, greatly appreciated.

                  Laura

                  Reference:
                  Silverwood, R. J., Nitsch, D., Pierce, M., Kuh, D., & Mishra, G. D. (2011). Characterizing longitudinal patterns of physical activity in mid-adulthood using latent class analysis: Results from a prospective cohort study. American Journal of Epidemiology, 174(12), 1406-1415.



                  • #24
                    I think Nick's post refers to the entropy of a probability distribution, which is a different animal from the entropy of the classification system. The former is calculated as Nick indicates (and as I coded in #2 of this thread).

                    But that is not the statistic that is referred to as the entropy of the classification system (or model). The latter is, indeed, normalized to be between 0 and 1. It is calculated by the code Laura Brown gives in #23*, which incorporates my correction to Weiwen's correction of the code in #7. When the code in #7 was written, neither Weiwen nor I gave any thought to the possibility that there would be missing values, so we incorrectly based the normalization on e(N) instead of on r(N), leading to the problems Laura Brown pointed out in #16.

                    *One correction to the code in #23. As Weiwen pointed out earlier, for the general k-class model, the ln(2) factor needs to be replaced by ln(`k').
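                    Putting those pieces together, a sketch of the full calculation for a general k-class model. Two notes: the macro created by -forvalues- vanishes when the loop ends, so k must be defined as a separate local for ln(`k') to expand correctly on the final scalar line; and I have added the missing option to -rowtotal- so that any case without predicted probabilities ends up missing (and is excluded by the count) rather than contributing 0:

                    Code:
                    local k 3                                 // number of latent classes in the fitted model
                    quietly predict classpost*, classposteriorpr
                    forvalues c = 1/`k' {
                        gen sum_p_lnp_`c' = classpost`c'*ln(classpost`c')
                    }
                    egen sum_p_lnp = rowtotal(sum_p_lnp_*), missing
                    summ sum_p_lnp, meanonly
                    local sum = r(sum)
                    quietly count if !missing(sum_p_lnp)
                    scalar E = 1 + `sum'/(r(N)*ln(`k'))
                    drop classpost* sum_p_lnp*
                    di E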



                    • #25
                      Originally posted by Clyde Schechter:
                      I think Nick's post refers to the entropy of a probability distribution, which is a different animal from the entropy of the classification system. The former is calculated as Nick indicates (and as I coded in #2 of this thread).

                      But that is not the statistic that is referred to as the entropy of the classification system (or model). The latter is, indeed, normalized to be between 0 and 1. It is calculated by the code Laura Brown gives in #23*, which incorporates my correction to Weiwen's correction of the code in #7. When the code in #7 was written, neither Weiwen nor I gave any thought to the possibility that there would be missing values, so we incorrectly based the normalization on e(N) instead of on r(N), leading to the problems Laura Brown pointed out in #16.

                      *One correction to the code in #23. As Weiwen pointed out earlier, for the general k-class model, the ln(2) factor needs to be replaced by ln(`k').
                      Folks, thanks for catching the issues. Nick is correct about entropy as defined in his linked post. As Clyde pointed out, we were discussing normalized entropy above, which is bounded between 0 and 1 as we defined it.

                      Laura, if your entropy calculations are correct, then 0.37 and 0.14 are very low values of entropy, which means a very low degree of class separation. Imagine you had a perfect set of indicators, such that your LCA model was able to say that each person had a 0% probability of being in class 1 and a 100% probability of being in class 2 and vice versa. That's a (normalized) entropy of 1. How do you get there? Say you have 6 indicators. If class 1 was very high on indicators 1-3 and very low on indicators 4-6, and class 2 was exactly the reverse, then this would result in very high classification certainty and very high entropy. In real life, I don't think we have a lot of situations like that.
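                      A quick numeric illustration of the two extremes, plugging a single observation's probabilities into the per-case term 1 + sum(p*ln p)/ln(2) that the code earlier in the thread averages over the sample:

                      Code:
                      di 1 + (.999*ln(.999) + .001*ln(.001))/ln(2)   // near-certain assignment: about .989
                      di 1 + (.5*ln(.5) + .5*ln(.5))/ln(2)           // a 50/50 split: exactly 0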

                      The code you typed appears correct (but I shall review when I can get to a computer with Stata 15). If you're right that missingness on an indicator doesn't preclude predicted class membership probabilities, then it could be that missingness still reduces classification certainty. If item missingness were really pervasive, then that could result in low entropy values. Can you give us a sense of what percent of each indicator is missing? You may already know this, but you can use

                      Code:
                      misstable summarize bfd menarche afb mwtkg everbf activities affection bwgst parcat rels ghq_75 regsmk alco read vaxcat
                      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



                      • #26
                        Thanks for clarifying Weiwen and Clyde, this is starting to make more sense.

                        Originally posted by Weiwen Ng:

                        Laura, if your entropy calculations are correct, then 0.37 and 0.14 are very low values of entropy, which means a very low degree of class separation
                        A low degree of class separation is not necessarily a bad thing in this instance as I am trying to show that people do not split neatly into two reproductive strategies (as is often assumed in some evolutionary psychology applications of life history theory).

                        Originally posted by Clyde Schechter:
                        *One correction to the code in #23. As Weiwen pointed out earlier, for the general k-class model, the ln(2) factor needs to be replaced by ln(`k').
                        I re-ran the 3 class model replacing ln(2) with ln(3) as per Clyde's suggestion above (NB: ln(`k') did not work, resulting in "invalid syntax"). This gave me 0.45894916 as my new entropy value.

                        Originally posted by Weiwen Ng:
                        If you're right that missingness on an indicator doesn't preclude predicted class membership probabilities, then it could be that missingness still reduces classification certainty. If item missingness were really pervasive, then that could result in low entropy values. Can you give us a sense of what percent of each indicator is missing?
                        Missingness doesn't preclude class membership probabilities. I ran the following code to check this and class membership is predicted for all 3,938 White British women in my sample:
                        Code:
                            
                                predict cpost* if ethnic==1, classposteriorpr
                                egen max = rowmax(cpost*) if ethnic==1
                                gen predclass_w=1 if cpost1==max & ethnic==1
                                replace predclass_w=2 if cpost2==max & ethnic==1
                                tab predclass_w
                                summ cpost1
                                summ cpost2
                        Item missingness is, however, very high, as shown by the -mdesc- results below:
                        Variable      Missing   Total   Percent Missing
                        bfd             3,072   3,938             78.01
                        menarche          187   3,938              4.75
                        afb             1,945   3,938             49.39
                        mwtkg             146   3,938              3.71
                        everbf             80   3,938              2.03
                        activities      3,156   3,938             80.14
                        affection       3,438   3,938             87.30
                        bwgst              10   3,938              0.25
                        parcat              0   3,938              0.00
                        rels                5   3,938              0.13
                        ghq_75            590   3,938             14.98
                        regsmk              3   3,938              0.08
                        alco                7   3,938              0.18
                        read            3,498   3,938             88.83
                        vaxcat          3,167   3,938             80.42
                        With such high item missingness, does that then mean that the entropy values are less accurate, reflecting a missingness issue rather than a criticism of the tested classification?

                        Thanks,

                        Laura

                        Last edited by Laura Brown; 03 Sep 2018, 05:20. Reason: Reran 3 class model with Clyde's ln(`k') correction



                        • #27
                          Originally posted by Laura Brown:
                          Thanks for clarifying Weiwen and Clyde, this is starting to make more sense.


                          A low degree of class separation is not necessarily a bad thing in this instance as I am trying to show that people do not split neatly into two reproductive strategies (as is often assumed in some evolutionary psychology applications of life history theory).

                          ...

                          With such high item missingness, does that then mean that the entropy values are less accurate, reflecting a missingness issue rather than a criticism of the tested classification?

                          Thanks,

                          Laura
                          Kathryn Masyn's chapter in the Oxford Handbook of Quant Methods, which is quoted in Stata's latent class manual, does not recommend that entropy be used for model selection. It's a descriptive measure of how well-separated the classes are. So, you're correct: if that low entropy value is right, it's not a criticism of the model. You would still want to select the number of classes with the lowest BIC. That said,

                          My suspicion is that with that high a rate of missingness, the entropy contribution of observations with missing values will be lower. It will be less accurate because it's derived from less information. I'll need to simulate some data to confirm this. I'm going to take Stata's stock dataset for LCA, run a model and predict probabilities; then I'll randomly knock out 20% of the responses on each indicator and predict probabilities again. I'll report back, but if anybody wants to beat me to the punch, please feel free to do so.

                          Side note: You said that some of the missingness may be by design rather than at random. I hope you've considered what implications that may have for your class enumeration. I'm not sure I am qualified to advise on that!
                          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                          When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



                          • #28
                            Some code. I'm going to fit a 2-class LCA on the full data. Then, I'll create a dataset where the same indicators are made missing completely at random (MCAR): each indicator value has a 20% chance of being converted to missing.

                            Code:
                            use http://www.stata-press.com/data/r15/gsem_lca1
                            set seed 1000
                            gsem (accident play insurance stock <- ), logit lclass(C 2)
                            predict probfull*, classposteriorpr
                            
                            foreach v in accident play insurance stock {
                            gen `v'_miss = `v'
                            replace `v'_miss = . if runiform() > .8
                            }
                            gsem (*_miss <- ), logit lclass(C 2)
                            predict probmiss*, classposteriorpr
                            
                            sum probfull* probmiss*
                                Variable |        Obs        Mean    Std. Dev.       Min        Max
                            -------------+---------------------------------------------------------
                               probfull1 |        216    .7207539    .3778755   .0410183    .999975
                               probfull2 |        216    .2792461    .3778755    .000025   .9589816
                               probmiss1 |        216    .7503538    .3818407   .0350281          1
                               probmiss2 |        216    .2496462    .3818407   7.41e-14    .964972
                            Indeed, when we changed some indicator values to missing at random, the standard deviation of the predicted probabilities increased. Now, let's check how entropy changes:

                            Code:
                            forvalues k = 1/2 {
                            gen sum_p_lnp_full_`k' = probfull`k'*ln(probfull`k')
                            gen sum_p_lnp_miss_`k' = probmiss`k'*ln(probmiss`k')
                            }
                            egen sum_p_lnp_full = rowtotal(sum_p_lnp_full_*)
                            egen sum_p_lnp_miss = rowtotal(sum_p_lnp_miss_*)
                            quietly summ sum_p_lnp_full, meanonly
                            local sum = r(sum)
                            quietly count if sum_p_lnp_full != .
                            scalar E_full = 1+`sum'/(r(N)*ln(2))
                            
                            quietly summ sum_p_lnp_miss, meanonly
                            local sum = r(sum)
                            quietly count if sum_p_lnp_miss != .
                            scalar E_miss = 1+`sum'/(r(N)*ln(2))
                            
                            . display E_full
                            .71929768
                            
                            . display E_miss
                            .79915984
                            Well, looks like my intuition was wrong for this example! The LCA model fit with some indicator values MCAR actually has higher entropy. If you look at observations 2 and 5, this might give some insight into why. (Note: because the code sets the random-number seed, you should get exactly the same results as I did, despite the random number generator being used to knock out some indicator values, and your observations 2 and 5 will be the same as mine.)

                            Observation 5 had all 4 indicators changed to missing. That observation's predicted membership probabilities in complete data for class 1 and 2 were 0.999975 and 0.000025 respectively. In the LCA on the indicator variables that contain missing, that observation's predicted membership probabilities change to 0.9679506 and 0.0320494 respectively. That's more classification uncertainty. My intuition was correct for this observation.

                            Observation 2, however, had no indicators changed to missing. In the complete data, their predicted membership probabilities are the same as for observation 5 (because same response pattern). In the LCA on MCAR data, their predicted membership probabilities become more certain, not less. They change to 1 and 7.41 * 10^-14. There are far more cases like observation 2 than observation 5.

                            Where does that leave Laura? I'm not sure! Judging from entropy scores I've seen in real life (my own data plus papers I've read), my initial intuition was that entropy scores of 0.3 and below were very, very low. I've typically seen scores of 0.6 or higher in my work, even where most of the latent classes in the model were fairly close, and I haven't seen papers reporting entropy scores of 0.3 or below. So, either there was an error stemming from some programming issue we hadn't anticipated, or there really is that much classification uncertainty in the model! Laura, if you look at the -estat lcmean- output from your models, I think you should see that the class-specific means for all your indicators are very close together. I would also fit a 1-class LCA model and compare its BIC to the other models you fit.
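                            For example, a minimal sketch with the stock data from above (using -estat ic- for the information criteria and -estat lcmean- for the class-specific means):

                            Code:
                            gsem (accident play insurance stock <- ), logit lclass(C 1)
                            estat ic        // AIC/BIC for the 1-class baseline
                            gsem (accident play insurance stock <- ), logit lclass(C 2)
                            estat ic        // lower BIC indicates the preferred model
                            estat lcmean    // class-specific means (here, proportions); close values imply weak separation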

                            Heuristically, in LCA, you propose that your data can best be explained by k classes that are homogeneous with respect to the indicators you specified. If the best fitting model has k = 1, then basically you have a homogeneous sample. If k is 2, then you've got a heterogeneous sample, and here are the characteristics of the two classes you think are present, and so on.
                            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                            When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



                            • #29
                              Thanks for looking into this some more Weiwen.

                              Originally posted by Weiwen Ng:

                              Side note: You said that some of the missingness may be by design rather than at random. I hope you've considered what implications that may have for your class enumeration. I'm not sure I am qualified to advise on that!
                              My missingness comes from some questions only being asked in follow up sub-cohorts rather than in the main cohort. I will try running models restricted to women who are in these follow up sub-cohorts as a sensitivity analysis to see how that affects my results.

                              Originally posted by Weiwen Ng:
                              Laura, if you look at the -estat lcmean- output from your models, I think you should see that the class-specific means for all your indicators are very close together. I would also fit a 1-class LCA model and compare its BIC to the other models you fit.
                              Whilst the means for continuous vars (and probabilities of being in different categories for categorical vars) are relatively close together, the confidence intervals for the two classes do not overlap for the majority of indicators (everbf, activities, rels, ghq_75, regsmk, alco, one category of read, one category of vaxcat, bfd, afb and mwtkg), suggesting that there are clear differences in these traits between the two groups and that distinctive profiles are identifiable.

                              In terms of comparing BICs, I have been using the SS-BIC code you proposed in #43 in another thread:

                              Code:
                              scalar SSBIC_class_2 = -2 * e(ll) + e(rank) * ln((e(N)+2) / 24)
                              di SSBIC_class_2
                              Perhaps now is a good time to check whether there are any adjustments that need to be made to the code for different numbers of classes? Or is it always the same? Where does the 24 come from?

                              Assuming the above SS-BIC code is correct, these are the values I get for the different models:

                              # classes        AIC        BIC     SS-BIC    Entropy
                              1           97738.21   97870.05   97803.33          .
                              2           96614.32   96859.18   96735.26  0.395577*
                              3           95999.71   96357.58   96176.46  0.563029*
                              4**         95847.83   96312.43   96077.3   0.548274
                              5**         95855.37   96432.98   96140.65  0.538238
                              *These entropy values are slightly different from what I reported in earlier posts as one of the indicator variables, afb, has been modified slightly to be more accurate.
                              **Would only converge using the nonrtolerance option



                              Based on the statistics above, and given the convergence issues with the 4 and 5 class models, it looks like the model with 3 classes fits the data best: it has the lowest AIC and BIC values among the models that converged without -nonrtolerance-.

                              I am also going to see if I can calculate the Lo-Mendell-Rubin likelihood ratio test (LMR-LRT) of goodness of fit (as per your posts here) to compare the models with different numbers of classes further… and then see if a similar story plays out for all fit statistics in the other ethnic group and in the sub-cohort restricted sensitivity analyses.


                              Thanks for all your input!

                              Laura







                              • #30
                                Originally posted by Laura Brown:
                                ...
                                My missingness comes from some questions only being asked in follow up sub-cohorts rather than in the main cohort. I will try running models restricted to women who are in these follow up sub-cohorts as a sensitivity analysis to see how that affects my results.



                                Whilst the means for continuous vars (and probabilities of being in different categories for categorical vars) are relatively close together, the confidence intervals for the two classes do not overlap for the majority of indicators (everbf, activities, rels, ghq_75, regsmk, alco, one category of read, one category of vaxcat, bfd, afb and mwtkg), suggesting that there are clear differences in these traits between the two groups and that distinctive profiles are identifiable.

                                In terms of comparing BICs, I have been using the SS-BIC code you proposed in #43 in another thread:

                                Code:
                                scalar SSBIC_class_2 = -2 * e(ll) + e(rank) * ln((e(N)+2) / 24)
                                di SSBIC_class_2
                                Perhaps now is a good time to check whether there are any adjustments that need to be made to the code for different numbers of classes? Or is it always the same? Where does the 24 come from?

                                Assuming the above SS-BIC code is correct, these are the values I get for the different models:

                                # classes        AIC        BIC     SS-BIC    Entropy
                                1           97738.21   97870.05   97803.33          .
                                2           96614.32   96859.18   96735.26  0.395577*
                                3           95999.71   96357.58   96176.46  0.563029*
                                4**         95847.83   96312.43   96077.3   0.548274
                                5**         95855.37   96432.98   96140.65  0.538238
                                *These entropy values are slightly different from what I reported in earlier posts as one of the indicator variables, afb, has been modified slightly to be more accurate.
                                **Would only converge using the nonrtolerance option



                                ...
                                So, for sample-size adjusted BIC: the 24 is part of Sclove's (1987) adjustment, which replaces n in the BIC penalty with (n+2)/24, so it stays the same regardless of the number of classes. I think the adjustment matters mainly in smaller samples, and relying on BIC alone is fine. I've cited Nylund et al. elsewhere, but their simulation study showed that the LMR LR test had a high false positive rate in simulated data with a complex structure (very unequal class sizes, some indicators that don't distinguish classes well, some classes close together). I have a feeling that your proposed class structure is complex by at least some of these criteria. Hence, I'd recommend omitting the LMR test entirely.
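                                For what it's worth, the two lines themselves never need adjusting across models: e(ll), e(rank), and e(N) are all refreshed by each -gsem- fit, so you can run the identical code after every model:

                                Code:
                                scalar SSBIC = -2 * e(ll) + e(rank) * ln((e(N)+2) / 24)
                                di SSBIC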

                                I generally recommend that if you're unable to get a model to converge without -nonrtolerance-, you treat it as not having converged at all. Note that you should examine whether any of the binary indicators have logit intercepts near +/- 15; if so, it's justifiable to constrain them at +15 or -15 and then attempt to fit the model (an intercept that extreme corresponds to a within-class response probability of essentially 1 or 0 for that indicator).

                                Entropy scores around 0.5 are more reasonable. I was worried that an entropy of 0.3 or lower was almost implausibly low.

                                Last, I hate to make your life more complicated, but most properly, you do need to explore different variance-covariance structures for the continuous indicators, as Masyn indicated in her chapter quoted in our SEM example. (I think I've cited this chapter elsewhere in this thread.) Those involve the -lcinvariant(none)- and -covstructure(e.OE_n unstructured)- options. (Note: check for typos; I'm going off memory for the covariance structure option.) Specifically, the former allows the variance of the error term to vary (vs. not vary) by latent class, and the latter allows the error terms of the continuous indicators to be correlated (vs. the default of uncorrelated). Heuristically, for the former option: the default is as if you're cutting k cookies with the same cookie cutter, whereas -lcinvariant(none)- lets you use a differently sized cutter for each class. The SEM example for LPA does allude to this, although perhaps it could be made more explicit that varying these structures is a recommended practice.
                                Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                                When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

