
  • how to calculate entropy for LCA with Stata

    Hi. I'm working on a latent class analysis right now.
    I wonder how to calculate entropy for an LCA in Stata, along with other goodness-of-fit indices such as the LMR-LRT or BLRT (I guess those are reported by Mplus, not by Stata, right?).
    I know this was discussed on the forum a few years ago,
    but it's a bit confusing to me because I don't know the syntax for it.
    If anyone knows, please help me out.

    Also,
    it's a somewhat different topic, but according to prior research using latent class analysis,
    results differ a bit across statistical programs (Mplus vs. Stata), especially in the goodness-of-fit indices.
    I'm wondering which indices I should report in a paper when using Stata (I can see AIC/BIC, but there are other indices, so are there any rules for this?).

    Thank you in advance!

  • #2
    Might recommend taking a look at the fmmlc module on SSC, which was developed for factor mixture modeling software (i.e., fmm on SSC) prior to the introduction of gsem but does include an entropy metric.

    The code to reproduce the entropy component is available in that module and seems fairly easy to generalize to gsem. For example, for the following LCA:

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . gsem ( price headroom <- ), reg lclass(C 2)
    
    Fitting class model:
    
    Iteration 0:   (class) log likelihood = -51.292891  
    Iteration 1:   (class) log likelihood = -51.292891  
    
    Fitting outcome model:
    
    Iteration 0:   (outcome) log likelihood = -747.76751  
    Iteration 1:   (outcome) log likelihood = -747.76751  
    
    Refining starting values:
    
    Iteration 0:   (EM) log likelihood = -804.88665
    Iteration 1:   (EM) log likelihood = -805.70603
    Iteration 2:   (EM) log likelihood = -805.28774
    Iteration 3:   (EM) log likelihood = -804.59382
    Iteration 4:   (EM) log likelihood =  -803.9413
    Iteration 5:   (EM) log likelihood = -803.42255
    Iteration 6:   (EM) log likelihood = -803.04284
    Iteration 7:   (EM) log likelihood = -802.77782
    Iteration 8:   (EM) log likelihood = -802.59808
    Iteration 9:   (EM) log likelihood =  -802.4783
    Iteration 10:  (EM) log likelihood = -802.39874
    Iteration 11:  (EM) log likelihood = -802.34692
    Iteration 12:  (EM) log likelihood = -802.31317
    Iteration 13:  (EM) log likelihood = -802.29117
    Iteration 14:  (EM) log likelihood =  -802.2768
    Iteration 15:  (EM) log likelihood = -802.26739
    Iteration 16:  (EM) log likelihood =  -802.2612
    Iteration 17:  (EM) log likelihood = -802.25712
    Iteration 18:  (EM) log likelihood = -802.25442
    Iteration 19:  (EM) log likelihood = -802.25261
    Iteration 20:  (EM) log likelihood =  -802.2514
    Note: EM algorithm reached maximum iterations.
    
    Fitting full model:
    
    Iteration 0:   log likelihood = -784.38762  
    Iteration 1:   log likelihood = -784.38762  
    
    Generalized structural equation model           Number of obs     =         74
    Log likelihood = -784.38762
    
     ( 1)  [/]var(e.price)#1bn.C - [/]var(e.price)#2.C = 0
     ( 2)  [/]var(e.headroom)#1bn.C - [/]var(e.headroom)#2.C = 0
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.C          |  (base outcome)
    -------------+----------------------------------------------------------------
    2.C          |
           _cons |  -.0769744   .3549198    -0.22   0.828    -.7726044    .6186555
    ------------------------------------------------------------------------------
    
    Class          : 1
    
    Response       : price
    Family         : Gaussian
    Link           : identity
    
    Response       : headroom
    Family         : Gaussian
    Link           : identity
    
    ---------------------------------------------------------------------------------
                    |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
    price           |
              _cons |   5544.003   533.3587    10.39   0.000     4498.639    6589.367
    ----------------+----------------------------------------------------------------
    headroom        |
              _cons |   2.360981   .1247984    18.92   0.000     2.116381    2.605582
    ----------------+----------------------------------------------------------------
        var(e.price)|    8165126    1386167                       5854060    1.14e+07
     var(e.headroom)|   .2742943   .0676906                      .1691051    .4449149
    ---------------------------------------------------------------------------------
    
    Class          : 2
    
    Response       : price
    Family         : Gaussian
    Link           : identity
    
    Response       : headroom
    Family         : Gaussian
    Link           : identity
    
    ---------------------------------------------------------------------------------
                    |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
    price           |
              _cons |    6836.22   562.7211    12.15   0.000     5733.307    7939.133
    ----------------+----------------------------------------------------------------
    headroom        |
              _cons |   3.676095   .1334365    27.55   0.000     3.414564    3.937626
    ----------------+----------------------------------------------------------------
        var(e.price)|    8165126    1386167                       5854060    1.14e+07
     var(e.headroom)|   .2742943   .0676906                      .1691051    .4449149
    ---------------------------------------------------------------------------------
    Entropy is computable using:

    Code:
    . predict pr*, classposteriorpr
    
    . local ent = 0
    
    . forvalues i = 1/2 {
      2. gen temp`i'=(log(pr`i')*(pr`i'*-1))
      3. sum temp`i', meanonly
      4. local ent =`ent' + r(sum)
      5. }
    
    . scalar ent=1-(`ent'/(e(N)*ln(e(k))))
    
    . scalar list ent
           ent =  .89517604
    Again, note that this code is an adaptation of code provided by Joerg Luedicke (to whom thanks are due for this contribution).
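    As a cross-check on what the loop above accumulates, here is a short Python sketch of the same statistic, E = 1 - sum(-p*ln p) / (N*ln K), where the p are the posterior class probabilities. (The function name and the toy posterior matrices are invented for illustration; this is not part of the Stata code.)

```python
import math

def lca_entropy(posteriors):
    """Relative entropy E = 1 - sum_i sum_k (-p_ik * ln p_ik) / (N * ln K),
    the quantity the Stata loop above accumulates."""
    n = len(posteriors)          # number of observations, e(N) in the Stata code
    k = len(posteriors[0])       # number of latent classes, e(k) in the Stata code
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:            # -p*ln(p) -> 0 as p -> 0, so zeros contribute nothing
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Sanity checks on toy 2-class posteriors.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
print(lca_entropy(crisp))        # 1.0
print(lca_entropy(fuzzy))        # 0.0 (up to floating-point rounding)
```

    Crisp posteriors give E = 1 and completely uninformative (uniform) posteriors give E = 0, which is a quick sanity check for any implementation.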

    - joe
    Last edited by Joseph Luchman; 19 Jan 2021, 15:41. Reason: omitted -predict- postestimation command
    Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
    ----
    Research Fellow
    Fors Marsh

    ----
    Version 18.0 MP



    • #3
      Wow, thank you so much!



      • #4
        Hi all,

        I tried using the code to compute entropy, but am getting the error: unknown function /() after the first scalar line.

        I've pasted my code here: I am predicting 3 classes.
        Code:

        gsem(device_tot onlineactive_tot appusage_tot tech_4 <-, regress), lclass(C 3)

        estat lcprob
        estat lcgof
        predict classpost*, classposteriorpr

        local ent = 0
        forvalues i = 1/3 {
        gen temp`i'=(log(classpost`i')*(classpost`i'*-1))
        sum temp`i', meanonly
        local ent = `ent' + r(sum)
        }

        scalar ent=1-(`ent'/(e(N)*ln(e(k))))
        scalar list ent




        Can you please help me identify what is going wrong?

        Thank you,
        Deirdre



        • #5
          And this is the post I was referencing when using the code:


          Originally posted by Joseph Luchman



          • #6
            I tried using the code to compute entropy, but am getting the error: unknown function /() after the first scalar line.
            Looks like Stata isn't reading something right: it is trying to parse '/()' as a function rather than as the division operator '/'.

            My only suggestion is, if you copied and pasted the code from the original post, to check that it didn't come over with some unexpected Unicode character that Stata reads differently than expected.



            • #7
              Hi all,

              I am trying to use this exact code to calculate entropy. See my code and the resulting error message below. Any suggestions for addressing this error are welcome.

              Code:
               forvalues i = 1/4 {
              gen temp`i'=(log(cpost`i')*(cpost`i'*-1))
              sum temp`i', meanonly
              local ent =`ent' + r(sum)
              }
              unknown function +r()



              • #8
                There has to be an initialisation before the loop.

                Code:
                local ent = 0
                Otherwise, the first time around the loop, Stata sees

                Code:
                local ent = + r(sum)
                with the resulting error message that you report.
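                The same pitfall exists in most languages: the accumulator must exist before the loop first reads it. A minimal Python analogue of this point (illustrative only, not Stata):

```python
sums = [1.0, 2.0, 3.0]

ent = 0.0    # without this initialisation, the first `ent = ent + s`
             # below raises NameError -- analogous to Stata expanding
             # the undefined macro into `local ent = + r(sum)`
for s in sums:
    ent = ent + s

print(ent)   # 6.0
```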



                • #9
                  Thank you for a quick reply, Nick!



                  • #10
                    Hi everyone,
                    I'm working on an LCA with 3 classes. The code example I saw earlier seemed to be for 2 classes. Could anyone share how to calculate entropy specifically for a 3-class LCA?
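                    For what it's worth, the computation shown earlier generalizes to any number of classes: only the loop's upper bound (the number of predicted posterior-probability variables) and the K inside ln(K) change, so a 3-class run would use `forvalues i = 1/3` with `ln(e(k))` evaluating to ln(3). A Python sketch of the 3-class arithmetic (the function name and toy posteriors are made up for illustration):

```python
import math

def lca_entropy(posteriors, k):
    """Relative entropy E = 1 - sum(-p*ln p) / (N * ln K) for an LCA with
    k classes; `posteriors` holds one row of class probabilities per
    observation."""
    total = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - total / (len(posteriors) * math.log(k))

# Toy 3-class posteriors: fairly well-separated observations.
post3 = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
    [0.80, 0.10, 0.10],
]
print(lca_entropy(post3, 3))   # between 0 and 1; higher means better separation
```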
