Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Originally posted by Weiwen Ng View Post

    Harshit, params_null is the number of parameters in the null model, and params_alt is the number of parameters in the alternative (i.e. less parsimonious) model. In a regular LR test, these would be the difference in degrees of freedom in each model.

    Now, you do raise an interesting point. I assigned the number of parameters as e(k), which is returned by gsem. That's the number of parameters in the model.

    However, in Stata's gsem example data from example 50, which I used in my sample code above, one parameter is always zero - that's the parameter concerning the base class in the multinomial logit model for the classes. It is zero because it has to be.

    Maybe I misinterpreted what the number of parameters means! I do apologize if this is my misunderstanding. I believe the degrees of freedom should correspond to the scalar e(rank). That's the rank of the variance-covariance matrix, which is returned as e(V). Can you check if e(rank) corresponds to your reported degrees of freedom? After running a -gsem- command, you can do that by typing:

    Code:
    display e(rank)
    Yes. Thank you. e(rank) is exactly what I had in mind. Thanks a lot.

    Comment


    • #17
      Hi Wiewen,
      I had used your code on post #2 as my personal ado file. Afterwards, I performed a two class and threeclass analysis, storing the estimates after each -gsem- model specification. However the code:
      Code:
      lmrtest class2 class3
      did not give any result/error. Also I typed
      Code:
      scalar list lmrt_p
      afterwards. However Stata gave me the error message:
      Code:
      scalar lmrt_p not found
      r(111);
      I used the stata file from https://stats.idre.ucla.edu/wp-conte...16/02/lca1.dat. the sample code is as follows.
      Code:
      import delimited using lca-mplus.txt   //saved text file from the website
      gsem ( v2 v3 v4 v5 v6 v7 v8 v9 v10 <-, logit ), lclass(C 2)
      estimates store class2
      gsem ( v2 v3 v4 v5 v6 v7 v8 v9 v10 <-, logit ), lclass(C 3)
      estimates store class3
      lmrtest class2 class3
      scalar list lmrt_p
      Can you point me out where it went wrong?
      LMRT test with Mplus gives a value of 38.468 and p-value of 0.1500.
      thank you very much

      Comment


      • #18
        Originally posted by Wossenseged Jemberie View Post
        Hi Wiewen,
        I had used your code on post #2 as my personal ado file. Afterwards, I performed a two class and threeclass analysis, storing the estimates after each -gsem- model specification. However the code:
        Code:
        lmrtest class2 class3
        did not give any result/error. Also I typed
        Code:
        scalar list lmrt_p
        afterwards. However Stata gave me the error message:
        Code:
        scalar lmrt_p not found
        r(111);
        I used the stata file from https://stats.idre.ucla.edu/wp-conte...16/02/lca1.dat. the sample code is as follows.
        Code:
        import delimited using lca-mplus.txt //saved text file from the website
        gsem ( v2 v3 v4 v5 v6 v7 v8 v9 v10 <-, logit ), lclass(C 2)
        estimates store class2
        gsem ( v2 v3 v4 v5 v6 v7 v8 v9 v10 <-, logit ), lclass(C 3)
        estimates store class3
        lmrtest class2 class3
        scalar list lmrt_p
        Can you point me out where it went wrong?
        LMRT test with Mplus gives a value of 38.468 and p-value of 0.1500.
        thank you very much
        Wossenseged,

        After you type the lmrtest command, can you type

        Code:
        scalar dir
        That will show all scalars in memory. If you had copied the code but renamed the scalar, that would catch it (or if I, in my posting to the forum, named it something else).

        You should note what I said to Liyuwork (the original poster) earlier: the LMR test looks like it has a high false positive rate compared to other tests when the latent class structure is complex. I fear that "complex" actually describes a lot of real-world cases - it can include cases when one class is small, and/or when the indicators cross-load or don't separate the classes strongly. You should be entirely justified in using just the BIC.
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

        Comment


        • #19
          Originally posted by Weiwen Ng View Post

          Wossenseged,

          After you type the lmrtest command, can you type

          Code:
          scalar dir
          That will show all scalars in memory. If you had copied the code but renamed the scalar, that would catch it (or if I, in my posting to the forum, named it something else).

          You should note what I said to Liyuwork (the original poster) earlier: the LMR test looks like it has a high false positive rate compared to other tests when the latent class structure is complex. I fear that "complex" actually describes a lot of real-world cases - it can include cases when one class is small, and/or when the indicators cross-load or don't separate the classes strongly. You should be entirely justified in using just the BIC.
          thanks, Weiwen for your reply. I typed
          Code:
          scalar dir
          , however this did not display any number. I have not tweaked the code from your post on #2. I am actually comparing the results from Stata and Mplus for this simple example.

          Comment


          • #20
            Originally posted by Wossenseged Jemberie View Post

            thanks, Weiwen for your reply. I typed
            Code:
            scalar dir
            , however this did not display any number. I have not tweaked the code from your post on #2. I am actually comparing the results from Stata and Mplus for this simple example.
            This could be an issue with how/where you stored the ado file. Had you tried manually calculating the LR test statistic using post 6, with the correction in post 15?
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

            Comment


            • #21
              Originally posted by Weiwen Ng View Post

              This could be an issue with how/where you stored the ado file. Had you tried manually calculating the LR test statistic using post 6, with the correction in post 15?
              I think I have stored the ado file in my personal folder.
              Code:
              personal dir
              shows me that I have the file there.
              Code:
              your personal ado-directory is c:\ado\personal\
                <dir> .                   <dir> ..                   0.8k lmrtest.ado
              I dont know why I cannot see the scalars with the lmrtest ado in #2. However,

              I had also done the calculation manually according to the code in #6 and again by changing the e(k) to e(rank) for the parameters. But I think the problem is with the results returned for p and q.
              Code:
              . scalar dir
              modlr_test_stat =          .
              lr_test_stat =  39.024763
                       q =         -3
                       p =         -3
               classes_3 =          3
                 parms_3 =         29       //e(rank), e(k) returns 30
                    ll_3 = -4231.6958
               classes_2 =          2
                 parms_2 =         19    //e(rank), e(k) returns 20
                    ll_2 = -4251.2082
                       n =       1000
              As p and q are both -3, modlr_test_stat will return infinity (missing). i.e 0 ^-1 for the denominator. the codes from #6 are
              Code:
              scalar p = (3 * `classes_3' - 1)       
               scalar q = (3 * `classes_2' - 1)
              scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1)
              So if I do the automatic lmrtest as posted in #2
              the result from
              Code:
               
               return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat')
              does not show me anything. Do you think that might be the problem?

              Comment


              • #22
                Hi everyone,

                Can you please help me what is wrong with the syntax I am using (below) to calculate chi-squares and p-values comparing classes in LPA.
                I tried to follow the syntax given by #2 in this post to calculate "Lo–Mendell–Rubin Adjusted Likelihood Ratio Test" .

                I have got the chi-square distribution but couldn't find the p-value




                quietly gsem (W, X, Y, Z <- _cons), family(gaussian) link(identity) lclass(A 4)
                estimates store class4
                scalar n = e(N)
                scalar ll_4 = e(ll)
                scalar parms_4 = e(k)
                matrix a = e(lclass_k_levels)
                scalar clasess_4quietly gsem (W, X, Y, Z <- _cons), family(gaussian) link(identity) lclass(A 4)
                estimates store class4
                scalar n = e(N)
                scalar ll_4 = e(ll)
                scalar parms_4 = e(k)
                matrix a = e(lclass_k_levels)
                scalar clasess_4 = 4

                quietly gsem (W, X, Y, Z <- _cons), family(gaussian) link(identity) lclass(A 5)
                estimates store class5
                scalar n = e(N)
                scalar ll_5 = e(ll)
                scalar parms_5 = e(k)
                matrix a = e(lclass_k_levels)
                scalar clasess_5 = 5
                scalar p = (5 * clasess_5 - 1)
                scalar q = (5 * clasess_4 - 1)
                scalar lr_test_stat = -2 * (ll_4 - ll_5)
                scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n))^ -1)
                scalar list modlr_test_stat
                return scalar lmrt_p = chi2tail(parms_5 - parms_4,modlr_test_stat) = 4


                the other point is that I have found negative chi-square test distribution which is not correct comparing class 5 and class6, can you give me some feedback why X2 is coming -ve and how can I correct it?


                Thank you for you help in advance,
                Liyu

                Comment


                • #23
                  Hello! I am conducting an LCA and I'm seeking to run the lmrtest afterwards.

                  First, I'm following the code from #2 to program the lmr test.
                  Code:
                  program lmrtest, rclass
                  version 15.0
                  args estimates_null estimates_alternative
                  tempname n ll_null ll_alt parms_null parms_alt a classes_null classes_alt lr_test_stat modlr_test_stat p q
                  quietly {
                  estimates restore `1'
                  scalar `n' = e(N)
                  scalar `ll_null' = e(ll)
                  scalar `parms_null' = e(k)
                  matrix `a' = e(lclass_k_levels)
                  scalar `classes_null' = `a'[1,1]
                  estimates restore `2'
                  scalar `ll_alt' = e(ll)
                  scalar `parms_alt' = e(k)
                  matrix `a' = e(lclass_k_levels)
                  scalar `classes_alt' = `a'[1,1]
                  scalar `p' = (3 * `classes_alt' - 1)
                  scalar `q' = (3 * `classes_null' - 1)
                  scalar `lr_test_stat' = -2 * (`ll_null' - `ll_alt')
                  scalar `modlr_test_stat' = `lr_test_stat' / (1 + ((`p' - `q') * ln(`n')) ^ -1)
                  }
                  return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat')
                  end
                  Then, I am running this LCA.
                  Code:
                  gsem ( res_hhincome_blgp_fa res_blgp_pct_employed res_hhvalue_blgp_fa res_pct_ba_blgp_fa res_pct_vacant_blgp_fa <-), lclass (C 2) nonrtolerance
                  And finally, I am storing my estimates and running the lmrtest.
                  Code:
                  estimates store class2
                  lmrtest class2
                  In response to the lmrtest syntax, I am getting an r(111) error: "estimates __00000C not found"

                  Can you please advise how I can deal with this issue?

                  Comment


                  • #24
                    I am doing gsem using binary variables. I want to do the LMRTEST. As suggested in the post #2 I copied the code in a file called lmrtest.ado in ado\personal folder. Still I am not getting the p-value. I am getting the output. Could anyone say where I went wrong.

                    lmrtest class2 class3
                    unexpected end of file
                    (error occurred while loading lmrtest.ado)
                    r(612);

                    Comment


                    • #25
                      Setting up the program that I specified in my earlier post requires you to write an ado file and save it in your personal ado directory (that's a directory for user-written programs). In a lot of the posts here, some trouble seems to stem from the use of the ado file setup, and I'm not really in a good position to diagnose that. I'd encourage people to manually calculate the LR statistic themselves, if that is what they want to do (note my other posts that some simulation work has cast doubt on the accuracy of this modified LR test).

                      In a likelihood ratio test, you are testing if a less restrictive model (more parameters) fits better than a more restrictive one. For example, does a 4-class solution fit better than a 3-class one? If this were a standard linear regression, you could calculate this by hand easily enough (or use the built in lrtest command). In any case, in an LR test, the value you're testing is -2* the difference in log-likelihoods between the two different models. You can obtain that from the e-class scalars returned by the gsem command after each one is fit. Normally, we know that the raw test statistic has a chi-square distribution, with degrees of freedom (df) equal to the difference in the number of parameters in each test. Here, if you're running two linear regressions and you added one parameter to the second one, df = 1.

                      In the case of LCA, Lo et al derived an analytic correction for the raw test statistic. They think you need to divide -2* the LL difference by: (1 + {(p-q)*ln(n)}^-1). This is equation 15 in their paper. I believe that p and q are defined on page 3, after equation 1. p = 3k1 - 1, where k1 appears to be the number of components in the less constrained LCA model, and q = 3k0 - 1, where k0 is the number of components in the more constrained model. For example, if you're testing 5 vs 4 classes, p = 3*5 - 1, and q = 3*4 - 1. So, p-q = 3. n is the sample size, which should be the same between models.

                      In LCA, the difference in df is a bit different. For each model, the number of parameters estimated is (number of classes - 1) + (number of classes * number of parameters). That is actually given in the scalar e(rank).

                      You then use the chi2tail function. The first argument the difference in degrees of freedom. The second argument is the modified LR test statistic.
                      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                      When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                      Comment


                      • #26
                        Hi Weiwen

                        I copied the following codes in a file called lmrtest.ado. in a folderc:\ado\personal\ . Hope it is fine.

                        program lmrtest, rclass
                        version 15.1
                        args estimates_null estimates_alternative
                        tempname n ll_null ll_alt parms_null parms_alt a classes_null classes_alt lr_test_stat modlr_test_stat p q
                        quietly {
                        estimates restore `1'
                        scalar `n' = e(N)
                        scalar `ll_null' = e(ll)
                        scalar `parms_null' = e(k)
                        matrix `a' = e(lclass_k_levels)
                        scalar `classes_null' = `a'[1,1]

                        estimates restore `2'
                        scalar `ll_alt' = e(ll)
                        scalar `parms_alt' = e(k)
                        matrix `a' = e(lclass_k_levels)
                        scalar `classes_alt' = `a'[1,1]

                        scalar `p' = (3 * `classes_alt' - 1)
                        scalar `q' = (3 * `classes_null' - 1)
                        scalar `lr_test_stat' = -2 * (`ll_null' - `ll_alt')
                        scalar `modlr_test_stat' = `lr_test_stat' / (1 + ((`p' - `q') * ln(`n')) ^ -1)
                        }
                        return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat')
                        end


                        This is not giving me probability. Even for the example 50 when I run I got the error message.

                        . lmrtest class2 class3
                        unexpected end of file
                        (error occurred while loading lmrtest.ado)
                        r(612);



                        For example 50, I ran the following codes for class 2 and 3 along with syntax. The scalar dir gave me the values for lr_test_stat and modlr_test_stat as 2.3330657 and 2.4232276. The df in display e(rank)= 14.Could you please tell me how will you test whether this is significant or not?

                        ........
                        . scalar p = (3 * classes_3 - 1)

                        . scalar q = (3 * classes_2 - 1)

                        . scalar lr_test_stat = -2 * (ll_2 - ll_3)

                        . scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1)

                        .......

                        Comment


                        • #27
                          Hello Weiwen,

                          Thanks for the invaluable information here.

                          I am trying to compare models in LPA (continous variable), and obtain the LMR (AIC and BIC do not tell me whether the models are significantly different). I created an ado file with your code, but it returns an error message for estimates not found.

                          I know this is easy in Mplus, so wondering why it is so difficult in STATA :-).

                          Thanks for your help!

                          Serge

                          Comment


                          • #28
                            Hi everyone,

                            Working through Weiwen's suggestions I have found a solution that worked for me and so sharing it here in case helpful for others.

                            1. Run your models, storing the estimates e.g. after you run your two class model type:
                            Code:
                            estimates store 2classmodel
                            and after you run your three class model type:
                            Code:
                            estimates store 3classmodel
                            2. Use this code to calculate the relevant scalars:
                            Code:
                            estimates restore 2classmodel
                                    estimates store class2
                                    scalar n = e(N)
                                    scalar ll_2 = e(ll)
                                    scalar parms_2 = e(rank)
                                    scalar classes_2 = 2
                            
                             estimates restore 3classmodel
                                    scalar n = e(N)
                                    scalar ll_3 = e(ll)
                                    scalar parms_3 = e(rank)
                                    scalar classes_3 = 3 
                                    scalar p = (3 * classes_3 - 1)
                                    scalar q = (3 * classes_2 - 1)
                                    scalar lr_test_stat = -2 * (ll_2 - ll_3)
                                    scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1)
                                    scalar lmrt_p = chi2tail(parms_3 - parms_2, modlr_test_stat)
                            3. Use this code to display the new scalars and obtain your LMR LT test-statistic and p-value:
                            Code:
                             scalar list
                            Note the use of e(rank) rather than e(k) as pointed out in earlier posts.

                            Also note that I altered this code from that give in Post #6 by removing the ` and ' marks around classes_3 and classes_2 as these aren't needed when referring to these scalars outside of setting up a program (i.e. in the ado file). I also added a line from the ado coding so that the p-value (lmrt_p) is also calculated, again removing unnecessary quote marks. I found that this circumvented the issue of no p-value and test-statistic being displayed.

                            When you type scalar list all of the scalars you have just made are displayed and with this workaround you should now get the LMR LT test-statistic and p-value.

                            Hope this was helpful!

                            Comment


                            • #29
                              Originally posted by Laura Brown View Post
                              Hi everyone,

                              Working through Weiwen's suggestions I have found a solution that worked for me and so sharing it here in case helpful for others.

                              1. Run your models, storing the estimates e.g. after you run your two class model type:
                              Code:
                              estimates store 2classmodel
                              and after you run your three class model type:
                              Code:
                              estimates store 3classmodel
                              2. Use this code to calculate the relevant scalars:
                              Code:
                              estimates restore 2classmodel
                              estimates store class2
                              scalar n = e(N)
                              scalar ll_2 = e(ll)
                              scalar parms_2 = e(rank)
                              scalar classes_2 = 2
                              
                              estimates restore 3classmodel
                              scalar n = e(N)
                              scalar ll_3 = e(ll)
                              scalar parms_3 = e(rank)
                              scalar classes_3 = 3
                              scalar p = (3 * classes_3 - 1)
                              scalar q = (3 * classes_2 - 1)
                              scalar lr_test_stat = -2 * (ll_2 - ll_3)
                              scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1)
                              scalar lmrt_p = chi2tail(parms_3 - parms_2, modlr_test_stat)
                              3. Use this code to display the new scalars and obtain your LMR LT test-statistic and p-value:
                              Code:
                               scalar list
                              Note the use of e(rank) rather than e(k) as pointed out in earlier posts.

                              Also note that I altered this code from that give in Post #6 by removing the ` and ' marks around classes_3 and classes_2 as these aren't needed when referring to these scalars outside of setting up a program (i.e. in the ado file). I also added a line from the ado coding so that the p-value (lmrt_p) is also calculated, again removing unnecessary quote marks. I found that this circumvented the issue of no p-value and test-statistic being displayed.

                              When you type scalar list all of the scalars you have just made are displayed and with this workaround you should now get the LMR LT test-statistic and p-value.

                              Hope this was helpful!
                              Just to add that I had to tweak this again slightly as I wasn't always getting my LMR LT test statistic and p-value out. I noticed that I had erroneously calculated n for the null then alt, so the alt n overrode that stored for the null. I also found that I had to manually input the p q and n into the scalar modlr_test_stat equation for it to work sometimes. Here's the updated code I've had to use (this time with a 2classmodel vs 1classmodel example (where my p came out as 2, my q as 5 and my n as 923):

                              Code:
                              *Drop any previously saved scalars:
                                      scalar drop _all
                                  
                                      estimates restore 1classmodel
                                      scalar n = e(N)
                                      scalar ll_1 = e(ll)
                                      scalar parms_1 = e(rank)
                                      scalar classes_1 = 1 
                                      
                                      estimates restore 2classmodel
                                      scalar ll_2 = e(ll)
                                      scalar parms_2 = e(rank)
                                      scalar classes_2 = 2 
                                      
                                      scalar p = (3 * classes_2 - 1)
                                      scalar q = (3 * classes_1 - 1)
                                      scalar lr_test_stat = -2 * (ll_1 - ll_2)
                                      *show scalars calculated so far:
                                          scalar list 
                                      
                                      *scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1)
                                          *Substitute values of p, q and n:
                                              scalar modlr_test_stat = lr_test_stat / (1 + ((2 - 5) * ln(923)) ^ -1)
                                      
                                      scalar lmrt_p = chi2tail(parms_2 - parms_1, modlr_test_stat)
                                      
                                      *Show all scalars:
                                          scalar list

                              Comment


                              • #30
                                Originally posted by Laura Brown View Post

                                Just to add that I had to tweak this again slightly as I wasn't always getting my LMR LT test statistic and p-value out. I noticed that I had erroneously calculated n for the null then alt, so the alt n overrode that stored for the null. I also found that I had to manually input the p q and n into the scalar modlr_test_stat equation for it to work sometimes. Here's the updated code I've had to use (this time with a 2classmodel vs 1classmodel example (where my p came out as 2, my q as 5 and my n as 923):

                                Code:
                                *Drop any previously saved scalars:
                                scalar drop _all
                                
                                estimates restore 1classmodel
                                scalar n = e(N)
                                scalar ll_1 = e(ll)
                                scalar parms_1 = e(rank)
                                scalar classes_1 = 1
                                
                                estimates restore 2classmodel
                                scalar ll_2 = e(ll)
                                scalar parms_2 = e(rank)
                                scalar classes_2 = 2
                                
                                scalar p = (3 * classes_2 - 1)
                                scalar q = (3 * classes_1 - 1)
                                scalar lr_test_stat = -2 * (ll_1 - ll_2)
                                *show scalars calculated so far:
                                scalar list
                                
                                *scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1)
                                *Substitute values of p, q and n:
                                scalar modlr_test_stat = lr_test_stat / (1 + ((2 - 5) * ln(923)) ^ -1)
                                
                                scalar lmrt_p = chi2tail(parms_2 - parms_1, modlr_test_stat)
                                
                                *Show all scalars:
                                scalar list

                                you mean that if I need to save those code as ado.file?
                                I can't get any value out of it.
                                and also, If It's ado.file code and I chose to compare class4 vs class3, does it change numbers automatically?
                                or should I put 3 and 4 on those code?
                                I'm lost in all the codes..........help me please.

                                Comment

                                Working...
                                X