Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Lo–Mendell–Rubin Adjusted Likelihood Ratio Test and wald test

    Hello everyone,

    Can you, please, advice me how to conduct Lo–Mendell–Rubin Adjusted Likelihood Ratio Test in latent profile analysis (LPA) to compare a k model with the preceding one and Wald test to compare the means in the identified profile classes (to see the difference between identified profile classes?

    Appreciate you comments

  • #2
    For other readers' information, in latent class and latent profile models, we are interested in testing which number of classes fit the data the best. Stata provides the BIC. This is fine, but my understanding is that there is still a lot of theoretical debate on how to test for the appropriate number of classes. You might think, why not use a likelihood ratio test? People far smarter than I have shown that an LR test for k vs k-1 class does not have a chi-2 distributed test statistic because of something about boundary of the parameter space blah blah blah.

    The Lo-Mendell-Rubin LR test is an analytical correction to the test statistic derived by Yungtai Lo, Nancy Mendell, and Donald Rubin, and published in Biometrika (2001). In another source I cannot recall, I think someone challenged the theoretical work, but Nylund, Asparouhov, and Bength Muthen (2007) reviewed the literature and performed a simulation study, and showed that the LMR test was adequate (but the bootstrapped LR test performed better - only thing is, Stata cannot do the BLRT - will explain later).

    Lo et al's paper (my first link) is very dense, but after reading that and a few other sources, I believe that their equation 15 describes the adjustment to the LR test statistic. The terms p and q in their equation are described on page 769, under equation 1.

    If that reading is correct, then I believe that this ado file I wrote for personal use will calculate the LMR test statistic. I hope the community will point out any errors if they notice them! Liuwork, you will have to save this in your personal ado file directory (on my machine, C:\ado\personal - this will vary depending on where you installed Stata).

    Code:
    program lmrtest, rclass
    version 15.0
    args estimates_null estimates_alternative
    tempname n ll_null ll_alt parms_null parms_alt a classes_null classes_alt lr_test_stat modlr_test_stat p q
    quietly {
    estimates restore `1'
    scalar `n' = e(N)
    scalar `ll_null' = e(ll)
    scalar `parms_null' = e(k)
    matrix `a' = e(lclass_k_levels)
    scalar `classes_null' = `a'[1,1]
    
    estimates restore `2'
    scalar `ll_alt' = e(ll)
    scalar `parms_alt' = e(k)
    matrix `a' = e(lclass_k_levels)
    scalar `classes_alt' = `a'[1,1]
    
    scalar `p' = (3 * `classes_alt' - 1)
    scalar `q' = (3 * `classes_null' - 1)
    scalar `lr_test_stat' = -2 * (`ll_null' - `ll_alt')
    scalar `modlr_test_stat' = `lr_test_stat' / (1 + ((`p' - `q') * ln(`n')) ^ -1)
    }
    return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat')
    end
    The bootstrap LR test cannot be done by Stata, because Stata does not permit bootstrapping in gsem with categorical latent variables. Moreover, you could run into a situation where some bootstrap runs fail to converge; what do you do then? I'm not sure how other programs handle that situation. Because latent class models have frequent convergence problems, especially with higher numbers of classes, this is going to happen. I've expressed to Stata that I would like them to implement the BLRT, because it seems like it is the test that is considered most highly by practitioners, but it is not trivial to implement, and the BIC is adequate - the Nylund et al paper said that "we conclude that the BIC is superior to all other ICs", although more so for latent profile analyses than latent class analyses. Of course, the whole paper is based on simulation data, in which you simulate a known number of classes with known data generating processes ... reality is messier.
    Last edited by Weiwen Ng; 11 Apr 2018, 07:53. Reason: Formatting.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    Comment


    • #3
      Part 2 of your question should be fairly simple. You want to test if the means for each indicator among the latent classes are equal. I think this can be done using the -test- command. One minor complication is that you have to run -estat lcmean, post-, and importantly, you have to tell it to post the variance-covariance matrix. I have not seen people do this, but in principle, this is how I would do it (and if my approach is wrong, I'd love to hear it).

      Code:
      use http://www.stata-press.com/data/r15/gsem_lca1
      gsem (accident play insurance stock <- ), logit lclass(C 2)
      estat lcmean, post
      test [1]accident = [2]accident
      You can extend this to more than 2 classes and simultaneously test for equality. I believe that code is:

      Code:
      use http://www.stata-press.com/data/r15/gsem_lca1
      gsem (accident play insurance stock <- ), logit lclass(C 3)
      estat lcmean, post
      test [1]accident = [2]accident, notest accum
      test [2]accident = [3]accident, accum
      When I ran that code and I inspected the confidence intervals of the estat lcmeans output for the 3-class model ... they looked very, very strange compared to the 2-class model. Meaning the CIs run from 0 to 1. That is in very stark contrast to the 2 class model. I am running Stata 15.1, revised April 2, 2018, 64 bits for Windows. I replicated this behavior under version 15.0. If anyone gets different results from me for the 3-class model, please comment. If this is the wrong approach, I'd love to hear it as well.
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

      Comment


      • #4
        Originally posted by Weiwen Ng View Post
        ...
        The Lo-Mendell-Rubin LR test is an analytical correction to the test statistic derived by Yungtai Lo, Nancy Mendell, and Donald Rubin, and published in Biometrika (2001). ....
        This is the correct link to Lo et al. Apologies for the bad link.

        Readers should also note that in Nylund et al's simulation study, the LMR LR test had type I error rates around 0.2, compared to the BLRT which had type I error rates around 0.6 (albeit they tested sample sizes of 200, 500, and 1,000; your situation may differ). I know that simulation data aren't real data, but I'd still rely more on the BIC, and we already get that in Stata.
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

        Comment


        • #5
          Thank you Weiwen for your response. Very useful. I have tried the syntax you provided me to calculate Lo–Mendell–Rubin Adjusted Likelihood Ratio Test, it doesn't show me error message but not giving me the test result I am not sure why.

          Any comment, thanks

          Comment


          • #6
            Originally posted by Liyuwork Dana View Post
            Thank you Weiwen for your response. Very useful. I have tried the syntax you provided me to calculate Lo–Mendell–Rubin Adjusted Likelihood Ratio Test, it doesn't show me error message but not giving me the test result I am not sure why.

            Any comment, thanks
            Oh dear, this is a very elementary error on my part. I wrote this command for my own use, but it presumes you had saved this file as an ado file, and stored the estimates from the k and k-1 class model using -estimates store-, and finally typed -lmrtest- in the command line. In assuming that the operations would be obvious to everyone, I committed a basic theory of mind error. My apologies for the lack of clarity!

            I would suggest you do the following.

            Code:
            gsem (x1 x2 x3 <- , logit), lclass(C 2)
            estimates store class2
            scalar n = e(N)
            scalar ll_2 = e(ll)
            scalar parms_2 = e(k)
            scalar classes_2 = 2
            
            gsem (x1 x2 x3 <- , logit), lclass(C 3)
            estimates store class3
            scalar n = e(N)
            scalar ll_3 = e(ll)
            scalar parms_3 = e(k)
            scalar classes_3 = 3 
             scalar p = (3 * `classes_3' - 1) scalar q = (3 * `classes_2' - 1) scalar lr_test_stat = -2 * (ll_2 - ll_3) scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1) scalar list modlr_test_stat
            This requires more manual work but it will get you your LMR test p-values. If you want to automate the process, then I would copy my original block of code to a do file and save it as an ado called lmrtest.ado in your personal ado file directory.
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

            Comment


            • #7
              Thank you Weiwen.

              This requires more manual work but it will get you your LMR test p-values. If you want to automate the process, then I would copy my original block of code to a do file and save it as an ado called lmrtest.ado in your personal ado file directory.[/QUOTE]

              I would appreciate if you copy your original block. I am not such familiar person on LPA. Many thanks once again.


              Comment


              • #8
                Originally posted by Liyuwork Dana View Post
                Thank you Weiwen.

                I would appreciate if you copy your original block. I am not such familiar person on LPA. Many thanks once again.

                Take the code I posted in post #2. Save it as lmrtest.ado in your personal ado directory, which is probably c:\ado\personal on a Windows machine.

                Then, you'd use the code like this:

                Code:
                 
                 gsem (x1 x2 x3 <- , logit), lclass(C 2) estimates store class2  gsem (x1 x2 x3 <- , logit), lclass(C 3) estimates store class3  lmrtest class2 class3
                Remember, the sequence of arguments is null model, then alternate model. The null model is the one with fewer parameters (i.e. it is the more parsimonious model, and the alternate model is nested within it, just like a regular likelihood ratio test). I did not write a help file for this because I wrote it for my personal use. However, once it's saved in your personal ado directory, Stata will treat it like a regular estimation command.
                Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                Comment


                • #9
                  Code should read as follows.

                  Code:
                  gsem (x1 x2 x3 <- , logit), lclass(C 2)
                  estimates store class2
                  gsem (x1 x2 x3 <- , logit), lclass(C 3)
                  estimates store class3
                  lmrtest class2 class3
                  Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                  When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                  Comment


                  • #10
                    Thank you indeed Weiwen.

                    Comment


                    • #11
                      Hi Scholars,

                      syntax on Lo–Mendell–Rubin Adjusted Likelihood Ratio Test

                      Can someone comment me what is wrong with my today's syntax? After I copied the syntax from post #2 and saved as -lmrtest- (like this: save "C:\ado\Personal\lmrtest.ado") then used the following syntax: .

                      gsem (q6_1 q6_2 q6_3 q6_4 <- _cons), family(gaussian) link(identity) lclass(A 2)
                      estimates store class2
                      gsem (q6_1 q6_2 q6_3 q6_4 <- _cons), family(gaussian) link(identity) lclass(A 3)
                      estimates store class3
                      lmrtest class2 class3

                      the Stata is not resulting the p-value but not also showing me" invalid syntax" message.

                      Please, help me on this.
                      Many thanks

                      Comment


                      • #12
                        Originally posted by Liyuwork Dana View Post
                        Hi Scholars,

                        syntax on Lo–Mendell–Rubin Adjusted Likelihood Ratio Test

                        Can someone comment me what is wrong with my today's syntax? After I copied the syntax from post #2 and saved as -lmrtest- (like this: save "C:\ado\Personal\lmrtest.ado") then used the following syntax: .

                        gsem (q6_1 q6_2 q6_3 q6_4 <- _cons), family(gaussian) link(identity) lclass(A 2)
                        estimates store class2
                        gsem (q6_1 q6_2 q6_3 q6_4 <- _cons), family(gaussian) link(identity) lclass(A 3)
                        estimates store class3
                        lmrtest class2 class3

                        the Stata is not resulting the p-value but not also showing me" invalid syntax" message.

                        Please, help me on this.
                        Many thanks
                        I wrote the code above to return the p-value as a scalar that can be exported to regression results. The command, as written, doesn't display anything, which I forgot to mention - I did say this was written for my own use! If you want to display the scalar, you can type:

                        Code:
                        scalar list lmrt_p
                        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                        When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                        Comment


                        • #13
                          OMG, now I give up with this STATA staff and calculating Lo–Mendell–Rubin Adjusted Likelihood Ratio Test. So annoying using STATA for LPA. Anyways, I appreciated your help Weiwen Ng.

                          Comment


                          • #14
                            Originally posted by Weiwen Ng View Post
                            For other readers' information, in latent class and latent profile models, we are interested in testing which number of classes fit the data the best. Stata provides the BIC. This is fine, but my understanding is that there is still a lot of theoretical debate on how to test for the appropriate number of classes. You might think, why not use a likelihood ratio test? People far smarter than I have shown that an LR test for k vs k-1 class does not have a chi-2 distributed test statistic because of something about boundary of the parameter space blah blah blah.

                            The Lo-Mendell-Rubin LR test is an analytical correction to the test statistic derived by Yungtai Lo, Nancy Mendell, and Donald Rubin, and published in Biometrika (2001). In another source I cannot recall, I think someone challenged the theoretical work, but Nylund, Asparouhov, and Bength Muthen (2007) reviewed the literature and performed a simulation study, and showed that the LMR test was adequate (but the bootstrapped LR test performed better - only thing is, Stata cannot do the BLRT - will explain later).

                            Lo et al's paper (my first link) is very dense, but after reading that and a few other sources, I believe that their equation 15 describes the adjustment to the LR test statistic. The terms p and q in their equation are described on page 769, under equation 1.

                            If that reading is correct, then I believe that this ado file I wrote for personal use will calculate the LMR test statistic. I hope the community will point out any errors if they notice them! Liuwork, you will have to save this in your personal ado file directory (on my machine, C:\ado\personal - this will vary depending on where you installed Stata).

                            Code:
                            program lmrtest, rclass
                            version 15.0
                            args estimates_null estimates_alternative
                            tempname n ll_null ll_alt parms_null parms_alt a classes_null classes_alt lr_test_stat modlr_test_stat p q
                            quietly {
                            estimates restore `1'
                            scalar `n' = e(N)
                            scalar `ll_null' = e(ll)
                            scalar `parms_null' = e(k)
                            matrix `a' = e(lclass_k_levels)
                            scalar `classes_null' = `a'[1,1]
                            
                            estimates restore `2'
                            scalar `ll_alt' = e(ll)
                            scalar `parms_alt' = e(k)
                            matrix `a' = e(lclass_k_levels)
                            scalar `classes_alt' = `a'[1,1]
                            
                            scalar `p' = (3 * `classes_alt' - 1)
                            scalar `q' = (3 * `classes_null' - 1)
                            scalar `lr_test_stat' = -2 * (`ll_null' - `ll_alt')
                            scalar `modlr_test_stat' = `lr_test_stat' / (1 + ((`p' - `q') * ln(`n')) ^ -1)
                            }
                            return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat')
                            end
                            The bootstrap LR test cannot be done by Stata, because Stata does not permit bootstrapping in gsem with categorical latent variables. Moreover, you could run into a situation where some bootstrap runs fail to converge; what do you do then? I'm not sure how other programs handle that situation. Because latent class models have frequent convergence problems, especially with higher numbers of classes, this is going to happen. I've expressed to Stata that I would like them to implement the BLRT, because it seems like it is the test that is considered most highly by practitioners, but it is not trivial to implement, and the BIC is adequate - the Nylund et al paper said that "we conclude that the BIC is superior to all other ICs", although more so for latent profile analyses than latent class analyses. Of course, the whole paper is based on simulation data, in which you simulate a known number of classes with known data generating processes ... reality is messier.
                            I am not able to understand what parms_null and parms_alt contain. I ran it on my system and it's not df which in my 3 and 4 class study turn out as 18 and 23. Number of observed variables is 4 in my case. params_null and params_alt are 27 and 36 respectively for me. params in general is 9*number of classes.
                            I am able to compute results thanks to you but without understanding it, I won't be able to submit my findings.
                            Btw thanks a lot for your code and help.

                            Comment


                            • #15
                              Originally posted by Harshit Aneja View Post

                              I am not able to understand what parms_null and parms_alt contain. I ran it on my system and it's not df which in my 3 and 4 class study turn out as 18 and 23. Number of observed variables is 4 in my case. params_null and params_alt are 27 and 36 respectively for me. params in general is 9*number of classes.
                              I am able to compute results thanks to you but without understanding it, I won't be able to submit my findings.
                              Btw thanks a lot for your code and help.
                              Harshit, params_null is the number of parameters in the null model, and params_alt is the number of parameters in the alternative (i.e. less parsimonious) model. In a regular LR test, these would be the difference in degrees of freedom in each model.

                              Now, you do raise an interesting point. I assigned the number of parameters as e(k), which is returned by gsem. That's the number of parameters in the model.

                              However, in Stata's gsem example data from example 50, which I used in my sample code above, one parameter is always zero - that's the parameter concerning the base class in the multinomial logit model for the classes. It is zero because it has to be.

                              Maybe I misinterpreted what the number of parameters means! I do apologize if this is my misunderstanding. I believe the degrees of freedom should correspond to the scalar e(rank). That's the rank of the variance-covariance matrix, which is returned as e(V). Can you check if e(rank) corresponds to your reported degrees of freedom? After running a -gsem- command, you can do that by typing:

                              Code:
                              display e(rank)
                              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                              When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                              Comment

                              Working...
                              X