Lo–Mendell–Rubin Adjusted Likelihood Ratio Test and wald test

Harshit Aneja

Join Date: Jul 2018

Posts: 2
#16

01 Jul 2018, 12:47

Originally posted by Weiwen Ng View Post

Harshit, params_null is the number of parameters in the null model, and params_alt is the number of parameters in the alternative (i.e. less parsimonious) model. In a regular LR test, these would be the difference in degrees of freedom in each model.

Now, you do raise an interesting point. I assigned the number of parameters as e(k), which is returned by gsem. That's the number of parameters in the model.

However, in Stata's gsem example data from example 50, which I used in my sample code above, one parameter is always zero - that's the parameter concerning the base class in the multinomial logit model for the classes. It is zero because it has to be.

Maybe I misinterpreted what the number of parameters means! I do apologize if this is my misunderstanding. I believe the degrees of freedom should correspond to the scalar e(rank). That's the rank of the variance-covariance matrix, which is returned as e(V). Can you check if e(rank) corresponds to your reported degrees of freedom? After running a -gsem- command, you can do that by typing:

Code:

display e(rank)

Yes. Thank you. e(rank) is exactly what I had in mind. Thanks a lot.
Comment
Wossenseged Jemberie

Join Date: Aug 2017

Posts: 17
#17

15 Dec 2018, 18:03

Hi Wiewen,
I had used your code on post #2 as my personal ado file. Afterwards, I performed a two class and threeclass analysis, storing the estimates after each -gsem- model specification. However the code:

Code:

lmrtest class2 class3

did not give any result/error. Also I typed

Code:

scalar list lmrt_p

afterwards. However Stata gave me the error message:

Code:

scalar lmrt_p not found r(111);

I used the stata file from https://stats.idre.ucla.edu/wp-conte...16/02/lca1.dat. the sample code is as follows.

Code:

import delimited using lca-mplus.txt //saved text file from the website gsem ( v2 v3 v4 v5 v6 v7 v8 v9 v10 <-, logit ), lclass(C 2) estimates store class2 gsem ( v2 v3 v4 v5 v6 v7 v8 v9 v10 <-, logit ), lclass(C 3) estimates store class3 lmrtest class2 class3 scalar list lmrt_p

Can you point me out where it went wrong?
LMRT test with Mplus gives a value of 38.468 and p-value of 0.1500.
thank you very much
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#18

17 Dec 2018, 10:00

Originally posted by Wossenseged Jemberie View Post

Hi Wiewen,
I had used your code on post #2 as my personal ado file. Afterwards, I performed a two class and threeclass analysis, storing the estimates after each -gsem- model specification. However the code:

Code:

lmrtest class2 class3

did not give any result/error. Also I typed

Code:

scalar list lmrt_p

afterwards. However Stata gave me the error message:

Code:

scalar lmrt_p not found r(111);

I used the stata file from https://stats.idre.ucla.edu/wp-conte...16/02/lca1.dat. the sample code is as follows.

Code:

import delimited using lca-mplus.txt //saved text file from the website gsem ( v2 v3 v4 v5 v6 v7 v8 v9 v10 <-, logit ), lclass(C 2) estimates store class2 gsem ( v2 v3 v4 v5 v6 v7 v8 v9 v10 <-, logit ), lclass(C 3) estimates store class3 lmrtest class2 class3 scalar list lmrt_p

Can you point me out where it went wrong?
LMRT test with Mplus gives a value of 38.468 and p-value of 0.1500.
thank you very much

Wossenseged,

After you type the lmrtest command, can you type

Code:

scalar dir

That will show all scalars in memory. If you had copied the code but renamed the scalar, that would catch it (or if I, in my posting to the forum, named it something else).

You should note what I said to Liyuwork (the original poster) earlier: the LMR test looks like it has a high false positive rate compared to other tests when the latent class structure is complex. I fear that "complex" actually describes a lot of real-world cases - it can include cases when one class is small, and/or when the indicators cross-load or don't separate the classes strongly. You should be entirely justified in using just the BIC.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Wossenseged Jemberie

Join Date: Aug 2017

Posts: 17
#19

18 Dec 2018, 04:58

Originally posted by Weiwen Ng View Post

Wossenseged,

After you type the lmrtest command, can you type

Code:

scalar dir

That will show all scalars in memory. If you had copied the code but renamed the scalar, that would catch it (or if I, in my posting to the forum, named it something else).

You should note what I said to Liyuwork (the original poster) earlier: the LMR test looks like it has a high false positive rate compared to other tests when the latent class structure is complex. I fear that "complex" actually describes a lot of real-world cases - it can include cases when one class is small, and/or when the indicators cross-load or don't separate the classes strongly. You should be entirely justified in using just the BIC.

thanks, Weiwen for your reply. I typed

Code:

scalar dir

, however this did not display any number. I have not tweaked the code from your post on #2. I am actually comparing the results from Stata and Mplus for this simple example.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#20

18 Dec 2018, 14:44

Originally posted by Wossenseged Jemberie View Post

thanks, Weiwen for your reply. I typed

Code:

scalar dir

, however this did not display any number. I have not tweaked the code from your post on #2. I am actually comparing the results from Stata and Mplus for this simple example.

This could be an issue with how/where you stored the ado file. Had you tried manually calculating the LR test statistic using post 6, with the correction in post 15?

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Wossenseged Jemberie

Join Date: Aug 2017

Posts: 17
#21

19 Dec 2018, 07:55

Originally posted by Weiwen Ng View Post

This could be an issue with how/where you stored the ado file. Had you tried manually calculating the LR test statistic using post 6, with the correction in post 15?

I think I have stored the ado file in my personal folder.

Code:

personal dir

shows me that I have the file there.

Code:

your personal ado-directory is c:\ado\personal\ <dir> . <dir> .. 0.8k lmrtest.ado

I dont know why I cannot see the scalars with the lmrtest ado in #2. However,

I had also done the calculation manually according to the code in #6 and again by changing the e(k) to e(rank) for the parameters. But I think the problem is with the results returned for p and q.

Code:

. scalar dir modlr_test_stat = . lr_test_stat = 39.024763 q = -3 p = -3 classes_3 = 3 parms_3 = 29 //e(rank), e(k) returns 30 ll_3 = -4231.6958 classes_2 = 2 parms_2 = 19 //e(rank), e(k) returns 20 ll_2 = -4251.2082 n = 1000

As p and q are both -3, modlr_test_stat will return infinity (missing). i.e 0 ^-1 for the denominator. the codes from #6 are

Code:

scalar p = (3 * `classes_3' - 1) scalar q = (3 * `classes_2' - 1) scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1)

So if I do the automatic lmrtest as posted in #2
the result from

Code:

return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat')

does not show me anything. Do you think that might be the problem?
Comment
Liyuwork Dana

Join Date: Apr 2018

Posts: 27
#22

23 Jan 2019, 01:07

Hi everyone,

Can you please help me what is wrong with the syntax I am using (below) to calculate chi-squares and p-values comparing classes in LPA.
I tried to follow the syntax given by #2 in this post to calculate "Lo–Mendell–Rubin Adjusted Likelihood Ratio Test" .

I have got the chi-square distribution but couldn't find the p-value

quietly gsem (W, X, Y, Z <- _cons), family(gaussian) link(identity) lclass(A 4)
estimates store class4
scalar n = e(N)
scalar ll_4 = e(ll)
scalar parms_4 = e(k)
matrix a = e(lclass_k_levels)
scalar clasess_4quietly gsem (W, X, Y, Z <- _cons), family(gaussian) link(identity) lclass(A 4)
estimates store class4
scalar n = e(N)
scalar ll_4 = e(ll)
scalar parms_4 = e(k)
matrix a = e(lclass_k_levels)
scalar clasess_4 = 4

quietly gsem (W, X, Y, Z <- _cons), family(gaussian) link(identity) lclass(A 5)
estimates store class5
scalar n = e(N)
scalar ll_5 = e(ll)
scalar parms_5 = e(k)
matrix a = e(lclass_k_levels)
scalar clasess_5 = 5
scalar p = (5 * clasess_5 - 1)
scalar q = (5 * clasess_4 - 1)
scalar lr_test_stat = -2 * (ll_4 - ll_5)
scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n))^ -1)
scalar list modlr_test_stat
return scalar lmrt_p = chi2tail(parms_5 - parms_4,modlr_test_stat) = 4

the other point is that I have found negative chi-square test distribution which is not correct comparing class 5 and class6, can you give me some feedback why X2 is coming -ve and how can I correct it?

Thank you for you help in advance,
Liyu
Comment

Jeremy Singer

Join Date: Mar 2019
Posts: 1

#23

06 Mar 2019, 08:06

Hello! I am conducting an LCA and I'm seeking to run the lmrtest afterwards.

First, I'm following the code from #2 to program the lmr test.

Code:

program lmrtest, rclass
version 15.0
args estimates_null estimates_alternative
tempname n ll_null ll_alt parms_null parms_alt a classes_null classes_alt lr_test_stat modlr_test_stat p q
quietly {
estimates restore `1'
scalar `n' = e(N)
scalar `ll_null' = e(ll)
scalar `parms_null' = e(k)
matrix `a' = e(lclass_k_levels)
scalar `classes_null' = `a'[1,1]
estimates restore `2'
scalar `ll_alt' = e(ll)
scalar `parms_alt' = e(k)
matrix `a' = e(lclass_k_levels)
scalar `classes_alt' = `a'[1,1]
scalar `p' = (3 * `classes_alt' - 1)
scalar `q' = (3 * `classes_null' - 1)
scalar `lr_test_stat' = -2 * (`ll_null' - `ll_alt')
scalar `modlr_test_stat' = `lr_test_stat' / (1 + ((`p' - `q') * ln(`n')) ^ -1)
}
return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat')
end

Then, I am running this LCA.

Code:

gsem ( res_hhincome_blgp_fa res_blgp_pct_employed res_hhvalue_blgp_fa res_pct_ba_blgp_fa res_pct_vacant_blgp_fa <-), lclass (C 2) nonrtolerance

And finally, I am storing my estimates and running the lmrtest.

Code:

estimates store class2
lmrtest class2

In response to the lmrtest syntax, I am getting an r(111) error: "estimates __00000C not found"

Can you please advise how I can deal with this issue?

Comment

AJ Francis Zavier

Join Date: Apr 2018

Posts: 2
#24

08 Apr 2020, 22:36

I am doing gsem using binary variables. I want to do the LMRTEST. As suggested in the post #2 I copied the code in a file called lmrtest.ado in ado\personal folder. Still I am not getting the p-value. I am getting the output. Could anyone say where I went wrong.

lmrtest class2 class3
unexpected end of file
(error occurred while loading lmrtest.ado)
r(612);
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#25

09 Apr 2020, 10:57

Setting up the program that I specified in my earlier post requires you to write an ado file and save it in your personal ado directory (that's a directory for user-written programs). In a lot of the posts here, some trouble seems to stem from the use of the ado file setup, and I'm not really in a good position to diagnose that. I'd encourage people to manually calculate the LR statistic themselves, if that is what they want to do (note my other posts that some simulation work has cast doubt on the accuracy of this modified LR test).

In a likelihood ratio test, you are testing if a less restrictive model (more parameters) fits better than a more restrictive one. For example, does a 4-class solution fit better than a 3-class one? If this were a standard linear regression, you could calculate this by hand easily enough (or use the built in lrtest command). In any case, in an LR test, the value you're testing is -2* the difference in log-likelihoods between the two different models. You can obtain that from the e-class scalars returned by the gsem command after each one is fit. Normally, we know that the raw test statistic has a chi-square distribution, with degrees of freedom (df) equal to the difference in the number of parameters in each test. Here, if you're running two linear regressions and you added one parameter to the second one, df = 1.

In the case of LCA, Lo et al derived an analytic correction for the raw test statistic. They think you need to divide -2* the LL difference by: (1 + {(p-q)*ln(n)}^-1). This is equation 15 in their paper. I believe that p and q are defined on page 3, after equation 1. p = 3k1 - 1, where k1 appears to be the number of components in the less constrained LCA model, and q = 3k0 - 1, where k0 is the number of components in the more constrained model. For example, if you're testing 5 vs 4 classes, p = 3*5 - 1, and q = 3*4 - 1. So, p-q = 3. n is the sample size, which should be the same between models.

In LCA, the difference in df is a bit different. For each model, the number of parameters estimated is (number of classes - 1) + (number of classes * number of parameters). That is actually given in the scalar e(rank).

You then use the chi2tail function. The first argument the difference in degrees of freedom. The second argument is the modified LR test statistic.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
AJ Francis Zavier

Join Date: Apr 2018

Posts: 2
#26

10 Apr 2020, 20:55

Hi Weiwen

I copied the following codes in a file called lmrtest.ado. in a folderc:\ado\personal\ . Hope it is fine.

program lmrtest, rclass
version 15.1
args estimates_null estimates_alternative
tempname n ll_null ll_alt parms_null parms_alt a classes_null classes_alt lr_test_stat modlr_test_stat p q
quietly {
estimates restore `1'
scalar `n' = e(N)
scalar `ll_null' = e(ll)
scalar `parms_null' = e(k)
matrix `a' = e(lclass_k_levels)
scalar `classes_null' = `a'[1,1]

estimates restore `2'
scalar `ll_alt' = e(ll)
scalar `parms_alt' = e(k)
matrix `a' = e(lclass_k_levels)
scalar `classes_alt' = `a'[1,1]

scalar `p' = (3 * `classes_alt' - 1)
scalar `q' = (3 * `classes_null' - 1)
scalar `lr_test_stat' = -2 * (`ll_null' - `ll_alt')
scalar `modlr_test_stat' = `lr_test_stat' / (1 + ((`p' - `q') * ln(`n')) ^ -1)
}
return scalar lmrt_p = chi2tail(`parms_alt' - `parms_null',`modlr_test_stat')
end

This is not giving me probability. Even for the example 50 when I run I got the error message.

. lmrtest class2 class3
unexpected end of file
(error occurred while loading lmrtest.ado)
r(612);

For example 50, I ran the following codes for class 2 and 3 along with syntax. The scalar dir gave me the values for lr_test_stat and modlr_test_stat as 2.3330657 and 2.4232276. The df in display e(rank)= 14.Could you please tell me how will you test whether this is significant or not?

........
. scalar p = (3 * classes_3 - 1)

. scalar q = (3 * classes_2 - 1)

. scalar lr_test_stat = -2 * (ll_2 - ll_3)

. scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1)

.......
Comment
Serge Damottaveiga

Join Date: Apr 2020

Posts: 3
#27

19 Apr 2020, 09:45

Hello Weiwen,

Thanks for the invaluable information here.

I am trying to compare models in LPA (continous variable), and obtain the LMR (AIC and BIC do not tell me whether the models are significantly different). I created an ado file with your code, but it returns an error message for estimates not found.

I know this is easy in Mplus, so wondering why it is so difficult in STATA :-).

Thanks for your help!

Serge
Comment
Laura Brown

Join Date: Sep 2018

Posts: 8
#28

24 Jun 2020, 07:05

Hi everyone,

Working through Weiwen's suggestions I have found a solution that worked for me and so sharing it here in case helpful for others.

1. Run your models, storing the estimates e.g. after you run your two class model type:

Code:

estimates store 2classmodel

and after you run your three class model type:

Code:

estimates store 3classmodel

2. Use this code to calculate the relevant scalars:

Code:

estimates restore 2classmodel estimates store class2 scalar n = e(N) scalar ll_2 = e(ll) scalar parms_2 = e(rank) scalar classes_2 = 2 estimates restore 3classmodel scalar n = e(N) scalar ll_3 = e(ll) scalar parms_3 = e(rank) scalar classes_3 = 3 scalar p = (3 * classes_3 - 1) scalar q = (3 * classes_2 - 1) scalar lr_test_stat = -2 * (ll_2 - ll_3) scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1) scalar lmrt_p = chi2tail(parms_3 - parms_2, modlr_test_stat)

3. Use this code to display the new scalars and obtain your LMR LT test-statistic and p-value:

Code:

scalar list

Note the use of e(rank) rather than e(k) as pointed out in earlier posts.

Also note that I altered this code from that give in Post #6 by removing the ` and ' marks around classes_3 and classes_2 as these aren't needed when referring to these scalars outside of setting up a program (i.e. in the ado file). I also added a line from the ado coding so that the p-value (lmrt_p) is also calculated, again removing unnecessary quote marks. I found that this circumvented the issue of no p-value and test-statistic being displayed.

When you type scalar list all of the scalars you have just made are displayed and with this workaround you should now get the LMR LT test-statistic and p-value.

Hope this was helpful!
1 like
Comment
Laura Brown

Join Date: Sep 2018

Posts: 8
#29

29 Jun 2020, 07:26

Originally posted by Laura Brown View Post

Hi everyone,

Working through Weiwen's suggestions I have found a solution that worked for me and so sharing it here in case helpful for others.

1. Run your models, storing the estimates e.g. after you run your two class model type:

Code:

estimates store 2classmodel

and after you run your three class model type:

Code:

estimates store 3classmodel

2. Use this code to calculate the relevant scalars:

Code:

estimates restore 2classmodel estimates store class2 scalar n = e(N) scalar ll_2 = e(ll) scalar parms_2 = e(rank) scalar classes_2 = 2 estimates restore 3classmodel scalar n = e(N) scalar ll_3 = e(ll) scalar parms_3 = e(rank) scalar classes_3 = 3 scalar p = (3 * classes_3 - 1) scalar q = (3 * classes_2 - 1) scalar lr_test_stat = -2 * (ll_2 - ll_3) scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1) scalar lmrt_p = chi2tail(parms_3 - parms_2, modlr_test_stat)

3. Use this code to display the new scalars and obtain your LMR LT test-statistic and p-value:

Code:

scalar list

Note the use of e(rank) rather than e(k) as pointed out in earlier posts.

Also note that I altered this code from that give in Post #6 by removing the ` and ' marks around classes_3 and classes_2 as these aren't needed when referring to these scalars outside of setting up a program (i.e. in the ado file). I also added a line from the ado coding so that the p-value (lmrt_p) is also calculated, again removing unnecessary quote marks. I found that this circumvented the issue of no p-value and test-statistic being displayed.

When you type scalar list all of the scalars you have just made are displayed and with this workaround you should now get the LMR LT test-statistic and p-value.

Hope this was helpful!

Just to add that I had to tweak this again slightly as I wasn't always getting my LMR LT test statistic and p-value out. I noticed that I had erroneously calculated n for the null then alt, so the alt n overrode that stored for the null. I also found that I had to manually input the p q and n into the scalar modlr_test_stat equation for it to work sometimes. Here's the updated code I've had to use (this time with a 2classmodel vs 1classmodel example (where my p came out as 2, my q as 5 and my n as 923):

Code:

*Drop any previously saved scalars: scalar drop _all estimates restore 1classmodel scalar n = e(N) scalar ll_1 = e(ll) scalar parms_1 = e(rank) scalar classes_1 = 1 estimates restore 2classmodel scalar ll_2 = e(ll) scalar parms_2 = e(rank) scalar classes_2 = 2 scalar p = (3 * classes_2 - 1) scalar q = (3 * classes_1 - 1) scalar lr_test_stat = -2 * (ll_1 - ll_2) *show scalars calculated so far: scalar list *scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1) *Substitute values of p, q and n: scalar modlr_test_stat = lr_test_stat / (1 + ((2 - 5) * ln(923)) ^ -1) scalar lmrt_p = chi2tail(parms_2 - parms_1, modlr_test_stat) *Show all scalars: scalar list
1 like
Comment
hyunjin Kim

Join Date: Nov 2023

Posts: 1
#30

02 Nov 2023, 22:18

Originally posted by Laura Brown View Post

Just to add that I had to tweak this again slightly as I wasn't always getting my LMR LT test statistic and p-value out. I noticed that I had erroneously calculated n for the null then alt, so the alt n overrode that stored for the null. I also found that I had to manually input the p q and n into the scalar modlr_test_stat equation for it to work sometimes. Here's the updated code I've had to use (this time with a 2classmodel vs 1classmodel example (where my p came out as 2, my q as 5 and my n as 923):

Code:

*Drop any previously saved scalars: scalar drop _all estimates restore 1classmodel scalar n = e(N) scalar ll_1 = e(ll) scalar parms_1 = e(rank) scalar classes_1 = 1 estimates restore 2classmodel scalar ll_2 = e(ll) scalar parms_2 = e(rank) scalar classes_2 = 2 scalar p = (3 * classes_2 - 1) scalar q = (3 * classes_1 - 1) scalar lr_test_stat = -2 * (ll_1 - ll_2) *show scalars calculated so far: scalar list *scalar modlr_test_stat = lr_test_stat / (1 + ((p - q) * ln(n)) ^ -1) *Substitute values of p, q and n: scalar modlr_test_stat = lr_test_stat / (1 + ((2 - 5) * ln(923)) ^ -1) scalar lmrt_p = chi2tail(parms_2 - parms_1, modlr_test_stat) *Show all scalars: scalar list

you mean that if I need to save those code as ado.file?
I can't get any value out of it.
and also, If It's ado.file code and I chose to compare class4 vs class3, does it change numbers automatically?
or should I put 3 and 4 on those code?
I'm lost in all the codes..........help me please.
Comment

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment