I'm trying out the new latent class analysis feature of Stata 15's -gsem- command in Stata/IC 15, but I have been unable to get any model to converge except the very simple model in Example 50g of the Stata 15 sem.pdf manual.
The Methodology Center at Penn State provides a Latent Class Analysis Plugin for Stata (version 1.21), which I have been using for a couple of years with Stata 14 without difficulty, but I was hoping the new Stata 15 latent class feature would allow me to avoid using an additional plugin. (The plugin is only available for Windows and can be downloaded at https://methodology.psu.edu/downloads/lcastata .)
In my testing, I'm using a data set included with Penn State's LCA plugin and consisting of LcaSampleDataset.dct and LcaSampleDataset.txt. That data set is a subsample of N = 1,000 cases drawn from a larger sample (N = 13,840). Example results from the larger data set are shown at https://methodology.psu.edu/ra/lca/example , and the results of my testing using the plugin with the subset are consistent with the web site results from the larger sample.
The LCA plugin produces results in 1.9 seconds.
I have also used the poLCA package with R 3.4.0 and reproduced the results with the same data in about the same amount of elapsed time.
My attempts to reproduce these results in Stata/IC 15 (after recoding the indicator variables from 1/2 to 0/1 values) have all failed to converge and consistently have shown the "not concave" issue after about 50 iterations.
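For reference, the recoding step could be sketched roughly as follows. The mapping shown (1 stays 1, 2 becomes 0) is only an assumption about how the raw items are coded; swap the rule if 2 is the "yes" category in your copy of the data.

```stata
* Sketch only: assumes each indicator is coded 1/2 and that 2 should become 0.
local items SmokedBefore13 DailySmoke DroveDrunk DrankBefore13 ///
    BingeDrink MarijuanaBefore13 CocaineEver GlueEver MethEver ///
    EcstasyEver SexBefore13 ManyPartners

foreach v of local items {
    * Recode in place; values other than 2 are left unchanged
    recode `v' (2 = 0)
    * Stop if anything other than 0, 1, or missing remains
    assert inlist(`v', 0, 1) | missing(`v')
}
```

The -assert- line is there so that a stray value would stop the do-file before any model is fit.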
I've tried many options with -gsem-, but the following is a typical example of the code I have tried. I've reduced the number of integration points to 5 (from the default of 7), added a startvalues() option with 30 draws, set the integration method to intmethod(ghermite), added the -difficult- option, and allowed up to 10,000 iterations.
The Stata 15 sem.pdf manual in "intro 12" indicates that intmethod(ghermite) "is less accurate but quicker, and the calculation it makes converges more readily than either of the above methods." I used the intpoints(5) and intmethod(ghermite) options only to get an initial model to converge so I can obtain better starting values.
The command below produces an EM log likelihood that stops changing (to four significant digits) after the 50th iteration. From that iteration on, the model is also consistently reported as "not concave", and the -difficult- option does not help. (I've tried the command with and without the -difficult- option.)
Code:
gsem ///
    (SmokedBefore13 DailySmoke DroveDrunk DrankBefore13 BingeDrink ///
     MarijuanaBefore13 CocaineEver GlueEver MethEver EcstasyEver ///
     SexBefore13 ManyPartners <-, nocapslatent), ///
    logit ///
    lclass(C 5) ///
    startvalues(randomid, draws(30) seed(123321)) ///
    intpoints(5) ///
    iter(10000)
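The listing above leaves out two of the options described in the text, intmethod(ghermite) and -difficult-. A variant that includes them might look like the sketch below. This is an assumption about how those runs were specified, not the exact command that was used; everything else is copied from the listing, and the /// continuations require running it from a do-file.

```stata
* Sketch: same model as above, plus the two options mentioned in the text.
gsem ///
    (SmokedBefore13 DailySmoke DroveDrunk DrankBefore13 BingeDrink ///
     MarijuanaBefore13 CocaineEver GlueEver MethEver EcstasyEver ///
     SexBefore13 ManyPartners <-, nocapslatent), ///
    logit ///
    lclass(C 5) ///
    startvalues(randomid, draws(30) seed(123321)) ///
    intmethod(ghermite) ///
    intpoints(5) ///
    difficult ///
    iterate(10000)
```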
I would appreciate any advice about other options to try or option values to tweak with the new Stata 15 -gsem- latent class analysis feature. I was surprised that Stata 15 does not seem to be able to converge to results that can be obtained in about 2 seconds with both the LCA plugin and the poLCA package in R 3.4.0. I have not been able to try this with Mplus yet, but I've not had problems with latent class analysis in Mplus before, and Mplus is generally very fast.
Thanks for any help and advice.
Red Owl
(yes, my real name)