Hi team,
I am looking for help in assessing the usability of my LCA model, fit using gsem. I have 14,000 observations on around 25 variables which measure presence of symptoms; the majority of these variables are binary or ordinal. An example of some of these variables is shown at the bottom of my post.
With some work, I have obtained models that converge with up to five classes, without having to impose constraints on any variables. I have no pre-existing hypotheses on the number of expected classes.
The model statistics recommended in Masyn's chapter suggest that the four-class or five-class model may provide the best fit and should be chosen. However, when I try to check whether the solutions identified by these models represent global maxima of the likelihood function, I run into problems. I am testing for a global maximum by running the model 100 times with random draws, using the code shown below, which is based on posts by Weiwen Ng.
However, few or none of the models converge, and I frequently end up with the r(430) error shown below the code, which breaks the loop.
Unsurprisingly, this problem tends to occur when obtaining the original convergence was more of a struggle, e.g., when I obtained it by saving the parameters of a simpler model (i.e., matrix b4 = e(b)) and using those as starting values for progressively more complicated models (i.e., from(b4)). My impression is that, in doing so, I have pinpointed a very rare point in the likelihood function where convergence is possible, and the 100 random draws are therefore unlikely to find this point. However, including more random draws isn't really an option, as these discontinuous regions break the loop and it is taking me forever to even get 100 runs completed.
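A minimal sketch of this kind of staged fit, assuming purely for illustration that the simpler model contains only seven of the binary indicators (the choice of indicators in each step is an assumption, not necessarily the models actually fitted):

```stata
* Step 1 (illustrative): a simpler four-class model on a subset of binary items
gsem (lc_cough_m_b lc_sleep_m_b lc_memory_m_b lc_concen_m_b ///
      lc_musc_pain_m_b lc_tastesmell_m_b lc_diarrhoea_m_b <-, logit), ///
    lclass(C4_ 4)
matrix b4 = e(b)    // save the converged parameter vector

* Step 2: a richer model started from those estimates.
* Parameters that do not appear in b4 should keep gsem's default starting values.
gsem (lc_cough_m_b lc_sleep_m_b lc_memory_m_b lc_concen_m_b ///
      lc_musc_pain_m_b lc_tastesmell_m_b lc_diarrhoea_m_b <-, logit) ///
     (mrc_m_revised PHQ4_final_m <-, ologit) ///
     (facit_score_m EQ5D_score_m <- ), ///
    lclass(C4_ 4) from(b4)
```

Because the class variable keeps the same name (C4_) in both calls, the column names of b4 match the corresponding parameters in the larger model.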
My question is, does this mean that, despite having the best fit statistics, the model is too weakly identified to be usable, and I should just stick with simpler models that successfully converge on multiple random draws? Or is there any other way of establishing whether these models are usable?
Many thanks to everyone who has posted on LCA in the past (with particular thanks to Weiwen Ng). I think I have read every LCA post on this forum!
Code:
*Iteration log: four class
putexcel set iteration_log.xlsx, sheet(fourclass) modify
putexcel A1 = "Iteration" B1 = "Log Likelihood" C1 = "Converged"
set seed 123321
forvalues i = 1/100 {
    local j = `i' + 1
    gsem (lc_cough_m_b lc_sleep_m_b lc_memory_m_b lc_concen_m_b ///
        lc_musc_pain_m_b lc_tastesmell_m_b lc_diarrhoea_m_b ///
        lc_stomach_m_b lc_voice_m_b lc_hair_m_b lc_heart_m_b ///
        lc_dizzy_m_b lc_sweat_m_b ///
        EQ5D_self_m_revised EQ5D_mob_m_revised<-, logit) ///Binary
        (mrc_m_revised PHQ4_final_m EQ5D_act_m EQ5D_pain_m <-, ologit) ///Ordinal
        (facit_score_m EQ5D_score_m <- ) ///Gaussian
        , lclass(C4_ 4) ///Fitting four classes
        startvalues(randompr, draws(1)) iterate(100)
    estimates save class4, append
    putexcel A`j' = `i'
    putexcel B`j' = `e(ll)'
    putexcel C`j' = `e(converged)'
}
putexcel close
Code:
cannot compute an improvement -- discontinuous region encountered
r(430);
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(lc_cough_m_b lc_sleep_m_b lc_memory_m_b lc_concen_m_b lc_musc_pain_m_b lc_tastesmell_m_b lc_diarrhoea_m_b EQ5D_self_m_revised EQ5D_mob_m_revised) byte mrc_m_revised float(PHQ4_final_m EQ5D_act_m facit_score_m EQ5D_score_m)
0 0 0 1 0 0 0 0 0 1 1 1 41 71
0 1 0 0 0 0 0 0 0 1 3 2 31 50
0 1 1 1 1 1 0 0 0 2 0 1 29 47
0 0 0 0 0 0 0 0 0 2 0 1 50 69
1 0 1 0 1 0 0 0 0 1 0 1 49 71
0 1 1 0 1 0 0 0 0 1 2 1 45 85
0 0 0 0 0 0 0 0 0 1 0 1 44 92
0 1 0 1 0 0 1 0 0 2 0 1 43 86
1 0 1 1 1 0 0 1 1 3 2 2 22 50
0 1 0 0 0 0 0 0 0 1 0 1 48 69
0 0 0 0 0 0 0 0 0 1 0 1 52 91
0 1 1 1 0 0 0 0 0 1 1 1 50 74
1 1 1 1 1 1 1 0 0 2 1 2 13 50
1 1 0 0 1 0 0 0 0 2 0 1 49 74
0 0 0 0 0 0 0 0 0 1 0 1 51 90
0 1 1 1 1 0 0 1 1 3 1 2 36 38
0 0 0 0 0 0 0 0 0 1 0 1 50 86
0 0 0 0 0 0 0 0 0 1 0 1 45 81
0 0 0 0 0 0 0 0 0 1 0 1 52 98
1 0 0 0 1 0 0 0 0 1 0 1 45 90
1 0 0 0 1 0 0 0 0 1 0 2 41 60
0 1 1 1 1 0 1 0 0 1 0 1 42 50
0 0 0 0 1 0 0 0 0 1 0 1 48 80
0 0 0 0 0 0 0 0 0 2 0 1 47 69
0 0 1 1 1 0 0 0 0 2 1 2 38 70
0 1 0 1 1 0 1 1 1 2 1 2 16 50
0 1 1 1 1 0 1 0 0 2 1 1 21 50
0 0 0 0 0 1 0 0 0 1 0 1 49 85
0 1 0 1 0 0 0 0 0 1 1 1 36 87
0 1 1 1 1 0 0 0 0 1 0 1 45 50
0 1 0 0 1 0 0 0 0 1 0 1 43 89
0 1 1 0 0 0 0 0 0 1 0 1 49 73
0 1 0 0 0 0 0 0 0 2 0 1 49 87
0 1 0 1 1 0 0 0 0 2 0 1 45 70
0 1 0 0 0 0 0 0 0 1 1 1 44 81
0 0 0 0 0 0 0 0 0 1 0 1 52 97
1 1 1 0 1 1 1 0 0 2 1 1 35 83
0 0 0 0 1 0 0 0 1 2 0 1 49 71
0 1 0 0 0 0 0 0 0 1 0 1 50 96
0 0 1 1 1 1 1 0 1 3 0 2 14 50
1 0 0 0 1 0 0 0 1 3 0 1 44 50
0 1 0 0 1 0 0 0 0 1 0 1 49 80
0 0 0 0 1 0 0 1 0 1 0 2 46 80
0 0 0 0 0 0 0 0 0 2 0 1 48 92
0 1 1 0 1 0 0 0 0 1 0 1 50 84
0 0 0 0 0 0 0 0 0 2 3 1 49 38
0 0 0 0 1 0 0 0 0 1 0 1 52 95
0 0 0 0 0 0 0 0 0 1 0 1 48 81
0 0 0 0 0 0 0 0 0 2 0 1 50 80
0 0 0 0 1 0 1 0 0 1 2 2 38 50
0 0 0 0 0 0 0 0 0 1 0 1 46 50
0 0 1 1 1 1 1 0 0 2 1 1 44 79
0 0 1 1 0 0 0 0 0 1 0 1 45 50
0 0 0 0 1 0 0 0 0 1 0 1 51 89
1 1 1 1 1 0 1 1 1 2 0 2 10 29
0 0 0 0 1 0 0 0 0 2 0 1 42 80
1 1 1 1 0 0 0 0 1 2 1 1 21 39
0 1 0 0 1 0 0 0 0 1 0 1 51 54
1 1 0 0 0 0 0 0 0 1 1 1 46 77
0 0 0 0 0 0 0 0 0 1 0 1 52 85
0 0 0 0 0 0 0 0 0 1 0 1 52 98
1 1 1 1 0 1 1 0 1 2 1 1 43 87
0 1 0 0 0 0 0 0 0 1 1 1 52 99
0 0 0 0 0 0 0 0 0 1 0 1 42 84
0 0 0 0 0 0 0 0 0 1 0 1 48 95
0 1 0 0 0 0 0 0 1 3 0 1 49 80
0 0 0 0 0 0 0 0 0 1 0 1 47 79
1 1 0 0 1 0 0 0 1 3 0 2 38 29
0 1 0 0 0 0 0 0 0 1 0 1 43 81
0 0 0 0 1 0 0 0 0 1 0 1 44 96
1 1 1 1 1 0 0 0 0 2 1 . 27 50
0 0 0 0 0 0 0 0 0 1 1 1 45 73
0 0 0 0 0 0 0 0 0 1 0 1 51 95
0 0 0 0 0 0 0 0 0 1 0 1 48 60
0 0 0 0 0 0 1 0 0 1 0 1 50 76
0 1 0 0 0 0 0 0 0 1 0 1 47 86
0 0 0 0 0 0 0 0 0 1 0 1 48 53
1 1 0 0 1 0 0 0 0 1 1 1 40 50
0 1 0 1 1 0 1 0 0 3 1 1 41 47
0 0 0 0 0 0 0 0 0 1 0 1 50 93
0 0 0 0 0 0 0 0 0 1 0 1 48 90
0 0 0 0 0 0 0 0 0 1 0 1 41 65
0 1 0 0 0 0 0 0 0 1 1 1 46 80
0 1 0 1 0 0 0 0 0 1 1 1 44 50
0 0 0 0 0 0 0 0 0 1 0 1 42 87
1 1 1 1 1 0 1 1 1 4 3 2 5 10
1 1 0 1 1 0 0 0 0 2 1 1 28 64
0 1 0 0 0 0 0 0 0 2 1 1 51 91
0 1 0 0 1 0 0 0 0 1 0 1 51 100
0 0 0 0 0 0 0 0 0 1 0 1 52 95
0 1 1 1 1 1 0 0 0 2 2 2 27 50
1 1 1 1 1 0 0 0 1 2 0 2 32 55
1 1 1 1 1 1 0 0 0 2 0 1 6 50
0 1 0 0 1 0 0 0 0 1 0 1 49 93
0 0 0 0 0 0 0 0 0 1 0 1 52 93
0 1 0 0 0 0 0 0 0 1 0 1 51 94
0 0 0 0 0 0 0 0 0 1 0 1 51 96
0 0 0 0 0 0 0 0 0 1 0 1 52 79
1 1 1 0 1 0 0 0 1 2 0 2 47 79
0 0 0 0 0 0 0 0 0 1 0 1 51 91
end
label values lc_cough_m_b ny
label values lc_sleep_m_b ny
label values lc_memory_m_b ny
label values lc_concen_m_b ny
label values lc_musc_pain_m_b ny
label values lc_tastesmell_m_b ny
label values lc_diarrhoea_m_b ny
label def ny 0 "No", modify
label def ny 1 "Yes", modify
label values EQ5D_mob_m_revised care_revised
label values EQ5D_self_m_revised care_revised
label def care_revised 0 "No Problems", modify
label def care_revised 1 "Some Problems or Unable", modify
label values mrc_m_revised mrc_m_revised
label def mrc_m_revised 1 "1", modify
label def mrc_m_revised 2 "2", modify
label def mrc_m_revised 3 "3", modify
label def mrc_m_revised 4 "4 or 5", modify
label values PHQ4_final_m phq4_final
label def phq4_final 0 "Normal", modify
label def phq4_final 1 "Mild", modify
label def phq4_final 2 "Moderate", modify
label def phq4_final 3 "Severe", modify
label values EQ5D_act_m care
label def care 1 "No Problems", modify
label def care 2 "Some Problems", modify