I have a 9-question "Vaccine Hesitancy Scale" developed by SAGE to evaluate vaccine confidence. Each question is answered on a 5-level Likert-type scale (Strongly agree, Agree, Neither agree nor disagree, Disagree, Strongly disagree). In addition to this "full" coding, I have also collapsed the responses into an "abbreviated" 3-level set (Agree, Neither agree nor disagree, Disagree). The questions fall broadly into two categories: confidence (do you believe that vaccines work and are helpful?) and concern over risk (do you believe that vaccines are safe and that new vaccines carry low risks?). Responses are coded 1-5 (full) or 1-3 (abbreviated), with lower values indicating greater confidence in, and acceptance of, vaccines. My data set has 835 records.
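In case the coding matters: the collapsing amounts to mapping 1-2 → 1, 3 → 2, and 4-5 → 3. A minimal sketch of how I did this, assuming the full items are named q1-q9 (the names are illustrative):

```
* Collapse the 5-level items into 3 levels; lower values still
* correspond to greater vaccine confidence.
label define abbr3 1 "Agree" 2 "Neither agree nor disagree" 3 "Disagree"
forvalues i = 1/9 {
    recode q`i' (1 2 = 1) (3 = 2) (4 5 = 3), generate(q`i'_abbr)
    label values q`i'_abbr abbr3
}
```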
We hypothesized that we would see about three latent classes in our data: vaccine enthusiasts, with low values across the board; vaccine skeptics, with high values for all responses; and vaccine ambivalents, with low "confidence" responses but higher "risk" responses (the folks in the middle). It now looks like we may have four classes, and that the abbreviated responses may provide the best model to work with (and one that is easier to interpret and describe). Here are the AIC and BIC for the 7 models I compared:
| Model                  | N   | ll(null) | ll(model) | df | AIC       | BIC       |
|------------------------|-----|----------|-----------|----|-----------|-----------|
| twoclass_full          | 835 | .        | -10533.3  | 31 | 21,128.63 | 21,275.18 |
| threeclass_full        | 835 | .        | -8626.52  | 42 | 17,337.04 | 17,535.59 |
| fourclass_full         | 835 | .        | -8024.92  | 53 | 16,155.84 | 16,406.39 |
| fiveclass_full         | 835 | .        | -7359.83  | 64 | 14,847.67 | 15,150.22 |
| threeclass_abbreviated | 835 | .        | -5011.05  | 38 | 10,098.10 | 10,277.74 |
| fourclass_abbreviated  | 835 | .        | -2140.76  | 48 | 4,377.52  | 4,604.43  |
| fiveclass_abbreviated  | 835 | .        | -2147.69  | 58 | 4,411.38  | 4,685.57  |
While running these models, I came across a few things that I couldn't quite wrap my head around. Could you help with the following questions?

1. I've heard that the iterations risk converging to a local maximum of the likelihood function and missing the global maximum, and I believe this risk may grow as you specify more classes (i.e., more than five?). Is this something I should be concerned about here? How can I ask Stata to re-estimate the model so that it is more likely to find the global maximum? (My guess at the syntax is in the first sketch after the questions.)
2. The estat lcmean command took an hour to run (!). Is this normal? I'm finding that these postestimation statistics take a long time to run; I realize they must be computationally intensive, but I have a fairly new computer, and an hour for one command seems excessive. Could I be doing something wrong, or specifying something incorrectly? (A faster descriptive workaround I've been considering is in the second sketch below.)
3. Would you consider anything beyond the AIC and BIC when comparing the 7 models above? What else would you look at to select the "best" model? (I've also attempted an entropy calculation; see the third sketch below.)
4. Finally, I am unsure how to specify the family for these models. I initially felt ologit made sense for the Likert-scale items, but that model would not run ("initial values not feasible"). When I left the family unspecified, Stata defaulted to Family: Gaussian, Link: Identity. I have heard that Likert items can be analyzed as continuous data; does this seem reasonable? (My attempted ordinal specification is in the last sketch below.)
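For question 1, my guess at the syntax for multiple random starts, assuming the abbreviated items are named q1_abbr-q9_abbr as above and that the original fit was stored as fourclass_abbreviated:

```
* Refit the 4-class model from 20 random starting-value draws;
* gsem keeps the draw that ends at the highest log likelihood.
gsem (q1_abbr-q9_abbr <- _cons), lclass(C 4) ///
    startvalues(randomid, draws(20) seed(12345))
estimates store fourclass_r20

* If ll(model) here beats the original fit, the original had
* stopped at a local maximum.
estimates stats fourclass_abbreviated fourclass_r20
```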
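For question 2, the faster workaround I've been considering skips standard errors entirely and just tabulates descriptive item means by modal class from the posterior class probabilities; it runs in seconds, but please tell me if this is a misuse:

```
* Posterior probabilities of class membership from the current fit
predict double pr*, classposteriorpr

* Assign each record to its modal (highest-probability) class
egen double prmax = rowmax(pr1-pr4)
gen byte modal = .
forvalues c = 1/4 {
    quietly replace modal = `c' if pr`c' == prmax
}

* Descriptive item means by modal class (no standard errors)
tabstat q1_abbr-q9_abbr, by(modal) statistics(mean)
```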
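For question 3, I've seen normalized entropy suggested as a check on how well separated the classes are (values near 1 are better). My attempt, reusing pr1-pr4 from the previous sketch; corrections welcome:

```
* Normalized entropy = 1 - (-sum_i sum_k p_ik ln p_ik) / (N ln K)
local K 4
gen double plogp = 0
forvalues k = 1/`K' {
    quietly replace plogp = plogp + pr`k'*ln(pr`k') if pr`k' > 0
}
quietly summarize plogp
display "Normalized entropy = " 1 + r(sum)/(r(N)*ln(`K'))
```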
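And for question 4, this is the ordinal specification I was attempting when I hit "initial values not feasible"; I gather random starting values can sometimes get past that error, though I haven't confirmed it:

```
* 4-class model treating the items as ordered categorical (ologit)
* rather than Gaussian; random starts may help with infeasible
* initial values.
gsem (q1_abbr-q9_abbr <- , ologit), lclass(C 4) ///
    startvalues(randomid, draws(10) seed(12345))
```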
I'd appreciate any advice, as well as any videos, presentations, or reading you might recommend. Thank you.