
  • Guidance on Latent Class Analysis - a few starting questions

    I have a 9-question "Vaccine Hesitancy Scale" developed by SAGE to evaluate vaccine confidence. Each question is answered with a 5-level Likert-style response (Strongly Agree, Agree, Neither agree nor disagree, Disagree, Strongly disagree). In addition to the "full" data, I've collapsed responses into an "abbreviated" set (Agree, Neither agree nor disagree, Disagree). The questions can be broadly categorized into questions of confidence (do you believe that vaccines work and are helpful?) and questions of concern over risk (do you believe that vaccines are safe and that new vaccines carry low risks?). Responses are coded 1-5 (full) or 1-3 (abbreviated), with lower values corresponding to greater confidence in / acceptance of vaccines. I have 835 records in my data set.

    We hypothesized we might see about 3 latent classes in our data (vaccine enthusiasts, with low values across the board; vaccine skeptics, with higher values for all responses; and vaccine ambivalents, with low "confidence" responses but higher "risk" responses - the folks in the middle). It looks like we may have four classes, and that the abbreviated responses might provide the best model to work with (also easier to interpret and describe). Here are the AIC and BIC for the seven models I compared; a rough sketch of the fitting/comparison workflow follows the table.
    Model                      N   ll(null)   ll(model)   df         AIC         BIC
    twoclass_full            835          .    -10533.3   31   21,128.63   21,275.18
    threeclass_full          835          .    -8626.52   42   17,337.04   17,535.59
    fourclass_full           835          .    -8024.92   53   16,155.84   16,406.39
    fiveclass_full           835          .    -7359.83   64   14,847.67   15,150.22
    threeclass_abbreviated   835          .    -5011.05   38   10,098.10   10,277.74
    fourclass_abbreviated    835          .    -2140.76   48    4,377.52    4,604.43
    fiveclass_abbreviated    835          .    -2147.69   58    4,411.38    4,685.57
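
    In case it's useful to anyone replicating this, the comparison can be run along roughly these lines (a sketch, not my exact code; q1-q9 stand in for the nine item variables, stored in dataset order):

    * Fit two of the candidate models and store each estimation result
    * (family/link left unspecified, so Stata defaults to Gaussian/identity)
    gsem (q1-q9 <- ), lclass(C 3)
    estimates store threeclass_full

    gsem (q1-q9 <- ), lclass(C 4)
    estimates store fourclass_full

    * Side-by-side information criteria, as in the table above
    estimates stats threeclass_full fourclass_full
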
    While running these models, I came across a few things that I couldn't quite wrap my head around. Could you help with the following questions?

    1. I've heard there is a risk that the iterative estimation converges to a local maximum of the likelihood function and misses the global maximum, and I believe this risk may be greater when you specify more classes (i.e., more than five?). Is this something I should be concerned about? How can I ask Stata to search more thoroughly for the global maximum?
    2. The estat lcmean command took an hour to run(!). Is this normal? I'm finding these postestimation statistics take a long time to run; I realize they must be computationally intense, but I have a new-ish computer, and an hour for a single command seems excessive. Could I be doing or specifying something incorrectly?
    3. Would you consider anything else beyond AIC and BIC in comparing the seven models above? What else would you look at to select the "best" model?
    4. Finally, I am unsure how to specify the family and link. I initially felt ologit made sense for the Likert-scale data, but the model wouldn't fit ("initial values not feasible"). When I left out any specification, Stata picked Family: Gaussian, Link: Identity. I have heard that Likert responses can be analyzed as continuous data; does this seem reasonable?

    I appreciate any advice or any further videos/presentations/reading you might recommend. Thank you.
    Last edited by Matt Price; 07 Oct 2024, 18:22.

  • #2
    Hi Matt,

    I do not have definitive answers to all of your questions, but hopefully I can point you to some further resources. Before responding, I have a couple of questions about the vaccine hesitancy scale.

    What, if any, psychometric work has been done on this scale to establish its reliability and validity? Do those analyses point to there being one or two factors? Have you looked at this yourself using exploratory factor analysis?

    I generally do not advise collapsing categories of Likert scale items; however, it can sometimes be necessary. One example where it is justified is invariance testing of a measure across groups (e.g., biological sex, race or ethnicity): there may be cases where members of a group never use a particular response category, and to address this, one would collapse responses for that item.

    To your questions:

    1. Generally speaking, if there are estimation problems, they will show up as either non-convergence of the model or missing standard errors, possibly combined with implausible estimates. If you do not see any of these, you are likely OK.

    2. I have experienced this, as have others. The linked thread has helpful suggestions for some problematic edge cases. (The postestimation commands in question are sketched at the end of this post.)

    3. Yes, there are many other fit statistics of interest, including entropy (see also Trent's excellent presentation) and bootstrap likelihood ratio tests, among others (see the Weller et al. paper linked below). It may be worth your time to investigate the Stata LCA plugin from the folks at Penn State, who specialize in this type of analysis; the plugin can produce some of these other fit stats.

    a) As both a user of LCA and a reviewer of LCA research, I am often interested in the size of the latent classes. For example, if you have a latent class to which < 5% of the sample is assigned, that is potentially suspicious. Often these very small groups have really interesting response profiles, but the concern is whether you are just fitting to noise.

    4. With just 3 response options, if you choose to go with the collapsed scale, I see no justification for treating the responses as continuous. You could probably make the case for 5, but you are likely to get pushback from reviewers (justified, IMO). You need to do some diagnosis to understand the model-fitting problems with ologit: try adding at least 20 random starts and increasing the number of iterations (see here, and the sketch just below). Although he isn't on Statalist as much anymore, you would do yourself a favor by searching posts on LCA by Weiwen Ng.
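
    As a rough illustration of that suggestion (a sketch with placeholder names; q1-q9 and the four-class specification are stand-ins, not your actual setup):

    * startvalues(randomid) draws 20 sets of random starting values and
    * keeps the best; emopts(iterate()) allows more EM iterations before
    * Stata switches to the quasi-Newton optimizer.
    gsem (q1-q9 <- ), ologit lclass(C 4)            ///
        startvalues(randomid, draws(20) seed(1234)) ///
        emopts(iterate(100))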

    Also, though not Stata-specific, this is a very good primer on LCA by Weller et al. (2020).
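
    For reference, the two postestimation commands discussed in point 2; each assumes your gsem latent class model is the current estimation result:

    estat lcmean    // marginal means of each indicator within each class
    estat lcprob    // marginal (predicted) probabilities of class membership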
    Last edited by Erik Ruzek; 08 Oct 2024, 15:16. Reason: Fixed typo



    • #3
      Thank you, Erik. It's reassuring to see you cite Weller, as I found and read that yesterday. I also found Sinha et al. 2021 to be a good overview of LCA, though from a more clinical perspective (in case others find it helpful: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7746621/).

      The VHS we employed has been widely used and published in multiple contexts (e.g., across the US, Africa, Europe, and Asia), and was validated in Shapiro et al. 2018 (https://pubmed.ncbi.nlm.nih.gov/29289384/). Reassuringly, our exploratory factor analysis aligned quite nicely with their work.

      The AIC and BIC were (considerably!) smaller in our four-class, collapsed model. I've seen the results of this VHS presented on the three-level Likert scale (I believe the original VHS was a five-level scale, and as you know our data were collected on a 5-level scale), but what do you make of the rather large drop when we move from our four-class full model (AIC: 16,155.84) to a four-class collapsed model (AIC: 4,377.52)? I don't know how to think about this big drop in IC balanced against the loss of information in moving from the full, 5-level scale to the collapsed scale.

      I've used Trenton's entropy plug-in (thank you) to get entropy = 0.962 for the 4-class collapsed model, which also seems satisfactory. Entropy for the 4-class full model was 0.979 (and entropy for the 3-class full model was 0.997). For ease of interpretation and presentation of results, I'm leaning towards the collapsed model, but I could use guidance here. Any additional thoughts, or suggested reading (hopefully not too technical; Weller et al. was great, but heavy maths and I'm lost)?

      As an additional question: most entropy values for the models I tested were > 0.93. At what point am I splitting hairs comparing an entropy of 0.962 to 0.979 across my two best candidate models, particularly given the four-fold difference in AIC and BIC between them?
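
      In case it helps others following along: my understanding is that relative entropy can also be computed by hand after gsem, along these lines (a sketch for a 4-class model; pr1-pr4 are new variables created by the predict line):

      * E = 1 - sum_i sum_k[ -p_ik * ln(p_ik) ] / (N * ln(K))
      predict double pr*, classposteriorpr    // posterior class probabilities pr1-pr4
      generate double plogp = 0
      forvalues k = 1/4 {
          replace plogp = plogp + pr`k' * ln(pr`k') if pr`k' > 0
      }
      quietly summarize plogp
      display "Relative entropy = " 1 + r(sum)/(r(N)*ln(4))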

      3a: Luckily, my classes all seem reasonably well-sized and make sense. The smallest is a group of 6%, a class of "vaccine skeptics" who report high distrust of vaccines across most questions. We were expecting this small group, based on reviewing the data prior to the LCA.

      4. I have a feeling I'm going to get stuck here. Weiwen's posts are clear and helpful; I may come back with more questions... But thanks again.
