  • Comparing classes in a Latent Profile Model

    Dear all,

    I am trying to determine the number of groups or classes for a latent variable using seven observed variables. I am using Stata 15.1.

    I followed example 52g of the Stata manual (Latent profile model), in the section "Comparing models", which says to look for the smallest AIC and BIC values.

    My results show two local minima in the BIC: one at the five-class model and another at the seven-class model. This is confusing, as I was expecting a single minimum that I could then go with. My understanding is that the best solution is five classes. Another source of confusion is that a previous model I estimated using hierarchical clustering points to a three-class solution, which is consistent with the literature.

    Maybe some of you have insights on this topic? Many thanks.

    My code looks as follows:

    Code:
    * Fit latent profile models with 2 through 9 classes, storing each for comparison
    gsem (norms trust farming lfunction informal engagement advisory <- ), lclass(C 2)
    estimates store twoclass
    gsem (norms trust farming lfunction informal engagement advisory <- ), lclass(C 3)
    estimates store threeclass
    gsem (norms trust farming lfunction informal engagement advisory <- ), lclass(C 4)
    estimates store fourclass
    gsem (norms trust farming lfunction informal engagement advisory <- ), lclass(C 5)
    estimates store fiveclass
    gsem (norms trust farming lfunction informal engagement advisory <- ), lclass(C 6)
    estimates store sixclass
    gsem (norms trust farming lfunction informal engagement advisory <- ), lclass(C 7)
    estimates store sevenclass
    gsem (norms trust farming lfunction informal engagement advisory <- ), lclass(C 8)
    estimates store eightclass
    gsem (norms trust farming lfunction informal engagement advisory <- ), lclass(C 9)
    estimates store nineclass
    * Compare information criteria across the stored models
    estimates stats twoclass threeclass fourclass fiveclass sixclass sevenclass eightclass nineclass
    My data look like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(norms trust farming lfunction informal engagement advisory)
           .        .        .        .        .        .        .
     -.98162   -.3114  -.42686   .27697  -.58359   -.9699  -.01755
    -1.27795   .07518  -.92345  -.60814  -.41226 -1.04472  -.03326
     -.17439   .06842  -.21618  1.90339   .05599  -.46079   .11537
      .00181   .34967  1.73676  1.62089  1.70806   .55895   .95268
    -1.07155  -.67026   .18111   .09381  -.60163  -.01348  1.24672
     -.65683  -.01478  -.42252   .14127   .32071  -.36836  -.41697
     -.62188  1.07709    .2193  1.59195  -.77941    .1342  -.82457
           .        .        .        .        .        .        .
     -.19673  -.27395  2.01558  1.58268  1.15111  2.04795   .87557
           .        .        .        .        .        .        .
           .        .        .        .        .        .        .
     -.23446  -.31216   .38219  -.76928   .67551   .34163   .66063
      .66597   .02493    .7812  -.55981  -.53341   .14777  -.02744
      .79905   .67628   1.2509   .20092  1.06249    .7914  -.32487
      .57796  1.00764  -.01036  -.49744   .72682   .09987  -.28355
     -2.1567 -2.00231  -.75437  -.58971 -1.35053  -.96453    -.322
      .28847   .48866  1.78863  1.77702   .28262  -.49722  -.70806
       2.917   2.1843  1.47226   .17037  1.46298  2.07718  -.79311
     -.42906   .01127  -.33037  -.76792   -.3787  1.01925   .38168
    -1.28062  -.90478  -.84523  -.70117  -.86212   -.9561  -.07887
     -.15234  -.11308  -.42957  -.55775 -1.05749   .16358  -.07475
      .01219   .27228  -.82642  -.89828  -.62624   .08204   .52512
     -.24257  -.09245  -.19699   .06336  -.42748   -.1867  -.03323
           .        .        .        .        .        .        .
     -.25378   .19178  -.55938  -.83342  -.29252    .0079  1.39661
     -.34622  -.23197  -.50193   .01962  -.77826  -.07785  -.54594
    -1.33258 -2.12957  -.60777    .1453  -.85057  -.37469   .15513
     -.49857  -.32841   .08047  -.85962   .55352   .11355   .83447
           .        .        .        .        .        .        .
      .00208  -.25685   .15314  -.80633   .78764   .58573    -.569
     2.44616  1.93312   .43792   .16945   .05415  -.12629  -.81038
    -1.16391  -.91032 -1.41753   -.1193  -.59082  -.48461    .4458
           .        .        .        .        .        .        .
     -.63856 -1.29752  -.74769   .63711  -.58303   .14735  1.02066
      .80684  1.43026  -.11052  1.79622  -.49766  -.46734   .25756
     -.89285   -.2508  -.75942  -.76664  -.94423  -.70646  -.33857
      .55308   .30491   .00904  -.71084   .95948  1.22393    .2342
           .        .        .        .        .        .        .
           .        .        .        .        .        .        .
      .82427   .15126  1.18409  1.81541   .52152  1.28638  -.50343
    -1.06096  -.24109  -.44379  -.69832   .67544   -.5215   .93901
      .32824  -.22754  -.63507  1.54957   .64834   .78442   .10655
      .11212   .36787   .57904  1.75799  -.92055  -.42177  -.80624
     -.01378  -.18003  -.18454  -.85153  -.18396  -.31322  -.77143
     1.10172  1.12238   .04267   .83061   .18984   .64337  -.48618
       .0956  -.64086  -.32717   .65141   -.5537   .44379  -.63879
     -.03716   .97456  -.93637  1.40202   .79696   -.2963    .1341
     -.11278   .63474   .30942  1.67336  -.72935  -.66114  -.24077
      .32976   .95995    .0936  1.41741   .19751   .41717  -.25109
      .28598   .94149  -.49551  -.70068    .0374  -.95552  -.62067
     -.43843  -.04589   .29333  1.56714    .7755  -.53211   .97443
      .90736   .57099   .12578   .68461   .42543   .55269   .03414
     1.19604  1.12332  -.48926  -.74625   .88298  -.30151   .43248
           .        .        .        .        .        .        .
     -.58343   .48143  -.38741  -.52045  -.01152  -.45304  -.62327
      .57667   .89587  -.80499   -.6647  -.71928  -.12715  -.01692
      .03393  -.24292   .03263   .11358  -.42574   .28757  -.38657
     -.24131  -.09174   -.2946  -.46846  -.82168  -.35623   .15242
           .        .        .        .        .        .        .
       .1217   .69624   .78358  -.82037   1.5453   .41513 -1.07919
      .67261   -.6634  -.66666  -.82749   -.4517    -.182  -.56629
       .0297   .76154  -.69922   .98276   .15046  -.87889  -.37813
           .        .        .        .        .        .        .
     1.14868  -.21988  -.63483 -1.00056    .1877   .75215   .87147
     -.00828   .06452   .10774  -.54093  -.26943  -.15246  -.82105
      .90453  -.36292   .81872   .08099  1.17278   .42526    .0197
    -1.24279   -.8721  -.92597  -.79885   .48879  -.54551   .46449
     -.79878    .0201   .08659    .0727   .55572  1.20979  -.15132
      .57728   .60675   .25095   .18553   .04941   .37787   .28629
     1.16343  1.63819   .81735   .08798   .22632  1.45681   .50109
     -.25402   .32468  -.69245  -.77693  -.35289   .28376   -.3427
     -.45147   .11576  -.32327  -.68546   .36101  -.15307  -.44881
     -.04494  -.06861   .13233  -.90248   .06439   .68085  -.84557
           .        .        .        .        .        .        .
      -.1818   .37248   -.4322   .55371   .64133    .4286  -.27781
       1.214  2.02057  1.22313   .09709  -.30147  -.15322 -1.23281
     -.30937   .13472   .37401  -.67369  1.07601  -.01124   -.6008
      .26423   .02267  -.90632  -.94903  -.59597   .25893   -.1776
           .        .        .        .        .        .        .
     1.07253   .92323   .88192  1.54814  1.86285  1.49033  -.60765
      .23948   .95284   .20922  -.53522  -.98768  -.58779  -.82547
      .21964  -.21275   .37773  1.64471  1.00786   .57222  -.71256
     -.76473  -.32201  -.40709  1.66059  -.61379  -.23045    .0912
      .18023  -.45177  -.81183    -.848   .86402  -.17908  1.68735
    -1.49997 -1.19828 -1.28282  -.81151  -.76796   -.6276   .50403
      .30224  -.10522   .01976  -.64449  -.02935   .41612   .31947
           .        .        .        .        .        .        .
     -.52266 -1.34936  -.73154  -.76804  -.31962  -.05722   .41141
     -.96957  -.78634   .53871   .12362  -.29297  -.09075   .11078
     -.78604  -.11771  -.84394  -.80837  -.77715   -.6895  -.24794
      .30094  -.48776  -.39064  -.83408   .62239  1.70087  -.06108
     -.72275  -.19341   .51569   .30624  -.70441  -.29125   .93915
      .20981  -.05193   .47408  -.85325   .38379   .86116  -.68158
     -.35538   .03694  -.86964  -.83604  -.72419  -.46129   .42457
     -.19418   .13862   .89344  1.42595  1.37781  1.08988    .0639
     -.01774  -.21469  -.08852  -.67479  -.96608   .11353  -.91709
           .        .        .        .        .        .        .
           .        .        .        .        .        .        .
      .58376  1.12999   .99077   .02522   .03026  -.32304 -1.24935
    end
    My model comparison looks as follows:

    Code:
    Akaike's information criterion and Bayesian information criterion
    
    -----------------------------------------------------------------------------
           Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
    -------------+---------------------------------------------------------------
        twoclass |        398         .  -2956.341      22    5956.682   6044.384
      threeclass |        398         .  -2872.062      30    5804.124   5923.718
       fourclass |        398         .  -2843.916      38    5763.832   5915.317
       fiveclass |        398         .  -2811.454      46    5714.908   5898.285
        sixclass |        398         .  -2800.841      54    5709.683   5924.951
      sevenclass |        398         .  -2737.597      62    5599.194   5846.354
      eightclass |        398         .  -2724.188      70    5588.376   5867.427
       nineclass |        398         .  -2712.336      78    5580.671   5891.614
    -----------------------------------------------------------------------------
                   Note: N=Obs used in calculating BIC; see [R] BIC note.

  • #2
    First, hierarchical clustering and latent class analysis may have similar goals, but they are different statistical models. You are not guaranteed exactly the same results. However, look at how the means of the variables are distributed across your final latent class model. Does the LCA model tell you substantively the same thing as the hierarchical clustering model?

    Second, I think the procedure is to add classes until you reach a model that no longer converges, and from there choose the lowest BIC. There's no guarantee that the BIC will decrease to a global minimum and then only increase: the log likelihood should increase monotonically as you add classes, but the BIC penalizes model complexity, and the size of each improvement in log likelihood isn't constant, so local minima can occur. I wouldn't worry about it. Of the models you presented, the seven-class model looks best.
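
    If it helps, a loop along these lines automates adding classes and stops once a model fails to converge. This is an untested sketch using your variable names; adjust the class range as needed.

    Code:
    * Sketch: fit models with an increasing number of classes; stop once one fails
    forvalues k = 2/9 {
        capture noisily gsem (norms trust farming lfunction informal engagement advisory <- ), lclass(C `k')
        if _rc | (e(converged) == 0) {
            display "Stopping: the `k'-class model did not converge"
            continue, break
        }
        estimates store class`k'
    }
    * Compare information criteria for all models that converged
    estimates stats _all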

    However, you have missed a step! The Stata example for latent profile analysis could be more explicit about this, but - and this only applies to models with Gaussian indicators, which you have - you need to explore structures where the indicators' errors are allowed to be correlated within latent classes, and structures where the error variances of the indicators are allowed to vary across the latent classes. Stata's default is a bit strict. Look at the output table in SEM example 52g: in the first model, with 2 latent classes, the error variances of the 3 indicators are constrained to be equal.

    Latent profile analysis is like taking a multidimensional magic cookie cutter to your data. You are basically stamping cookies out of the data, trying to find a solution that covers as many points as possible. With Stata's default setting, you are constraining the cookie cutter to take equal bites.

    Consider this diagram, from the intro for the R package flexmix.

    [Image: scatter plots of two mixture model solutions, from the flexmix package introduction]


    You would be constraining the classes to have the same height and width. This is a strict assumption. Honestly, I wonder if it's too strict and if it can be dispensed with.

    What do correlated error terms mean? The left diagram is a model with all error terms independent within each class. The right diagram allows the error terms to be correlated within each class - basically, the indicators are allowed to be correlated, rather than independent, within each class. Consider the diagonal lump of points - I believe the raw data are eruptions of the Old Faithful geyser, with the axes being two eruption measurements, but never mind that. In the model on the left, the model decided that this diagonal slice must be composed of two latent classes, 1 and 4, and it grouped two widely separated clusters into latent class 3. If you tell the model that your indicators can be correlated, you get the diagram on the right: the diagonal slice of points is assigned to one latent class, and what was formerly latent class 3 is (probably more correctly) split into two latent classes (now labeled 1 and 4).

    Correlated error terms within class are requested with the option covstructure(e._OEn, unstructured). Allowing the error variances to differ across latent classes is requested with lcinvariant(none).
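
    For instance, a two-class model with both restrictions relaxed would look something like this (an untested sketch using your variables):

    Code:
    * Sketch: error variances free to differ across classes (lcinvariant(none))
    * and errors correlated within class (covstructure(e._OEn, unstructured))
    gsem (norms trust farming lfunction informal engagement advisory <- ), ///
        lclass(C 2) lcinvariant(none) covstructure(e._OEn, unstructured)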

    If you read Kathryn Masyn's book chapter on latent class and profile modeling, cited in the SEM example, she advises fitting four sets of models, one for each combination of the two options (e.g., error variances allowed to vary with error terms correlated within class; error variances allowed to vary with error terms independent within class; and so on). Within each structure, add classes until a model fails to converge, select the lowest BIC within that structure, and then compare your four best models and choose the one with the lowest BIC.
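
    Something like the loop below would run all four structures in one pass. Again, this is an untested sketch: the macro names spec1 through spec4 and the stored-model names are just labels I made up, and you would extend the class range until models stop converging.

    Code:
    * Sketch of Masyn's four structures; spec1 is Stata's default
    local spec1 ""                                              // invariant variances, independent errors
    local spec2 "lcinvariant(none)"                             // variances differ across classes
    local spec3 "covstructure(e._OEn, unstructured)"            // errors correlated within class
    local spec4 "lcinvariant(none) covstructure(e._OEn, unstructured)"
    forvalues s = 1/4 {
        forvalues k = 2/9 {
            capture noisily gsem (norms trust farming lfunction informal engagement advisory <- ), ///
                lclass(C `k') `spec`s''
            if !_rc & e(converged) estimates store m`s'_`k'
        }
    }
    * Compare BICs across every model that converged
    estimates stats _all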

    I have to be honest, this is pretty tedious. Because the assumption of error variances equal across classes seems extremely strict, I honestly wonder if people should just dispense with it entirely. I would also make the case that you can start with the loosest set of assumptions (variances unequal across classes, indicators correlated within class), find the best model, and then inspect the results to see whether you can justify constraining the cross-class variances to be equal or the indicators to be independent. For example, if the error correlations are all substantively zero in every class, you could justify constraining them to independence. In a forthcoming paper, I'm going to lead with this approach, and if any reviewers push back, I will respond in that fashion.
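
    To inspect a fitted solution, the postestimation commands below report the class-specific means and the predicted class proportions; the within-class error correlations appear in the gsem coefficient table itself. The three-class count here is only illustrative.

    Code:
    * Sketch: fit the loosest structure, then inspect the solution
    gsem (norms trust farming lfunction informal engagement advisory <- ), ///
        lclass(C 3) lcinvariant(none) covstructure(e._OEn, unstructured)
    estat lcmean     // class-specific indicator means
    estat lcprob     // predicted (marginal) class probabilities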

    The TL;DR of the last few paragraphs: you fit 9 latent profile models under assumptions that I would consider very restrictive, maybe even unrealistically restrictive. (If you're into IRT, Stata's default setting seems to me even more restrictive than a Rasch model's assumptions!) Reconsider the type of model you fit; at minimum, you need to fit models with a less restrictive structure. You may then see something more akin to your hierarchical clustering solution. Additionally, hierarchical clustering is a well-recognized technique in its own right, and in my view you are not required to fit a latent class model (but that doesn't mean you shouldn't, provided you understand what the model is doing).



    • #3
      Dear Weiwen,

      Many thanks for your elaborate reply to my query. As suggested, I have re-run my gsem models with the covariance options lcinvariant(none) and covstructure(e._OEn, unstructured).

      As recommended in Masyn's book chapter, I then put together a table to compare BIC values. I obtained the following results:

      Code:
         
      class        BIC    BIC + lcinvariant    BIC + covstructure    BIC + lcinvariant + covstructure
          1   6536.391             6536.391              5726.355                            5726.355
          2   6044.384             5982.513              5569.895                            5536.405
          3   5923.718             5917.452              5563.118                            5604.547
          4   5915.317             5820.818              5532.341                            5687.442
          5   5898.285             5853.595              5623.522
          6   5924.951             5872.129              5257.102
          7   5846.354             5853.216              5630.893
      As recommended in gsem example 52g, I also included random starting values: startvalues(randomid, draws(5) seed(15)) emopts(iter(20))
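
      A representative call combining these options looks like this (the class count and covariance options vary by column, so this is just a sketch of one cell of the table):

      Code:
      * Sketch: 4-class model with class-varying variances and random starting values
      gsem (norms trust farming lfunction informal engagement advisory <- ), ///
          lclass(C 4) lcinvariant(none) ///
          startvalues(randomid, draws(5) seed(15)) emopts(iter(20))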

      Code:
       
      class        BIC    BIC + lcinvariant    BIC + covstructure    BIC + lcinvariant + covstructure
          1   6536.391             6536.391              5726.355                            5726.355
          2   6044.384             5982.513              5569.895                            5648.8
          3   5923.718             5917.452              5563.118                            5787.037
          4   5915.317             5827.131              5592.729                            5674.535
          5   5898.285             5797.611              5580.528                            5840.393
          6   5919.73              5825.67               5665.048                            6039.795
          7   5846.354             5864.229              5631.061                            6145.741
          8   5862.423             5881.736              5329.167                            6284.164
          9   5907.147             5888.416              6496.769
      I am confused by the above results. First, I am unsure whether adding random starting values is good practice or not. Second, it seems that every time I include the covstructure(e._OEn, unstructured) option (see the third and fourth columns), my BIC values go out of control. I would like to justify not using that option, but I didn't fully understand your statement that "if the error correlations are all substantively 0 across classes, you could maybe justify constraining to independent".

      My intuition would be to select the four-class model from the BIC + lcinvariant column (no random starting values) - column 2 of the first table. It looks like a stable option while still relaxing the constant error variance assumption. However, any other insights would be much appreciated.
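
      If I settle on that model, my plan is to assign each observation to its modal class and cross-tabulate the classification against the hierarchical clustering solution, along these lines (a sketch; cpost*, maxpr, and modalclass are names I made up):

      Code:
      * Sketch: refit the preferred model and classify observations
      quietly gsem (norms trust farming lfunction informal engagement advisory <- ), ///
          lclass(C 4) lcinvariant(none)
      predict cpost*, classposteriorpr      // posterior class probabilities
      egen maxpr = rowmax(cpost*)           // largest posterior per observation
      generate modalclass = .
      forvalues k = 1/4 {
          replace modalclass = `k' if float(cpost`k') == float(maxpr)
      }
      tabulate modalclass                   // then cross-tab against the cluster solution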

      From my own experience, I agree that a two-step clustering procedure (i.e. hierarchical followed by k-means clustering) is a solid technique, but a few articles have already applied that method to similar data. As LCA is claimed to be more statistically rigorous than two-step clustering, I am trying to provide an innovative perspective on the topic.
      Last edited by Jesus Pulido; 23 Jan 2021, 16:22.
