  • maximum likelihood over many starting values

    Dear Listers,

    I have run a latent class analysis using gsem. The model converges with no problem to 4-class solution. I have been asked to check that this is indeed a global maximum rather than a local one by running 1000 replications using different starting values.

    Is there a simple way to do that with Stata 17?

    I thank you for your time and help

    Kind regards

    Giovanni

  • #2
    Apologies for the late reply; I'm not always on Statalist, but I sometimes check back.

    I think the process is fairly simple in a sense, but it does involve writing a loop and using the putexcel command. Some details at this post.

    Basically, if you use the startvalues(randomid) option, the default LCA command will run the expectation maximization (EM) algorithm on 20 random draws of starting values. It picks the draw with the highest log likelihood and maximizes that one to full convergence, without saving the results of any other run. The code I wrote overrides that: it still tells Stata to use random start values, but with only 1 EM draw per fit, repeated 100 times (or however many times you like; I don't know that 1,000 is necessary).
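
    If it helps, here's a rough sketch of that loop. It is purely illustrative: the item names y1-y3, the 4-class specification, the number of runs, and the file names are placeholders, not anyone's actual model.

    Code:
    putexcel set lca_runs.xlsx, replace
    putexcel A1 = "run" B1 = "loglik"
    forvalues i = 1/100 {
        * one EM draw per run, different seed each time
        capture gsem (y1 y2 y3 <- , logit), lclass(4) ///
            startvalues(randomid, draws(1) seed(`i')) nonrtolerance
        if _rc == 0 {
            estimates save lca_run_`i', replace
            putexcel A`=`i'+1' = `i' B`=`i'+1' = e(ll)
        }
    }
    Each run's log likelihood goes into the spreadsheet, and each run's full estimates are saved to disk so you can inspect and compare them afterwards.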

    After that, you need to go back to the directory where the estimates were saved and compare a few of the estimates from the runs that reached your apparent global log-likelihood solution. They will differ only in a few decimal places. Additionally, the latent classes won't be identified in the same order (e.g., class 5 in one solution may be class 1 in another). This is tedious; there's no way around it. Please do read the posts carefully. Note also that you can use the nonrtolerance option during the runs to save yourself some time, but you do want to verify the global solution without that option.
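    For that last verification step, one approach (a sketch, assuming each run's estimates were saved to disk under hypothetical names like lca_run_42) is to reload the run with the best log likelihood, pull its coefficient vector, and refit from those values without nonrtolerance:

    Code:
    * suppose run 42 had the highest log likelihood
    estimates use lca_run_42
    matrix b = e(b)
    gsem (y1 y2 y3 <- , logit), lclass(4) from(b)
    Starting from the saved coefficients, the final maximization should converge quickly if that run really was at a maximum.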
    Last edited by Weiwen Ng; 24 Feb 2022, 09:12.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    • #3
      An addendum to what I wrote above. I don't mean to keep bumping the post, but we can't edit posts after 5 minutes, unlike other sites.

      If you aren't willing to use the method I outlined to loop through and save results to disk, you could tell Stata to do something like

      Code:
    gsem (v1 v2 v3 <- , ologit) (v4 v5 v6 <- , logit) (v7 <- ) [pweight=weight], lclass(6) startvalues(randompr, draws(1000))
      Again, that tells Stata to fit the specified model (syntax copied from Josephine, the poster on the other thread). It will a) run the EM algorithm for 20 iterations (the default, which can be changed if desired) on each of 1,000 sets of random starting values, b) find the run with the highest log likelihood, and c) take that run and maximize it to full convergence with the usual maximizer. In each draw, the randompr option assigns each observation to a random latent class, and those random assignments determine the starting values.

      The downside of this option is that there's no paper trail. You can't report how many runs converged at a global maximum, because Stata doesn't save that information.

      Not all of the LCA papers I've reviewed report what percentage of runs converged at the global maximum (though I haven't reviewed an extensive number). I believe that reporting this, or at least knowing it, is best practice, but I am not sure how many reviewers will insist on it.

      I also don't know how many sets of start values are considered acceptable, or what minimum percentage of runs converging at the global maximum is acceptable; there may be no widely agreed-on set of criteria here. I don't see a problem with only a minority of runs converging at the global maximum, as long as there's a clear mode. For example, say I do 100 runs and 33 of them converge at the global maximum. That's a clear mode, and I don't see why I'd need to do 1,000 runs in that case.

      Say instead I do 100 runs and only 2 of them converge at the global maximum. Now I'm a lot more worried, and maybe I'd want to extend the process to 1,000 runs. The thing is, say I do that and end up with 20 runs at the global maximum - do I still trust the solution? I'm not sure. From where I sit right now, I'd lean towards no, but I haven't had this happen in real life. I'd maybe report this solution alongside the previous one (i.e., the one with 1 fewer latent class) and let readers make up their minds.
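      If you saved each run's log likelihood to a spreadsheet while looping (a sketch; the file name lca_runs.xlsx and the loglik column are hypothetical), counting the runs at the apparent global maximum is quick:

      Code:
      import excel using lca_runs.xlsx, firstrow clear
      summarize loglik
      count if abs(loglik - r(max)) < 1e-4
      display "runs at the apparent global maximum: " r(N) " of " _N
      The tolerance guards against trivial decimal-place differences between runs that reached the same maximum.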

      If anyone has a more informed opinion than mine, I'd love to hear it.

      • #4
        Dear Weiwen, no need for apologies. Thank you a million for your very informative answer. Once again, an incredibly clear answer.

        I think I have been asked to do the 1000 runs, but maybe I can do fewer if I can justify the choice properly. Now I have a better sense of the procedure: I start with a smaller number of runs and then decide what to do next based on how worrying those initial results look. It was worth waiting.

        Thank you once more

        Gio
