  • Use marginsplot to create a bar graph following gsem analysis

    Hi

    I fit a latent profile model to understand patterns of co-occurrence of different types and severities of child abuse. I would like to show the expected probability of each item in each of the 3 classes in a bar graph, as illustrated on Stata's page:

    https://www.stata.com/new-in-stata/l...lass-analysis/

    The webpage states that "We can use margins and marginsplot to visually compare the probabilities of participating in these activities across classes." I cannot see how to get the appropriate margins to then use marginsplot to display the relationship between the classes and the observed variables (as is shown on the webpage). That is, the steps I have followed are:

    gsem( beaten hit smack csa1 csa2 csa3 parviol <-, logit) ( freq duration <- _cons) ,lclass(C3 ) nonrtolerance

    margins

    [results in the message "Warning: prediction constant over observations."]

    marginsplot

    [results in a line graph with one line, the variables (e.g. beaten, hit, smack) along the x axis and Mu along the y axis]

    Is there a solution - or something I can read that can take me through the steps of illustrating this?

    Many thanks.

    Last edited by Charlene Rapsey; 04 Jun 2018, 20:55.

  • #2
    -marginsplot- has a -recast()- option. Here's a demonstration:

    Code:
    clear*
    sysuse auto
    
    logit foreign i.rep78
    margins rep78
    marginsplot, recast(bar)
    -marginsplot- also accepts nearly all options you can use with any -graph twoway- graph type, so you can further customize the appearance of the graph to your liking.
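
    For the latent class case in the original question, a minimal sketch of the class-specific bar graph might look like the following. It assumes the gsem_lca1 example dataset from the Stata [SEM] manual (variables accident, play, insurance, stock) as a stand-in for the poster's data, and it covers binary (logit) items only. The idea is to request one prediction per item with -predict(outcome() class())-, since a bare -margins- averages over the classes.

    ```stata
    * Assumption: gsem_lca1 is the manual's example dataset, used here
    * in place of the poster's abuse items
    webuse gsem_lca1, clear

    * Two-class latent class model with four binary indicators
    gsem (accident play insurance stock <- ), logit lclass(C 2)

    * Class-specific item probabilities as a table
    estat lcmean

    * One prediction per item, all conditional on class 1
    margins, predict(outcome(accident) class(1))  ///
             predict(outcome(play) class(1))      ///
             predict(outcome(insurance) class(1)) ///
             predict(outcome(stock) class(1))

    * Draw the four predictions as bars rather than a connected line
    marginsplot, recast(bar) xtitle("") ytitle("Probability") ///
        xlabel(1 "accident" 2 "play" 3 "insurance" 4 "stock") ///
        title("Class 1")
    ```

    Repeating the -margins- and -marginsplot- pair with class(2), class(3), and so on should give one graph per class, which -graph combine- can then place side by side.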


    • #3
      Further reading at the post below.

      https://www.statalist.org/forums/for...=1528213934540
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


      • #4
        Brilliant. Thank you Clyde and thank you very much Weiwen for directing me to that post which solved my problem.


        • #5
          Charlene Rapsey You are welcome. If I may add, I see that your original command,

          Code:
          gsem( beaten hit smack csa1 csa2 csa3 parviol <-, logit) ( freq duration <- _cons) ,lclass(C3 ) nonrtolerance
          invokes the -nonrtolerance- option. I believe this increases the chance that Stata declares convergence in a non-concave region of the likelihood function, which probably increases the chance of declaring convergence at a local rather than the global maximum; in other words, Stata prematurely declares victory. Experts I have read tend to recommend that people fitting LCAs make multiple attempts from widely varied starting parameters and choose the solution with the highest log likelihood that replicates consistently. Some reading below.

          https://www.statalist.org/forums/for...m-lca-stata-15
          https://www.statalist.org/forums/for...5-gsem-problem
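
          As a rough illustration of the "multiple attempts" advice above, the sketch below refits the same model under several seeds and prints the log likelihood of each fit. The gsem_lca1 dataset and its variable names come from the Stata [SEM] manual and are an assumption here, as are the choices of 5 seeds and 5 draws per fit.

          ```stata
          * Assumption: manual example data standing in for the poster's items
          webuse gsem_lca1, clear

          forvalues s = 1/5 {
              * capture keeps the loop running if one attempt fails to converge
              capture quietly gsem (accident play insurance stock <- ), logit lclass(C 2) ///
                  startvalues(randomid, draws(5) seed(`s'))
              if _rc == 0 {
                  display "seed `s': converged = " e(converged) "  log likelihood = " %10.3f e(ll)
              }
              else {
                  display "seed `s': gsem exited with error code " _rc
              }
          }
          ```

          If the highest log likelihood shows up under several different seeds, that is some reassurance that the solution is not just a local maximum. If it appears only once, more draws or more widely varied start values are probably worth trying.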

          • #6
            Weiwen Ng Many thanks for that extra input. I see the problem with
            Code:
            nonrtolerance
            and changed my analysis as a result. Am I correct in assuming that Stata has done the work for me by fitting varied starting parameters and selecting the best solution when I ask this instead?
            Code:
            gsem( ab1 ab2 ab3 ab4 <-, logit) ,lclass(C 3 ) startvalues(randomid, draws(5) seed(15)) emopts(iter(20))


            • #7
              Originally posted by Charlene Rapsey
              Weiwen Ng Many thanks for that extra input. I see the problem with
              Code:
              nonrtolerance
              and changed my analysis as a result. Am I correct in assuming that Stata has done the work for me by fitting varied starting parameters and selecting the best solution when I ask this instead?
              Code:
              gsem( ab1 ab2 ab3 ab4 <-, logit) ,lclass(C 3 ) startvalues(randomid, draws(5) seed(15)) emopts(iter(20))
              Charlene, yes. In particular, that code would have assigned each person to one of the classes at random. I believe that Stata would then have started iterating from parameter start values formed from the classes as defined by that random assignment. For each random draw you ask for, it will establish the start values, run 20 EM iterations, and then draw again. I am pretty sure that if you got Stata to output the resulting start values, you'd see that they vary. This code might work:

              Code:
              forvalues i = 1/20 {
              gsem(ab1 ab2 ab3 ab4 <-, logit), lclass(C 3) startvalues(randomid, draws(1) seed(15)) emopts(iter(0)) noestimate
              estimates store start`i'
              }
              estimates table start*
              After finishing the 5 draws you asked for, Stata will take the draw with the highest LL, then run its usual maximization process.

              The following statement is based only on my intuition, but I have a feeling that in situations where you have one or two classes that are small but distinct, this process may not vary the starting parameters widely enough to establish them consistently. Once you get to a higher number of classes (e.g. 4 or 5), you may wish to explore the -jitter- option in place of the -randomid- option. I'm a bit unclear what scale jitter works on, but for logit parameters, I think that 2 or 3 will vary things quite extensively, e.g.

              Code:
              gsem(ab1 ab2 ab3 ab4 <-, logit), lclass(C 4) startvalues(jitter 3, draws(20) seed(15)) emopts(iterate(10))
              The EM algorithm is a bit slower than the usual maximization algorithm, but I think it is less sensitive to start values. In any case, I have heard advice on this forum that you can reduce the number of EM iterations to 10 or so (default is 20).
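
              To check whether a small class is being picked up after a jittered start, one option is to look at the estimated class sizes and item probabilities once the model has converged. This is only a sketch: it again assumes the gsem_lca1 example dataset from the Stata [SEM] manual, and the jitter value, number of draws, and EM iteration count are the illustrative values from this post, not tuned recommendations.

              ```stata
              * Assumption: manual example data standing in for the poster's items
              webuse gsem_lca1, clear

              * Jittered start values, 20 draws, 10 EM iterations per draw
              gsem (accident play insurance stock <- ), logit lclass(C 2) ///
                  startvalues(jitter 3, draws(20) seed(15)) emopts(iterate(10))

              * Estimated share of the sample in each class
              estat lcprob

              * Item probabilities within each class
              estat lcmean
              ```

              Running this under a few different seeds and comparing the -estat lcprob- output should show whether the same small class reappears each time.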

              • #8
                Originally posted by Weiwen Ng
                ...

                Code:
                forvalues i = 1/20 {
                gsem(ab1 ab2 ab3 ab4 <-, logit), lclass(C 3) startvalues(randomid, draws(1) seed(15)) emopts(iter(0)) noestimate
                estimates store start`i'
                }
                estimates table start*
                ...
                That code is wrong. It will produce a long series of identical start values, because it specifies the same random number seed on every pass through the loop. This code will do what I intended:

                Code:
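                * Seed once, outside the loop, so each pass uses a fresh random draw
                set seed 15
                forvalues i = 1/20 {
                    * noestimate skips model fitting and reports the starting values
                    gsem(ab1 ab2 ab3 ab4 <-, logit), lclass(C 3) startvalues(randomid) emopts(iter(0)) noestimate
                    estimates store start`i'
                }
                * Compare the 20 sets of start values side by side
                estimates table start*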
                Or,

                Code:
                set seed 15
                forvalues i = 1/20 {
                    gsem(ab1 ab2 ab3 ab4 <-, logit), lclass(C 3) startvalues(jitter 3) emopts(iter(0)) noestimate
                    estimates store start`i'
                }
                estimates table start*
                Or whatever start value option you like.