  • spaghetti plot with group mean lines

    Hi, I need help adding estimated group mean lines to a spaghetti plot. I want one graph that shows all participants' data in gray and the two group means as lines in different colors.

    Two groups (Group 0, 1), each group has 50 participants
    We measured vigor five times a day for eight days.
    "NEWplanned" is the variable we created to index the 40 (5x8) observations for each participant.

    I created spaghetti plot for all participants using the following code:

    xtline EMA_vigor , t(NEWplanned) i(ID) overlay

    then, I created each group plot:

    xtline EMA_vigor if Group==0 , t(NEWplanned) i(ID) overlay
    xtline EMA_vigor if Group==1 , t(NEWplanned) i(ID) overlay

    Is there a way to create one graph that can show all participants' data in gray color while showing group mean in two different color lines (Group 0 = red color; Group 1 = blue color)?

    I would sincerely appreciate your suggestions and recommendations.

    Sam

  • #2
    Wanting to distinguish group means makes sense. Not wanting to distinguish individuals in groups sounds perverse, or at least lacking in ambition.

    Not having a data example here inhibits experiment.

    xtline is just a convenience command for what it does. It doesn't purport to be a highly general tool. I tried subverting it and gave up quickly. It's easier to use more basic commands.

    This code uses an example dataset that anyone can run, and it may give you the main technique you need.

    Code:
    clear 
    webuse grunfeld
    gen y = ln(invest)
    rename company id 
    l id y time in 1/30
    gen group = id > 5 
    
    * so we have an identifier, a time variable and a grouping variable 
    
    egen mean = mean(y), by(group time)
    separate mean, by(group) veryshortlabel
    separate y, by(group) veryshortlabel
    
    sort id time
    set scheme s1color
    
    line y0 y1 mean0 mean1 time, c(L L L L) lc(red*0.2 blue*0.2 red blue) lw(medium medium medthick medthick) legend(order(3 "group 0" 4 "group 1" - "(means thicker)") pos(3) col(1)) yla(, ang(h)) ytitle(interesting explanation)

    [Figure: group_mean.png]


    • #3
      Hi, Nick,

      Thank you so much for your help! It worked beautifully!!!

      I sincerely appreciate your kind help!

      Have a wonderful day!

      Best regards,

      Sam
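
      For reference, the technique in #2 might look roughly like this with the variable names from #1 (EMA_vigor, NEWplanned, ID, Group). This is only a sketch: no data example was posted, so the data below are simulated, and the seed, the effect size and the graph name are arbitrary assumptions. The /// continuation lines need a do-file.

```stata
* Simulated stand-in for the data in #1: 2 groups x 50 participants x 40 occasions
clear
set seed 12345
set obs 100
gen ID = _n
gen Group = ID > 50
expand 40
bysort ID: gen NEWplanned = _n
gen EMA_vigor = 3 + 0.5 * Group + rnormal(0, 1)

* Group mean at each occasion, split into one variable per group (mean0, mean1)
egen mean = mean(EMA_vigor), by(Group NEWplanned)
separate mean, by(Group) veryshortlabel

* Tag one observation per group and occasion so each mean is drawn once
egen gtag = tag(Group NEWplanned)

* c(L) connects points only while x is increasing, so each ID gets its own line
sort ID NEWplanned
twoway line EMA_vigor NEWplanned, c(L) lc(gs12) lw(vthin)                 ///
    || line mean0 mean1 NEWplanned if gtag, sort lc(red blue) lw(thick ..) ///
    legend(order(2 "Group 0 mean" 3 "Group 1 mean"))                      ///
    ytitle("EMA vigor") name(vigor, replace)
```

      All individual trajectories are drawn in one gray, and only the two mean lines carry group colors.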



      • #4
        I have a similar problem of -ambitiously- wanting to show individual data in the context of all the other individuals of the group. Essentially I would like to create a spaghetti plot (say with transparency%10) with a line plot superimposed in full colour.
        The spagplot package is quite convenient but it isn't a twoway plot, so one can't just superimpose two graphs.
        The alternative is to adapt the code above but I have an imbalanced design, in that "time" values (integers from 2 to 36, in theory in increments of 2 initially, then 6, but in reality with a lot of missingness) are not the same for every participant in every group. In the example, let's say that company 1 is missing years after 1945, company 2 is missing 1935-39 and 1953-4, and so on.
        So when I plot the means, I get not one line but a different mean line for every participant.
        One solution would be to discretise time so that everyone has the same time values, but then I lose data.
        Any other ideas?
        Thank you very much in advance



        • #5
          Hi all -
          I'm in the same position as Nazzarena (above). I followed the code from Nick and it works great except I get multiple mean lines within each group. I believe this is due to a data structure that Nazzarena described in that I have ~847 participants who each completed some combination of 11 different timepoints. I'm guessing that the separate mean lines are for each combination of timepoints that given participants completed? I'm wondering if anyone could suggest an adaptation that creates one mean line per group despite participants completing different combinations of timepoints. Or if maybe there is a better explanation for the multiple mean lines, my end goal is still one mean line per group.

          Any help is much appreciated.

          Hannah



          • #6
            Let's please have a simple data example that illustrates your problem.



            • #7
              I simulated a dataset with some gaps and wrote more circumspect code for that set-up. Here the first graph is just to show that the code is behaving reasonably over gaps and the second graph is perhaps closer to what you might want to use.

              Code:
              * Simulated example with gaps: about 20% of the values of y are missing
              clear
              set obs 200
              egen t = seq(), to(20)
              egen id = seq(), block(20)
              egen group = seq(), block(100)
              set seed 2803 
              gen y = group + rnormal(0, 0.2) if runiform() < 0.8 
              egen mean = mean(y), by(group t)
              separate mean, by(group) veryshortlabel
              local mv `r(varlist)'
              
              egen gtag = tag(group t)
              
              sort id t 
              twoway connected y t if group == 1, lc(red) ms(Oh)  mc(red) lw(thin) c(L)|| connected y t if group == 2, lc(blue) mc(blue) ms(+) lw(thin) c(L) || line `mv' t if gtag, lc(red blue) lw(thick ..) xla(1/20) legend(order(1 "group 1" 2 "group 2" 3 "mean group 1" 4 "mean group 2")) name(G1, replace)
              
              twoway line y t if group == 1, lc(red) ms(Oh)  mc(red) lw(thin) c(L)|| line y t if group == 2, lc(blue) mc(blue) ms(+) lw(thin) c(L) || line `mv' t if gtag, lc(red blue) lw(thick ..) xla(1/20) legend(order(1 "group 1" 2 "group 2" 3 "mean group 1" 4 "mean group 2")) name(G2, replace)

              [Figure: groupmeanstoo_G1.png]

              [Figure: groupmeanstoo_G2.png]


              • #8
                Hi, and thanks for sharing this information. I'm facing a similar problem. I followed Nick's code, which worked very nicely for most of my needs, but I'm not getting the correct means for the groups: I get only one line per group, and it doesn't change over time as in the example above. Could you please help me?
                I have 80 participants, each with between 1 and 3 assessments (SCIM) at time points measured in days, from 5 to 400 days, and 2 groups, "exposed" and "unexposed".

                Here is an example:

                id time SCIM group
                1 34 85.9 2
                1 90 94.9 2
                1 146 95.4 2
                3 33 99.2 2
                3 97 100 2
                8 27 90.4 1
                8 76 100 1
                8 111 100 1
                9 38 62.7 2
                9 84 86.9 2
                9 165 96.7 2
                6 20 75.9 2
                6 98 97.8 1
                6 112 98.3 1


                This is the code I ran:

                clear all

                generate months=.
                replace months= time/30
                replace months=round(months, 0.5)
                set seed 2803
                egen mean = mean(SCIM), by(group)
                separate mean, by(group) veryshortlabel
                local mv `r(varlist)'

                egen gtag = tag(group months)

                sort id_swisci months

                twoway connected SCIM months if group == 1, lc(red) ms(Oh) mc(red) lw(thin) c(L)|| connected SCIM months if group == 2, lc(blue) mc(blue) ms(+) lw(thin) c(L) || line `mv' months if gtag, lc(red blue) lw(thick ..) xla(1/20) legend(order(1 "group 1" 2 "group 2" 3 "mean group 1" 4 "mean group 2")) name(G1, replace)

                And this is what I got:

                See G1


                As you can see, the "mean" lines don't follow the change in time as in Nick's example.
                Thank you very much!
                Last edited by Vanessa Seijas; 01 Sep 2023, 09:32.



                • #9
                  Figure G1.gph
                  Last edited by Vanessa Seijas; 01 Sep 2023, 09:32.



                  • #10
                    Sorry, but I can't follow your code. After clear all you should have no data in memory, meaning no observations and no variables, so in particular it will be impossible to take means of SCIM by group because those variables don't exist.

                    We have no scope to check your claim without a clear, consistent, reproducible example.



                    • #11
                      Hi, sorry, I made a mistake when copying and pasting the code. Here it is again:

                      Data example:

                      id days SCIM group
                      1 34 85.9 2
                      1 90 94.9 2
                      1 146 95.4 2
                      2 33 99.2 2
                      2 97 100 2
                      3 27 90.4 1
                      3 76 100 1
                      3 111 100 1
                      4 38 62.7 2
                      4 84 86.9 2
                      4 165 96.7 2
                      5 20 75.9 1
                      5 98 97.8 1
                      5 112 98.3 1


                      This is the code I ran:

                      generate months=.
                      replace months= days/30
                      replace months=round(months, 0.5)
                      set seed 2803
                      egen mean = mean(SCIM), by(group)
                      separate mean, by(group) veryshortlabel
                      local mv `r(varlist)'

                      egen gtag = tag(group months)

                      sort id months

                      twoway connected SCIM months if group == 1, lc(red) ms(Oh) mc(red) lw(thin) c(L)|| connected SCIM months if group == 2, lc(blue) mc(blue) ms(+) lw(thin) c(L) || line `mv' months if gtag, lc(red blue) lw(thick ..) xla(1/20) legend(order(1 "group 1" 2 "group 2" 3 "mean group 1" 4 "mean group 2")) name(G1, replace)

                      Thank you!



                      • #12
                        Thanks for fixing the data example.

                        Simplified and corrected code runs as below.

                        The corrections are

                        1. To calculate separate means for separate months as well as separate groups. You got what you asked for, means for each group (so pooling all months). Compare the code in #7 in which a time variable as well as a group variable is specified in the by() option when calculating means.

                        2. To change the xlabel() call to match months that don't go as far as 20. For your full example you might need more labels.

                        3. A sort option to plot the means as you wish, to work around the awkward order of your data example.


                        Code:
                        clear 
                        input id days SCIM group
                        1 34 85.9 2
                        1 90 94.9 2
                        1 146 95.4 2
                        2 33 99.2 2
                        2 97 100 2
                        3 27 90.4 1
                        3 76 100 1
                        3 111 100 1
                        4 38 62.7 2
                        4 84 86.9 2
                        4 165 96.7 2
                        5 20 75.9 1
                        5 98 97.8 1
                        5 112 98.3 1
                        end 
                        
                        generate months = round(days/30, 0.5) 
                        egen mean = mean(SCIM), by(group months) 
                        separate mean, by(group) veryshortlabel
                        local mv `r(varlist)'
                        
                        egen gtag = tag(group months)
                        
                        sort id months
                        
                        twoway connected SCIM months if group == 1, lc(red) ms(Oh) mc(red) lw(thin) c(L)|| connected SCIM months if group == 2, lc(blue) mc(blue) ms(+) lw(thin) c(L) || line `mv' months if gtag, lc(red blue) lw(thick ..) xla(1/5) legend(order(1 "group 1" 2 "group 2" 3 "mean group 1" 4 "mean group 2")) sort name(G1, replace)



                        • #13
                          Thank you so much! It worked beautifully. One last question: how could I add markers for the standard deviation around the mean for both groups at each time point?



                          • #14
                            So you want capped lines from mean - SD to mean + SD -- or something else?



                            • #15
                              Yes, exactly. Thank you!
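
                              The thread ends without code for this. One possible sketch, building on the code in #12: compute the SD with egen in the same by() cells as the mean, then overlay twoway rcap spikes from mean - SD to mean + SD. The variable names sd, lower and upper, the faded colors for the individual lines and the graph name G3 are assumptions for illustration only. The SD is missing wherever a group and month cell holds a single observation, so no capped line is drawn there. The /// continuation lines need a do-file.

```stata
clear
input id days SCIM group
1 34 85.9 2
1 90 94.9 2
1 146 95.4 2
2 33 99.2 2
2 97 100 2
3 27 90.4 1
3 76 100 1
3 111 100 1
4 38 62.7 2
4 84 86.9 2
4 165 96.7 2
5 20 75.9 1
5 98 97.8 1
5 112 98.3 1
end

generate months = round(days/30, 0.5)

* Mean and SD for each combination of group and month
egen double mean = mean(SCIM), by(group months)
egen double sd = sd(SCIM), by(group months)

* Limits of the capped lines: mean - SD and mean + SD
generate double lower = mean - sd
generate double upper = mean + sd

separate mean, by(group) veryshortlabel
local mv `r(varlist)'

* One tagged observation per group and month, so each mean and spike is drawn once
egen gtag = tag(group months)

sort id months

twoway line SCIM months if group == 1, lc(red*0.3) lw(thin) c(L)       ///
    || line SCIM months if group == 2, lc(blue*0.3) lw(thin) c(L)      ///
    || line `mv' months if gtag, sort lc(red blue) lw(thick ..)        ///
    || rcap lower upper months if gtag & group == 1, lc(red)           ///
    || rcap lower upper months if gtag & group == 2, lc(blue)          ///
    xla(1/5) name(G3, replace)                                         ///
    legend(order(3 "mean group 1" 4 "mean group 2" 5 "mean +/- SD"))
```

                              With this small data example only three cells have more than one observation, so only three capped lines appear; the full dataset should give more.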

