  • Line graph of mean values by group

    I have a repeated measures data set and I would like to create a line graph of the mean values of test scores at each of seven timepoints for four groups:
    • white males
    • black males
    • white females
    • black females
    which are coded as a pair of dummy variables (female, black). The Y-value is testscore and the X-value is month (1-7).

    I could manually calculate the mean() of each group:
    Code:
    mean testscore if black == 0 & female == 0 & month == 1
    mean testscore if black == 0 & female == 0 & month == 2
    mean testscore if black == 0 & female == 0 & month == 3
    ...
    mean testscore if black == 1 & female == 1 & month == 7
    but there has to be a programmatic way of doing this more elegantly.

    Last edited by John Bald; 15 Apr 2019, 10:25.

  • #2
    You may wish to see examples here.
    Best regards,

    Marcos

    • #3
      Marcos,

      While I appreciate that many people new to this group may not have reviewed the basic resources, I would like to reassure you that I have sought out other options before asking people to donate their time to help.

      The example you provide is a basic table of values, where each line is simply plotted by marking the value of sequential rows of the same column.

      My use case is different:
      studentid testscore month black female
      1 30 1 0 1
      1 37 2 0 1
      1 53 3 0 1
      2 28 1 1 1
      2 33 2 1 1
      2 44 3 1 1
      2 47 5 1 1
      2 51 6 1 1
      3 22 1 0 0
      3 25 3 0 0
      3 29 5 0 0
      3 30 7 0 0
      In pseudo code, I would need to:
      1. calculate the mean test score of white males for each month
      2. calculate the mean test score of black males for each month
      3. calculate the mean test score of white females for each month
      4. calculate the mean test score of black females for each month
      5. plot four lines with seven points each
      The example page you provided doesn't help me do that.

      • #4
        Here is some technique:

        Code:
        clear
        input studentid    testscore    month    black    female
        1    30    1    0    1
        1    37    2    0    1
        1    53    3    0    1
        2    28    1    1    1
        2    33    2    1    1
        2    44    3    1    1
        2    47    5    1    1
        2    51    6    1    1
        3    22    1    0    0
        3    25    3    0    0
        3    29    5    0    0
        3    30    7    0    0
        end
        
        egen mean = mean(testscore), by(month black female)
        gen group = cond(black == 0, cond(female == 0, 0, 1), cond(female == 0, 2, 3))  
        label def group 0 "white male" 1 "white female" 2 "black male" 3 "black female"
        label val group group
        separate mean, by(group) veryshortlabel
        twoway connected mean? month , legend(pos(3) col(1)) ytitle(mean test score) yla(, ang(h)) xla(1/7)
        There are slower ways to do it that may seem simpler. Here is one:

        Code:
        clear
        input studentid    testscore    month    black    female
        1    30    1    0    1
        1    37    2    0    1
        1    53    3    0    1
        2    28    1    1    1
        2    33    2    1    1
        2    44    3    1    1
        2    47    5    1    1
        2    51    6    1    1
        3    22    1    0    0
        3    25    3    0    0
        3    29    5    0    0
        3    30    7    0    0
        end
        
        egen mean = mean(testscore), by(month black female)
        gen group = 1 if black == 0 & female == 0
        replace group = 2 if black == 0 & female == 1
        replace group = 3 if black == 1 & female == 0
        replace group = 4 if black == 1 & female == 1
        label def group 1 "white male" 2 "white female" 3 "black male" 4 "black female"
        label val group group
        
        twoway connected mean month if group == 1 ///
        ||     connected mean month if group == 2 ///
        ||     connected mean month if group == 3 ///
        ||     connected mean month if group == 4 ///
        , legend(order(1 "white male" 2 "white female" 3 "black male" 4 "black female") ///
        pos(3) col(1)) ytitle(mean test score) yla(, ang(h)) xla(1/7)
        In addition to help and manual entries, cond() is documented at https://www.stata-journal.com/sjpdf....iclenum=pr0016 and separate is documented at https://journals.sagepub.com/doi/pdf...867X0500500412

        P.S. I am sure that Marcos Almeida intended you to click on the links at that page and to read some documentation. After all, the graph does show black males, white males, ... over time, which is essentially your kind of problem.
        Last edited by Nick Cox; 15 Apr 2019, 11:42.

        • #5
          Nick,

          Thank you for your help.

          I have attempted to run your code both ways and in both instances, instead of four smooth lines, I get a network of interconnected lines.
          [Image: nickcox.png]
          Limiting the graph to just the mean0 group, it seems that even within the group, each month's mean is connected to every other month's mean.
          [Image: graph2.png]
          Last edited by John Bald; 15 Apr 2019, 12:35.

          • #6
            You just need a sort option as well. Or so I imagine: you didn't give your syntax.

            • #7
              Nick,

              There's no other syntax beyond what you provided. I cleaned my data set in other software and imported into Stata, and as an exercise, tried to create the graph. The data table is sorted by an id variable and then the month variable, exactly as described in the table shown above.

              • #8
                My code works for your example data but your full dataset is likely to be messier. So, add a sort option to the other options.
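
A minimal sketch of what adding the sort option could look like, using the example data from #3. The variable u and the seed are assumptions introduced only to scramble the row order and mimic a messier dataset; they are not part of the original code. With sort, each line connects its points in order of month rather than in the order of the rows.

```stata
* Sketch only: example data as posted in #3
clear
input studentid testscore month black female
1 30 1 0 1
1 37 2 0 1
1 53 3 0 1
2 28 1 1 1
2 33 2 1 1
2 44 3 1 1
2 47 5 1 1
2 51 6 1 1
3 22 1 0 0
3 25 3 0 0
3 29 5 0 0
3 30 7 0 0
end

egen mean = mean(testscore), by(month black female)
* 0 white male, 1 white female, 2 black male, 3 black female
gen group = 2 * black + female
label def group 0 "white male" 1 "white female" 2 "black male" 3 "black female"
label val group group
separate mean, by(group) veryshortlabel

* Assumption for illustration: scramble the rows so that months are
* out of order within each group (u is a hypothetical helper variable)
set seed 2019
gen double u = runiform()
sort u

* The sort option orders each series by month before connecting points
twoway connected mean? month, sort legend(pos(3) col(1)) ytitle(mean test score) yla(, ang(h)) xla(1/7)
```

Dropping sort from the last command on the scrambled data should reproduce the kind of tangled lines described in #5.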

                • #9
                  As Nick underlined in #4, the link I shared has several examples of similar graphs which may apply to this case.
                  Best regards,

                  Marcos

                  • #10
                    I hope you found your answer but I have another solution that is less elegant, but more intuitive (if you ask me). I run into these same issues.
                    1. preserve your data; 2. use the table command or the collapse command to generate a new dataset of means by group and time interval; 3. reshape the dataset so that each category is its own variable; 4. graph them all together as separate variables; 5. when you're satisfied with the graph and have saved it, restore the original dataset.

                    For simplicity's sake, I'm going to show you with just the sex variable but it can be expanded with other variables.
                    E.g.,


                    preserve
                    table month sexflag, c(mean testscore) replace // note: the replace option creates the new dataset the same way that the collapse command would
                    reshape wide table1, i(month) j(sexflag) // this creates new variables for each category by month; there should be the variables month, table10 and table11
                    twoway line table10 table11 month // add all the other options you'd like to the graph and then save it
                    restore

                    I'm putting this out there because even if you don't use this any longer, this is the first result in google results when I ask this question. So hopefully someone someday will see it and find it useful.

                    Good luck.

                    • #11
                      I fear that #10 is likely to dismay more readers than it helps. You really don't need to create a new dataset, reshape it and get back to your previous dataset after graphing. That's not to say that creating a dataset can't be good technique, because it certainly can be. But working with the same dataset is fine too.

                      The code in #4 works for the example given and so answers the question. It is more complicated than is needed to produce a graph because, in the absence of value labels in the data example in #3, work is put in to provide decent labels.

                      If the code in #4 doesn't work, the explanation is that the data in your dataset are out of order. That's possible and the fix, as said in #6, is to add an extra sort option to the graph command. That does no harm in any case.

                      • #12
                        I sometimes use the 'lgraph' program, written by Timothy Mak, for this purpose. You may find that it involves less typing. To use the program, install it first:

                        Code:
                        ssc install lgraph
                        Then type the commands to create the plot:

                        Code:
                        lgraph testscore month group //where 'group' is coded from post #4
                        See help file for 'lgraph' after installation for other options [help lgraph].
                        Last edited by Roman Mostazir; 04 Sep 2020, 05:58. Reason: typo corrected
                        Roman

                        • #13
                          @12 Swings and roundabouts here. lgraph looks like a versatile command with quirky syntax that likely repays effort in working with it but may seem more complicated than a user wants to know. And yes, exactly the same could be said about some of my own commands.

                          • #14
                            I know this is an old post, but the collapse command could be of some help here. You could do the following:

                            Code:
                            collapse (mean) testscore, by(black female month)
                            From there produce one of many forms of a connected line graph.
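
As one possible way to go from the collapsed data to a graph, here is a sketch that reuses the example data from #3 and the separate technique from #4. The group coding (2 * black + female) is an assumption made for brevity, and preserve and restore are used so that the original data come back afterwards.

```stata
* Sketch only: example data as posted in #3
clear
input studentid testscore month black female
1 30 1 0 1
1 37 2 0 1
1 53 3 0 1
2 28 1 1 1
2 33 2 1 1
2 44 3 1 1
2 47 5 1 1
2 51 6 1 1
3 22 1 0 0
3 25 3 0 0
3 29 5 0 0
3 30 7 0 0
end

preserve

* After collapse, testscore holds the mean for each group and month
collapse (mean) testscore, by(black female month)

* 0 white male, 1 white female, 2 black male, 3 black female
gen group = 2 * black + female
label def group 0 "white male" 1 "white female" 2 "black male" 3 "black female"
label val group group

* One variable per group, then one connected line per variable
separate testscore, by(group) veryshortlabel
twoway connected testscore? month, sort legend(pos(3) col(1)) ytitle(mean test score) yla(, ang(h)) xla(1/7)

* Bring back the original observation-level data
restore
```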

                            • #15
                              Hello,

                              This thread was incredibly helpful. I was having a similar issue where I needed to create a line graph with repeated measures data stratified by three groups. With the helpful code provided in this thread, I was able to get those three line graphs created. However, I also want to graph my data among the entire sample. When I do that, I get this funky line graph (below) and I can't figure out why. The other line graphs had no issues.

                              For the stratified graphs, I used: twoway line bmipct_y1 year, sort by(anyuse)
                              It also works if I use: twoway line bmipct_y1 year if anyuse==0 || line bmipct_y1 year if anyuse==1 || line bmipct_y1 year if anyuse==2
                              I want to also use "twoway line bmipct_y1 year" at the beginning of the above code so that I can get a line among the entire sample, but that's what gets me the below graph.

                              [Image: Screen Shot 2021-09-30 at 2.49.53 PM.png]

                              My data is essentially the same as OP. See below.
                              caseid bmipct_y1 year anyuse
                              1 55 1 0
                              1 57 2 0
                              1 60 3 0
                              2 57 1 1
                              2 65 2 1
                              2 80 3 1
                              2 81 5 1
                              2 81 6 1
                              3 54 1 2
                              3 40 3 2
                              3 45 5 2
                              3 50 7 2

                              Any advice would be helpful!

                              Thank you.