  • Include mean and standard deviation in my histogram

    Hello everyone,

    I'm struggling to add (vertical) lines for the mean and the standard deviation to my histogram. Attached is the histogram showing physical attractiveness by gender and migration status. I think I should first calculate the means for all six groups and then include them in the histogram command. Can anyone advise me on the best way to do this?

    This is my code to create the histogram:

    Code:
        egen mig_sex=group(sex migstatus)
            tab mig_sex
            table migstatus sex mig_sex
            lab def mig_sex 1 "Männlich: Keine Einwanderungsgeschichte" 2 "Erste Generation" 3 "Zweite Generation" ///
                            4 "Weiblich: Keine Einwanderungsgeschichte" 5 "Erste Generation" 6 "Zweite Generation"
            lab val mig_sex mig_sex
            tab migstatus sex
            fre mig_sex
    
        tab migstatus att if sex==0, chi2 V
        tab migstatus att if sex==1, chi2 V
        
        tab mig_sex if att!=.
        
        forvalues i=1/6    {
            sum mig_sex if mig_sex==`i' & att!=.
            local n`i' = r(N)
        }
        
        gen double wt=round(cdweight*100)
        
                histogram att [fweight=wt], by(mig_sex, row(2) title("Verteilung der physischen Attraktivität, erste Welle: Einwanderungsgeschichte" " ")                                ///
                                       note(`"{bf:Männlich}: Keine Einwanderungsgeschichte N = `n1', Erste Generation N = `n2', Zweite Generation N = `n3'"'                                ///
                                            `"{bf:Weiblich}: Keine Einwanderungsgeschichte N = `n4', Erste Generation N = `n5', Zweite Generation N = `n6'"', size(vsmall)) legend(off))    ///
                                    percent discrete name(mig_hist, replace)                                ///
                                    addlabel addlabopts(mlabposition(12) mlabformat(%4.1f) mlabsize(vsmall))    ///
                                    yscale(r(0(10)40)) xscale(r(1(1)7))                                     ///
                                    xlabel(1 "Sehr unattraktiv" 2 3 4 5 6 7 "Sehr attraktiv", angle(45))    ///
                                    xtitle("Physische Attraktivität" " ")                                    ///
                                    scheme(plotplainblind)
    This is what the histogram looks like at the moment:

    [Image: mig_hist.jpg]

    Online I found some code to add vertical lines for the mean and SD. I can recreate a simple histogram with the lines, but I'm struggling to implement this in my more complicated (loop) case.

    Code:
    summarize att
    local m=r(mean)
    local sd=r(sd)
    local low = `m'-`sd'
    local high=`m'+`sd'
    
    histogram att, ///
    fc(none) lc(green) xline(`m') ///
    xline(`low', lc(blue)) xline(`high', lc(blue)) scale(0.5) ///
    text(0.12 `m' `"mean = `=string(`m',"%6.2f")'"', ///
    color(red) orientation(vertical) placement(2)) ///
    addlabel addlabopts(mlabposition(12) mlabformat(%4.1f) mlabsize(vsmall))
    Thank you in advance!

  • #2
    See https://journals.sagepub.com/doi/pdf...6867X241276116 for a miniature review of adding lines. However, I would lean towards a different approach for your case.

    In essence, vertical lines for means (+/- SDs) are all too likely to interfere not only with the histogram bars but also with the bar labels.

    Here is a different approach. I can't use your data.

    Code:
    sysuse auto, clear 
    
    egen mean = mean(mpg), by(rep78)
    egen SD = sd(mpg), by(rep78)
    gen upper = mean + SD
    gen lower = mean - SD 
    gen where = -3
    
    histogram mpg, by(rep78, legend(off) caption(Bars show mean {&plusminus} SD) ///
    note("Repair record 1978", size(medsmall) pos(11))) ///
    percent start(10) width(3) fcolor(stc1*0.2) addlabels addlabopts(mlabformat(%1.0f)) ///
    addplot(rcap upper lower where, horizontal pstyle(p2) || scatter where mean, pstyle(p2)) ///
    ytitle(Percent) xtitle("`: var label mpg'") yla(0(10)50) ysc(r(-6 55))
    [Image: histobar.png]
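
    A possible variation on the code above, offered as a sketch and not as part of the original reply: the value of each mean can be printed next to its marker by passing a string copy of it to mlabel(). The variables meantext and tag and the graph name histolab are introduced here for illustration only. Tagging one observation per group avoids overprinting the same label many times, and the y axis is stretched a little further down so the label fits below the marker.

    ```stata
    sysuse auto, clear

    egen mean = mean(mpg), by(rep78)
    egen SD = sd(mpg), by(rep78)
    gen upper = mean + SD
    gen lower = mean - SD
    gen where = -3

    * string copy of the mean, rounded for display (hypothetical helper variable)
    gen meantext = string(mean, "%4.1f")
    * one tagged observation per repair-record group
    egen tag = tag(rep78)

    histogram mpg, by(rep78, legend(off)) percent start(10) width(3) ///
        addplot(rcap upper lower where, horizontal pstyle(p2) ///
            || scatter where mean if tag, pstyle(p2) mlabel(meantext) mlabposition(6)) ///
        ylabel(0(10)50) yscale(range(-10 55)) name(histolab, replace)
    ```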

    • #3
      Thanks a lot for your helpful answer! I managed to adapt it to my data (see picture). However, I can't move the x axis down a bit, and I can't display the values of the means (plotted at the position given by the variable "where") in the plot. Do you have any idea how to do this?

      [Image: mig_hist.jpg]

      This is the code and data I used:

      Code:
          egen mean = mean(att), by(mig_sex)
      egen SD = sd(att), by(mig_sex)
      gen upper = mean + SD
      gen lower = mean - SD 
      gen where = -3
          
          forvalues i=1/6    {
              sum mig_sex if mig_sex==`i' & att!=.
              local n`i' = r(N)
          }
      
      histogram att, by(mig_sex, row(2) title("Verteilung der physischen Attraktivität, erste Welle: Einwanderungsgeschichte" " ") caption(Bars show mean {&plusminus} SD) ///
      note(`"{bf:Männlich}: Keine Einwanderungsgeschichte N = `n1', Erste Generation N = `n2', Zweite Generation N = `n3'"'                                ///
          `"{bf:Weiblich}: Keine Einwanderungsgeschichte N = `n4', Erste Generation N = `n5', Zweite Generation N = `n6'"', size(vsmall)) legend(off)) ///
      percent discrete name(mig_hist, replace)                                ///
      addplot(rcap upper lower where, horizontal pstyle(p2) || scatter where mean, pstyle(p2)) ///
                                      addlabel addlabopts(mlabposition(12) mlabformat(%4.1f) mlabsize(vsmall))    ///                            
                                      yscale(r(0(10)40)) xscale(r(1(1)7))                                     ///
                                      xlabel(1 "Sehr unattraktiv" 2 3 4 5 6 7 "Sehr attraktiv", angle(45))    ///
                                      xtitle("Physische Attraktivität" " ")                                    ///
                                      scheme(plotplainblind)
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float att byte(sex migstatus) float mig_sex
      2 0 1 1
      6 0 1 1
      6 0 1 1
      6 0 1 1
      5 0 1 1
      5 0 1 1
      5 0 1 1
      7 0 1 1
      6 0 1 1
      7 0 1 1
      7 0 1 1
      6 0 1 1
      5 0 1 1
      7 0 1 1
      5 0 1 1
      4 0 1 1
      3 0 1 1
      7 0 1 1
      7 0 1 1
      7 0 1 1
      5 0 1 1
      5 0 1 1
      4 0 1 1
      6 0 1 1
      6 0 1 1
      7 0 1 1
      5 0 1 1
      6 0 1 1
      4 0 1 1
      5 0 1 1
      6 0 1 1
      4 0 1 1
      6 0 1 1
      5 0 1 1
      5 0 1 1
      7 0 1 1
      7 0 1 1
      6 0 1 1
      6 0 1 1
      2 0 1 1
      7 0 1 1
      5 0 1 1
      4 0 1 1
      6 0 1 1
      7 0 1 1
      4 0 1 1
      5 0 1 1
      7 0 1 1
      7 0 1 1
      6 0 1 1
      7 0 1 1
      2 0 1 1
      6 0 1 1
      6 0 1 1
      7 0 1 1
      6 0 1 1
      6 0 1 1
      4 0 1 1
      5 0 1 1
      7 0 1 1
      6 0 1 1
      6 0 1 1
      5 0 1 1
      7 0 1 1
      5 0 1 1
      6 0 1 1
      5 0 1 1
      6 0 1 1
      4 0 1 1
      6 0 1 1
      4 0 1 1
      4 0 1 1
      6 0 1 1
      4 0 1 1
      4 0 1 1
      6 0 1 1
      6 0 1 1
      4 0 1 1
      6 0 1 1
      5 0 1 1
      2 0 1 1
      7 0 1 1
      4 0 1 1
      4 0 1 1
      6 0 1 1
      4 0 1 1
      7 0 1 1
      4 0 1 1
      6 0 1 1
      5 0 1 1
      7 0 1 1
      3 0 1 1
      3 0 1 1
      5 0 1 1
      7 0 1 1
      4 0 1 1
      7 0 1 1
      6 0 1 1
      5 0 1 1
      4 0 1 1
      end
      label values att att
      label def att 7 "7. Sehr attraktiv", modify
      label values sex sex
      label def sex 0 "0. Männlich", modify
      label values migstatus migstatus
      label def migstatus 1 "Keine Einwanderungsgeschichte", modify
      label values mig_sex mig_sex
      label def mig_sex 1 "Männlich: Keine Einwanderungsgeschichte", modify

      • #4
        Your code confuses yscale() and ylabel() -- and also the equivalent options for the x axis.

        My yscale() stretches the range to [-6, 55] to accommodate the horizontal bars at -3.
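
        To illustrate the distinction with the auto data, as a minimal sketch: ylabel() decides which ticks and labels are drawn, while yscale(range()) only extends the axis to include the values given. It adds no labels and never shrinks the axis. The graph name scaledemo is arbitrary.

        ```stata
        sysuse auto, clear

        * ylabel() sets the labelled ticks: 0, 10, ..., 50
        * yscale(range()) only stretches the axis, here down to -6 and up to 55,
        * leaving room below zero without adding any labels there
        histogram mpg, percent ylabel(0(10)50) yscale(range(-6 55)) name(scaledemo, replace)
        ```

        The same division of labour applies to xlabel() and xscale(range()) on the x axis.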

        • #5
          Thanks a lot Nick. I have managed to extend the range of the y-scale.

          The only thing I am still struggling with is including the means of each group in the note. I managed to save the means correctly in the macros, but they show up as "." in the graph, as highlighted in yellow in the image. Does anyone have any idea what I'm doing wrong?

          [Image: mig_hist.jpg]


          This is my code:

          Code:
          egen mean = mean(att), by(mig_sex)
          egen SD = sd(att), by(mig_sex)
          gen upper = mean + SD
          gen lower = mean - SD 
          gen where = -3
              
              forvalues i=1/6    {
                  sum mig_sex if mig_sex==`i' & att!=.
                  mean att if mig_sex == `i' & att!=.
                  local mean`i' = r(mean)
                  local n`i' = r(N)
              }
          
          histogram att, by(mig_sex, row(2) title("Verteilung der physischen Attraktivität, erste Welle: Einwanderungsgeschichte" " ") caption(Balken zeigt den Mittelwert {&plusminus} Standard-Abweichung, size (vsmall)) ///
          note(`"{bf:Männlich}: Keine Einwanderungsgeschichte N = `n1', Mittelwert = `mean1'; Erste Generation N = `n2', Zweite Generation N = `n3'"'                                ///
              `"{bf:Weiblich}: Keine Einwanderungsgeschichte N = `n4', Erste Generation N = `n5', Zweite Generation N = `n6'"', size(vsmall)) legend(off)) ///
          percent discrete name(mig_hist, replace)                                ///
          addplot(rcap upper lower where, horizontal pstyle(p2) || scatter where mean, pstyle(p2)) ///
                                          addlabel addlabopts(mlabposition(12) mlabformat(%4.1f) mlabsize(vsmall))    ///                            
                                          yscale(r(-6(10)50)) xscale(r(1(1)7))                                     ///
                                          xlabel(1 "Sehr unattraktiv" 2 3 4 5 6 7 "Sehr attraktiv", angle(45))    ///
                                          xtitle("Physische Attraktivität" " ")                                    ///
                                          scheme(plotplainblind)
          The local macros are saved correctly:

          Code:
          . forvalues i = 1/6 {
            2.     display "Finale Werte für Gruppe `i': Mittelwert = `mean`i'' | Anzahl = `n`i''"
            3. }
          Finale Werte für Gruppe 1: Mittelwert = 5.280324074074074 | Anzahl = 4320
          Finale Werte für Gruppe 2: Mittelwert = 5.256186317321689 | Anzahl = 687
          Finale Werte für Gruppe 3: Mittelwert = 5.259780907668231 | Anzahl = 639
          Finale Werte für Gruppe 4: Mittelwert = 5.563986409966025 | Anzahl = 4415
          Finale Werte für Gruppe 5: Mittelwert = 5.509090909090909 | Anzahl = 825
          Finale Werte für Gruppe 6: Mittelwert = 5.641025641025641 | Anzahl = 741
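
          A minimal sketch of collecting group sizes and means in local macros, shown with the auto data because the original data are not available. It relies on summarize, which leaves r(N) and r(mean) behind, whereas mean is an estimation command whose main results are stored in e(). Formatting the mean when it is stored keeps the note text short. Local macros exist only within the run that defines them, so the loop and the graph command need to be executed together.

          ```stata
          sysuse auto, clear

          forvalues i = 1/5 {
              * meanonly skips the variance calculation; r(N) and r(mean) are still returned
              summarize mpg if rep78 == `i', meanonly
              local n`i' = r(N)
              * store the mean already formatted to two decimal places
              local mean`i' : display %4.2f r(mean)
          }

          display "rep78 = 1: N = `n1', mean = `mean1'"
          ```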

          • #6
            See

            Code:
            SJ-21-4 gr0090  . . . .  Adding variable text to graphs that use a by() option
                    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                    Q4/21   SJ 21(4):1074--1080                              (no commands)
                    tip explains two alternative methods for adding text that
                    varies informatively to graphs that use a by() option
            Code:
            search gr0090, entry

            to get a clickable link.
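
            One way to get text that differs by panel, sketched here with the auto data and not necessarily either of the methods described in the tip cited above, is to build the text into the variable passed to by(). The variable panel and the graph name paneldemo are introduced for illustration only.

            ```stata
            sysuse auto, clear

            egen mean = mean(mpg), by(rep78)

            * string by-variable: each panel title carries its own group mean
            gen panel = "Repair record " + string(rep78) + ": mean = " + string(mean, "%4.1f") ///
                if !missing(rep78)

            histogram mpg, by(panel, note("")) percent name(paneldemo, replace)
            ```

            Because each panel's title comes from a variable, the text follows the data and does not depend on local macros still being defined when the graph is drawn.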

            • #7
              Thank you Nick, it helped a lot!

              • #8
                The graphs you give show different results. I have some other suggestions, but to apply them I need you to show the results of

                Code:
                contract att mig_sex 
                dataex

                • #9
                  Sorry for not getting back to you sooner. The results differ because I did not include the weighting variable in the second graph.

                  Code:
                  * Example generated by -dataex-. For more info, type help dataex
                  clear
                  input float(att mig_sex) int _freq
                  1 1   31
                  1 2    6
                  1 3    5
                  1 4   46
                  1 5   10
                  1 6   10
                  1 .    6
                  2 1  128
                  2 2   14
                  2 3   24
                  2 4  142
                  2 5   26
                  2 6   28
                  2 .   10
                  3 1  301
                  3 2   42
                  3 3   46
                  3 4  248
                  3 5   53
                  3 6   34
                  3 .   11
                  4 1  684
                  4 2  117
                  4 3   89
                  4 4  436
                  4 5   86
                  4 6   70
                  4 .   38
                  5 1 1054
                  5 2  185
                  5 3  160
                  5 4  846
                  5 5  166
                  5 6  117
                  5 .   64
                  6 1 1239
                  6 2  203
                  6 3  191
                  6 4 1362
                  6 5  238
                  6 6  227
                  6 .   69
                  7 1  883
                  7 2  120
                  7 3  124
                  7 4 1335
                  7 5  246
                  7 6  255
                  7 .   64
                  end
                  label values att att
                  label def att 1 "1. Sehr unattraktiv", modify
                  label def att 7 "7. Sehr attraktiv", modify
                  label values mig_sex mig_sex
                  label def mig_sex 1 "Männlich: Keine Einwanderungsgeschichte", modify
                  label def mig_sex 2 "Erste Generation", modify
                  label def mig_sex 3 "Zweite Generation", modify
                  label def mig_sex 4 "Weiblich: Keine Einwanderungsgeschichte", modify
                  label def mig_sex 5 "Erste Generation", modify
                  label def mig_sex 6 "Zweite Generation", modify
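
                  As a quick check on the listing above, the sketch below re-enters only the seven rows for group 1 and treats _freq as a frequency weight. It is an illustration, not the code used for the graphs. The result should agree with the unweighted figures reported earlier in the thread for group 1 (N = 4320, mean 5.280324...), since 22811/4320 = 5.2803...

                  ```stata
                  clear
                  input float(att mig_sex) int _freq
                  1 1   31
                  2 1  128
                  3 1  301
                  4 1  684
                  5 1 1054
                  6 1 1239
                  7 1  883
                  end

                  * with fweights, r(N) is the sum of the frequencies
                  summarize att [fweight = _freq], meanonly
                  display "N = " r(N) ", mean = " %9.6f r(mean)
                  ```

                  Replacing _freq with a survey-based frequency weight, such as the rounded cdweight used in the first post, would give the weighted counterpart of the same mean.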

                  • #10
                    Thanks for the data example.

                    I had various ideas, chiefly those written up in my presentation at https://www.stata.com/meeting/uk21/, but none seems to help much, so I won't show those disappointing results.

                    Your histograms show the issue as clearly as anything else: the distributions are all very similar, except that as shown by the means, women tend to give higher scores than men.

                    • #11
                      Thank you very much for your effort!
