  • Graph with mean attendance and variation in attendance to several training sessions, split by groups

    Dear Statalisters,

    I would love it if someone could help me solve the issue below. I have tried to give as much information as possible and to proceed step by step, but please let me know if there is anything else I should provide (this is my first post!).

    First, I am trying to generate a graph with attendance (in percent) at various training sessions on the y-axis and the training sessions themselves (all trainings, T1, T2, T3, T4) on the x-axis. I do this using the code below:

    twoway connected avg_percentattendedt T if T==0 || rcap hi_percentattendedt lo_percentattendedt T if T==0 /// * all trainings
    || connected avg_percentattendedt T if T==1 || rcap hi_percentattendedt lo_percentattendedt T if T==1 /// * T1
    || connected avg_percentattendedt T if T==2 || rcap hi_percentattendedt lo_percentattendedt T if T==2 /// *T2
    || connected avg_percentattendedt T if T==3 || rcap hi_percentattendedt lo_percentattendedt T if T==3 /// *T3
    || connected avg_percentattendedt T if T==4 || rcap hi_percentattendedt lo_percentattendedt T if T==4 /// *T4
    , legend(order( 1 "All T mean" 2 "All T hi/low" 3 "T1 mean" 4 "T1 hi/low" 5 "T2 mean" 6 "T2 hi/low" 7 "T3 mean" 8 "T3 hi/low" 9 "T4 mean" 10 "T4 hi/low") pos(6) rows(5)) xlab(0 "All" 1 "T1" 2 "T2" 3 "T3" 4 "T4") ///
    ytitle("%", height(10)) ylabel(55(5)80) xtitle("Treatment")



    However, attendees of the training sessions can be of three different types (say mg_level 1, mg_level 2, mg_level 3). I would like to reproduce the graph above, except that for each point on the x-axis (i.e. each training) I would like the mean and variation for each of the three groups.

    The data are initially in wide format, and the percentage attendance variables do not distinguish between groups. I create the variables by managerial level with the code below. In the code I also collapse the data and reshape to long format, so as to end up with a dataset of three observations (one per managerial level) and the variables "T avg_percentattendedt0 hi_percentattendedt0 lo_percentattendedt0 avg_percentattendedt1 hi_percentattendedt1 lo_percentattendedt1 avg_percentattendedt2 hi_percentattendedt2 lo_percentattendedt2 avg_percentattendedt3 hi_percentattendedt3 lo_percentattendedt3 avg_percentattendedt4 hi_percentattendedt4 lo_percentattendedt4". T equals 1, 2, and 3 for observations 1, 2, and 3 respectively, and distinguishes between the groups.

    global Var percentattendedt0 percentattendedt1 percentattendedt2 percentattendedt3 percentattendedt4

    foreach y of varlist $Var {
        forval i = 1/3 {

            if `i' == 1 {
                su `y' if keyattendant == 1 & mg_level == `i'
                scalar mean_`y'`i' = r(mean)
                scalar n_`y'`i' = r(N)
                scalar sd_`y'`i' = r(sd)

                egen avg_`y'`i' = mean(`y') if keyattendant == 1 & mg_level == `i'

                gen hi_`y'`i' = avg_`y'`i' + invttail(n_`y'`i'-1,0.025)*(sd_`y'`i' / sqrt(n_`y'`i'))
                gen lo_`y'`i' = avg_`y'`i' - invttail(n_`y'`i'-1,0.025)*(sd_`y'`i' / sqrt(n_`y'`i'))
            }

            if `i' == 2 {
                su `y' if keyattendant == 1 & mg_level == `i'
                scalar mean_`y'`i' = r(mean)
                scalar n_`y'`i' = r(N)
                scalar sd_`y'`i' = r(sd)

                egen avg_`y'`i' = mean(`y') if keyattendant == 1 & mg_level == `i'

                gen hi_`y'`i' = avg_`y'`i' + invttail(n_`y'`i'-1,0.025)*(sd_`y'`i' / sqrt(n_`y'`i'))
                gen lo_`y'`i' = avg_`y'`i' - invttail(n_`y'`i'-1,0.025)*(sd_`y'`i' / sqrt(n_`y'`i'))
            }

            if `i' == 3 {
                su `y' if mg_level == `i'
                scalar mean_`y'`i' = r(mean)
                scalar n_`y'`i' = r(N)
                scalar sd_`y'`i' = r(sd)

                egen avg_`y'`i' = mean(`y') if mg_level == `i'

                gen hi_`y'`i' = avg_`y'`i' + invttail(n_`y'`i'-1,0.025)*(sd_`y'`i' / sqrt(n_`y'`i'))
                gen lo_`y'`i' = avg_`y'`i' - invttail(n_`y'`i'-1,0.025)*(sd_`y'`i' / sqrt(n_`y'`i'))
            }
        }
    }


    collapse (mean) avg_percentattendedt01 hi_percentattendedt01 lo_percentattendedt01 ///
    avg_percentattendedt02 hi_percentattendedt02 lo_percentattendedt02 ///
    avg_percentattendedt03 hi_percentattendedt03 lo_percentattendedt03 ///
    avg_percentattendedt11 hi_percentattendedt11 lo_percentattendedt11 ///
    avg_percentattendedt12 hi_percentattendedt12 lo_percentattendedt12 ///
    avg_percentattendedt13 hi_percentattendedt13 lo_percentattendedt13 ///
    avg_percentattendedt21 hi_percentattendedt21 lo_percentattendedt21 ///
    avg_percentattendedt22 hi_percentattendedt22 lo_percentattendedt22 ///
    avg_percentattendedt23 hi_percentattendedt23 lo_percentattendedt23 ///
    avg_percentattendedt31 hi_percentattendedt31 lo_percentattendedt31 ///
    avg_percentattendedt32 hi_percentattendedt32 lo_percentattendedt32 ///
    avg_percentattendedt33 hi_percentattendedt33 lo_percentattendedt33 ///
    avg_percentattendedt41 hi_percentattendedt41 lo_percentattendedt41 ///
    avg_percentattendedt42 hi_percentattendedt42 lo_percentattendedt42 ///
    avg_percentattendedt43 hi_percentattendedt43 lo_percentattendedt43

    gen A = 1

    reshape long avg_percentattendedt0 avg_percentattendedt1 avg_percentattendedt2 avg_percentattendedt3 avg_percentattendedt4 ///
    hi_percentattendedt0 hi_percentattendedt1 hi_percentattendedt2 hi_percentattendedt3 hi_percentattendedt4 ///
    lo_percentattendedt0 lo_percentattendedt1 lo_percentattendedt2 lo_percentattendedt3 lo_percentattendedt4, i(A) j(T)
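
    For comparison, the same group means and 95% confidence limits might be computed more compactly by letting collapse produce the mean, standard deviation, and count per managerial level. This is only a sketch: the toy data at the top are simulated placeholders standing in for the real file, and the output names avg, sd, n, hi, lo, and training are assumptions chosen for illustration. It keeps the same sample as the loop above (key attendants only for levels 1 and 2, everyone for level 3).

    ```stata
    * Simulated placeholder data (NOT the real dataset), only so the sketch runs
    clear
    set seed 12345
    set obs 300
    gen mg_level = 1 + mod(_n, 3)
    gen keyattendant = runiform() < 0.7
    forvalues t = 0/4 {
        gen percentattendedt`t' = 100 * runiform()
    }

    * Same sample restriction as the loop: key attendants for levels 1-2, all of level 3
    keep if (keyattendant == 1 & inlist(mg_level, 1, 2)) | mg_level == 3

    * Build the collapse target lists for the five attendance variables
    local means
    local sds
    local ns
    forvalues t = 0/4 {
        local means `means' avg`t'=percentattendedt`t'
        local sds   `sds' sd`t'=percentattendedt`t'
        local ns    `ns' n`t'=percentattendedt`t'
    }

    * One row per managerial level with mean, sd, and nonmissing count
    collapse (mean) `means' (sd) `sds' (count) `ns', by(mg_level)

    * t-based 95% confidence limits, as in the loop
    forvalues t = 0/4 {
        gen hi`t' = avg`t' + invttail(n`t' - 1, 0.025) * sd`t' / sqrt(n`t')
        gen lo`t' = avg`t' - invttail(n`t' - 1, 0.025) * sd`t' / sqrt(n`t')
    }

    * Long layout: one row per managerial level and training (0 = all trainings)
    reshape long avg sd n hi lo, i(mg_level) j(training)
    ```

    The result has one observation per combination of mg_level and training, a layout in which each group can be selected with an if qualifier in twoway.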



    My best attempt at the graph I need has taken me only as far as the code below. Unless I have misunderstood, twoway does not allow an over() option, which I think is a main reason why I am stuck.

    twoway connected avg_percentattendedt0 T || rcap hi_percentattendedt0 lo_percentattendedt0 T ///
    || connected avg_percentattendedt1 T || rcap hi_percentattendedt1 lo_percentattendedt1 T ///
    || connected avg_percentattendedt2 T || rcap hi_percentattendedt2 lo_percentattendedt2 T ///
    || connected avg_percentattendedt3 T || rcap hi_percentattendedt3 lo_percentattendedt3 T ///
    || connected avg_percentattendedt4 T || rcap hi_percentattendedt4 lo_percentattendedt4 T ///
    , legend(order( 1 "All T mean" 2 "All T hi/low" 3 "T1 mean" 4 "T1 hi/low" 5 "T2 mean" 6 "T2 hi/low" 7 "T3 mean" 8 "T3 hi/low" 9 "T4 mean" 10 "T4 hi/low") pos(6) rows(5)) xlab(0 "All" 1 "T1" 2 "T2" 3 "T3" 4 "T4") ///
    ytitle("%", height(10)) ylabel(55(5)80) xtitle("Treatment")

  • #2
    Welcome to the Stata Forum / Statalist.

    Please read the FAQ. There you'll find advice on sharing data/command/output.

    That being said, and considering that a) I may have misunderstood the query and b) there is no toy example to work on: you may use - tsset T - and then the - tsline - command to achieve the desired graph.

    Hopefully that helps.
    Best regards,

    Marcos



    • #3
      Hi Marcos,

      Thank you for your response. I think you misunderstood my query, and that is my mistake. Please see the example dataset below. It is a fake dataset that looks like the one I am working with.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(T pct_all pct_all_hi pct_all_lo pct_T1 pct_T1_hi pct_T1_lo pct_T2 pct_T2_hi pct_T2_lo pct_T3 pct_T3_hi pct_T3_lo pct_T4 pct_T4_hi pct_T4_lo)
      1 34.88717 39.88717 29.88717 2.855687 7.855687 -2.1443133  7.110509  12.11051 2.1105094  87.5991  92.5991   82.5991  58.44658 63.44658  53.44658
      2 26.68857 31.68857 21.68857 86.89333 91.89333   81.89333 32.336796 37.336796 27.336796 20.47095 25.47095 15.470947 36.977913 41.97791 31.977913
      3 13.66463 18.66463  8.66463 35.08549 40.08549   30.08549  55.51032  60.51032  50.51032 89.27586 94.27586  84.27586  85.06309 90.06309  80.06309
      end
      Each line has the mean percentage of attended trainings and the variation (upward and downward; "hi" and "lo"). The variable T, with values 1, 2, and 3, identifies three different groups in my data.

      My goal is to generate a graph that has:
      - On the x-axis: "All" "T1" "T2" "T3" "T4",
      - On the y-axis: the attendance for the various trainings

      In this graph, I want to combine the mean values for each training with the hi/lo variables showing the variation (e.g. first, pct_T1 as a point in the graph, and second, pct_T1_hi and pct_T1_lo as a range plot with capped spikes using rcap). An additional complication is that, for each "level" of the x-axis, I want the mean and range plot separately for each of the 3 groups identified by the variable T (as shown below).
      [Attached image: graph format.PNG]


      Please let me know if anything is not clear! I am happy to provide more information.



      • #4
        Thank you for clarifying your query.

        I’m not with ‘my’ Stata at this very moment, hence I cannot provide an example.

        That said, I recommend you type - help serrbar - as I believe this command will cope with the matter.

        Hopefully that helps.
        Best regards,

        Marcos
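
        For readers following along, here is a minimal sketch of one way the graph described in #3 might be drawn with twoway rather than serrbar. It reuses the example data from #3 verbatim. The short names mean, hi, lo, training, and x, the 0.2 horizontal offset between groups, the colours, and the legend labels are all assumptions made for illustration, not part of the original posts. The idea is to reshape so that each row is one group at one training, then shift each group slightly along the x-axis so the three capped spikes sit side by side.

        ```stata
        * Example data copied from #3 (T identifies the three groups)
        clear
        input float(T pct_all pct_all_hi pct_all_lo pct_T1 pct_T1_hi pct_T1_lo pct_T2 pct_T2_hi pct_T2_lo pct_T3 pct_T3_hi pct_T3_lo pct_T4 pct_T4_hi pct_T4_lo)
        1 34.88717 39.88717 29.88717 2.855687 7.855687 -2.1443133  7.110509  12.11051 2.1105094  87.5991  92.5991   82.5991  58.44658 63.44658  53.44658
        2 26.68857 31.68857 21.68857 86.89333 91.89333   81.89333 32.336796 37.336796 27.336796 20.47095 25.47095 15.470947 36.977913 41.97791 31.977913
        3 13.66463 18.66463  8.66463 35.08549 40.08549   30.08549  55.51032  60.51032  50.51032 89.27586 94.27586  84.27586  85.06309 90.06309  80.06309
        end

        * Treat "all trainings" as training 0, then give every variable a stub + number name
        rename (pct_all pct_all_hi pct_all_lo) (pct_T0 pct_T0_hi pct_T0_lo)
        rename pct_T#_hi hi#
        rename pct_T#_lo lo#
        rename pct_T# mean#

        * One row per group (T) and training (0 = All, 1-4 = T1-T4)
        reshape long mean hi lo, i(T) j(training)

        * Offset the groups horizontally so they do not overlap (0.2 is an arbitrary choice)
        gen x = training + (T - 2) * 0.2

        twoway (rcap hi lo x if T == 1, lcolor(navy))                    ///
               (scatter mean x if T == 1, mcolor(navy))                  ///
               (rcap hi lo x if T == 2, lcolor(maroon))                  ///
               (scatter mean x if T == 2, mcolor(maroon))                ///
               (rcap hi lo x if T == 3, lcolor(forest_green))            ///
               (scatter mean x if T == 3, mcolor(forest_green))          ///
               , xlabel(0 "All" 1 "T1" 2 "T2" 3 "T3" 4 "T4")             ///
                 legend(order(2 "Group 1" 4 "Group 2" 6 "Group 3") pos(6) rows(1)) ///
                 ytitle("Attendance (%)") xtitle("Training")
        ```

        Splitting the plots with if T == rather than an over() option is what lets each group carry its own colour and legend entry.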
