  • Report mean and N inside bars with - hbar - using the - stack - command?

    Dear Stata-listers
    I am wondering if there is a way to report both N and the mean inside the bars when using graph hbar with the stack option, so that, for example, the text shown inside each stacked segment would be "(n=51/m=3.5)" or something similar. Below I have added some code that takes me part of the way, but the bars are not stacked and N appears to the left of the bar, not inside it. I would like the bars to be stacked within the groups Domestic, Foreign and Total. I have added the "Total" category in the data below.

    In advance, thank you for any help and suggestions.

    Best wishes,
    Johanne

    Code:
    sysuse auto, clear
    keep  foreign mpg trunk turn
    
    replace foreign=2 if foreign !=.
                
    label define origin 2 "Total", add
    
    save auto2, replace
                
    * locate the shipped auto.dta on the ado-path instead of hard-coding an install directory
    findfile auto.dta
    append using "`r(fn)'"
                
    keep  foreign mpg trunk turn
    
    collapse  ///
    (count) countmpg=mpg ///
    (count) counttrunk=trunk  ///
    (count) countturn=turn  ///
    (mean) meanmpg=mpg    ///
    (mean) meantrunk=trunk   ///
    (mean) meanturn=turn   ///
    , by(foreign)
    
    browse
    
    reshape long count mean, i(foreign) j(which) string
    
    capture drop order2
    capture drop group2
    capture drop detail2
    
    label define order2 1 mpg  2 trunk 3 turn  
    
    encode which, gen(order2) label(order2)
    
    label def order2 1 " ", modify
    label def order2 2 " ", modify
    label def order2 3 " ", modify
    
    gen detail2 = " ({it:n} = " + string(count) + ")"  
    
    egen group2 = group(order2 detail2), label
    
    separate mean, by(order)
    
    #delimit ;
    graph hbar (asis) mean?, over(group2, label(labsize(small))) over(foreign, label(labsize(small))) nofill
        bar(1, fcolor(eggshell) lcolor(black)) bar(2, fcolor(khaki) lcolor(black)) bar(3, fcolor(olive_teal) lcolor(black))
        blabel(bar, pos(inside) format(%12.1f) size(vsmall))
        legend(lab(1 "MPG") lab(2 "Trunk") lab(3 "Turn") size(vsmall)
        keygap(0.5) symxsize(5) col(6) position(-2) ring(-1))
        plotregion(lcolor(none))
        scheme(s1mono)
        title("Some variables" " ", size(medlarge))
        ytitle(" ")
        note("Note: Some note", size(small) span)
        ylab(none)
        name(fig15statalist2, replace);
    graph save fig15statalist2, replace;
    #delimit cr


    This produces the graph below, which is not stacked and which reports N to the left of the bars rather than inside them. Never mind that in this case N is the same for each variable within each group of respondents; it is the general idea that is important.

    My problem is how can I stack the bars over the groups and have N reported inside the bar, alongside the mean value?

    [Image: fig15statalist2.png]

  • #2
    Why would you want to stack mpg, trunk and turn? mpg is in units of miles per gallon, trunk in cubic feet, etc. These are 3 distinct variables and not categories of the same variable. A better illustration would have been to provide example data based on the desired graphs in your previous post:


    http://www.statalist.org/forums/foru...alues-reported

    In the case of the distribution of weekly work tasks, activity "A" will take x hours, "B" y hours, etc. (same unit). Stacking may thus make sense. Assume the following data:


    Code:
    clear
    input float(hours level activity)
    100 1 1 
     50 1 1 
    200 1 1 
     75 1 2 
    125 1 2 
     50 1 2 
     60 1 2 
    140 1 3 
     40 1 3 
    100 1 3 
     60 1 3 
    200 1 3 
     50 2 1 
    100 2 1 
    100 2 1 
    150 2 1 
     75 2 2 
    125 2 2 
    120 2 2 
     70 2 2 
     80 2 2 
    200 2 3 
     30 2 3 
    100 2 3 
    end
    label def lev 1 "Bachelor" 2 "Master"
    label def actv 1 "Teaching" 2 "Research" 3 "Other"
    label values level lev
    label values activity actv
    With this structure:


    Code:
    bys level: egen count= count(1)
    gen detail = " ({it:n} = " + string(count) + ")"
    egen group = group (level detail), label
    
    graph hbar hours, over(activity) over(group) asyvars stack percent ///
        bar(1, fcolor(eggshell) lcolor(black)) bar(2, fcolor(khaki) lcolor(black)) ///
        bar(3, fcolor(olive_teal) lcolor(black)) ///
        blabel(bar, pos(base)) ///
        legend(lab(1 "Teaching") lab(2 "Research") lab(3 "Other") size(vsmall)) ///
        title("Distribution of weekly work tasks (%)" " ", size(medlarge)) ///
        plotregion(lcolor(none)) ytitle(" ") ///
        note("Note: Some note", size(small) span)
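
As a cross-check on what asyvars stack percent plots: according to the graph bar documentation, with a single yvar and asyvars each segment is 100 times the cell mean divided by the sum of the cell means within the outer over() group. The following is a minimal sketch, not part of the original reply, that reproduces those numbers by hand from the example data above. The variable names total and pct are illustrative only.

```stata
clear
input float(hours level activity)
100 1 1
 50 1 1
200 1 1
 75 1 2
125 1 2
 50 1 2
 60 1 2
140 1 3
 40 1 3
100 1 3
 60 1 3
200 1 3
 50 2 1
100 2 1
100 2 1
150 2 1
 75 2 2
125 2 2
120 2 2
 70 2 2
 80 2 2
200 2 3
 30 2 3
100 2 3
end
label def lev 1 "Bachelor" 2 "Master"
label def actv 1 "Teaching" 2 "Research" 3 "Other"
label values level lev
label values activity actv

* mean hours per level-by-activity cell (the bar heights before "percent")
collapse (mean) hours, by(level activity)

* sum of the cell means within each level: the denominator behind "percent"
bysort level: egen total = total(hours)
gen pct = 100 * hours / total

format hours pct %6.1f
list level activity hours pct, sepby(level) noobs
```

For the Bachelor group the cell means are about 116.7, 77.5 and 108.0 hours, so the Teaching segment covers roughly 38.6% of the bar. Note also that the n in the group label counts all observations at that level (12 in each group here), not the observations behind each segment.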


    [Image: stack.png]



    • #3
      "Why would you want to stack mpg, trunk and turn? mpg is in units of miles per gallon, trunk in cubic feet, etc. These are 3 distinct variables and not categories of the same variable. A better illustration would have been to provide example data based on your desired graphs in your previous post"
      Thank you for commenting and for helping me, Andrew Musau. Well, I was a bit brief in this post and I can see why you ask. The thing is, in my previous post (which received no replies and which I therefore gave up on, as I figured I had perhaps not made myself clear or had violated a rule of the Statalist Forum), I mentioned that my use of auto.dta in this case is just to illustrate. My own data consist of groups of variables belonging to the same subject and with the same scale (a 5-point Likert scale). The groups of related variables are to be analysed both individually (i.e. as single variables, and this is where I need stacking) and as scales/indexes (i.e. as approximations to some latent construct, in which case stacking is not necessary).

      In my original post (which, for new readers, can be found here: http://www.statalist.org/forums/foru...=1495758286147) I asked two questions, and I thought maybe this was too much and that could be why I didn't hear from anyone. I tried to delete the old post, but that was no longer possible, as unfortunately, there is a time limit to how long after posting one may delete the post. I am so sorry for the confusion this has caused.

      Anyway, I can see that you have suggested a solution to my problem and I will run it tomorrow at work. Currently it is 03:00am here in Oslo, Norway, so I should probably go to sleep now.

      Best wishes,
      Johanne



      • #4
        I could not sleep because I was so excited about your code, Andrew Musau. I simply had to get up and test it. :-P

        And it works great, so thank you so much! :-)

        I will try running it on my own data tomorrow, but for now I created the "Total" group and also made some minor visual changes to the graph (one decimal place and a monochrome scheme). For the record, in case it might come in handy for other Statalist users, here is the code after I added these things:

        Code:
        clear
        input float(hours level activity)
        100 1 1
         50 1 1
        200 1 1
         75 1 2
        125 1 2
         50 1 2
         60 1 2
        140 1 3
         40 1 3
        100 1 3
         60 1 3
        200 1 3
         50 2 1
        100 2 1
        100 2 1
        150 2 1
         75 2 2
        125 2 2
        120 2 2
         70 2 2
         80 2 2
        200 2 3
         30 2 3
        100 2 3
        end
        
        label def lev 1 "Bachelor" 2 "Master"
        label def actv 1 "Teaching" 2 "Research" 3 "Other"
        label values level lev
        label values activity actv
        
        save "C:\Users\Hilde\Desktop\fig15_1.dta", replace
        
        replace level=3 if level !=.
                    
        label define lev 3 "Total", add
        
        save fig15_2.dta, replace
                    
        append using "C:\Users\Hilde\Desktop\fig15_1.dta"
        
        browse
        
        bys level: egen count= count(1)
        gen detail = " ({it:n} = " + string(count) + ")"
        egen group = group (level detail), label
        
        graph hbar hours, over(activity) over(group) ///
        bar(1, fcolor(eggshell) lcolor(black)) bar(2, fcolor(khaki) lcolor(black)) bar(3, fcolor(olive_teal) lcolor(black)) ///
        blabel(bar, pos(base) format(%12.1f) size(vsmall)) ///
        legend(lab(1 "Teaching") lab(2 "Research") lab(3 "Other") size(vsmall) ///
        keygap(0.5) symxsize(5) col(3) position(-2) ring(-1)) ///
        title("Distribution of weekly work tasks (%)" " ", size(medlarge)) ///
        plotregion(lcolor(none)) ///
        scheme(s1mono) ///
        ytitle(" ") ///
        note("Note: Some note", size(small) span) ///
        asyvars stack percent

        [Image: graph stacked_N and percent.png]


        Last edited by Johanne Karlsen; 25 May 2017, 20:36.



        • #5
          Hi Johanne: Glad to see that this worked out for you. I usually find it more efficient to work with temporary files when the data are not needed beyond a few tasks. Instead of saving external files (which accumulate over time), you can preserve the original dataset, save a temporary copy to append for the plot, and then restore:


          Code:
          clear
          input float(hours level activity)
          100 1 1
           50 1 1
          200 1 1
           75 1 2
          125 1 2
           50 1 2
           60 1 2
          140 1 3
           40 1 3
          100 1 3
           60 1 3
          200 1 3
           50 2 1
          100 2 1
          100 2 1
          150 2 1
           75 2 2
          125 2 2
          120 2 2
           70 2 2
           80 2 2
          200 2 3
           30 2 3
          100 2 3
          end
          
          label def lev 1 "Bachelor" 2 "Master"
          label def actv 1 "Teaching" 2 "Research" 3 "Other"
          label values level lev
          label values activity actv
          preserve
          tempfile original
          save `original'
          replace level=3 if level !=.
          label define lev 3 "Total", add
          append using `original'
          bys level: egen count= count(1)
          gen detail = " ({it:n} = " + string(count) + ")"
          egen group = group (level detail), label
          
          graph hbar hours, over(activity) over(group) ///
          bar(1, fcolor(eggshell) lcolor(black)) bar(2, fcolor(khaki) lcolor(black)) bar(3, fcolor(olive_teal) lcolor(black)) ///
          blabel(bar, pos(base) format(%12.1f) size(vsmall)) ///
          legend(lab(1 "Teaching") lab(2 "Research") lab(3 "Other") size(vsmall) ///
          keygap(0.5) symxsize(5) col(3) position(-2) ring(-1)) ///
          title("Distribution of weekly work tasks (%)" " ", size(medlarge)) ///
          plotregion(lcolor(none)) ///
          scheme(s1mono) ///
          ytitle(" ") ///
          note("Note: Some note", size(small) span) ///
          asyvars stack percent 
          restore
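
The code above shows n in the group label and the percentage at the base of each segment. It does not put both n and the mean inside the segments, which was the literal request in post #1. As far as I know, blabel() only offers bar, total, name and group, so free text is not available there. One possible workaround, sketched here under assumptions and not proposed by anyone in the thread, is to build the stack by hand with twoway rbar and overlay centred marker labels with scatter. All new variable names (mean, n, total, pct, lower, upper, mid, text) and the label format are illustrative. Here n counts the observations behind each segment rather than per level, and the "Total" rows are left out for brevity (they could be appended first, as above).

```stata
clear
input float(hours level activity)
100 1 1
 50 1 1
200 1 1
 75 1 2
125 1 2
 50 1 2
 60 1 2
140 1 3
 40 1 3
100 1 3
 60 1 3
200 1 3
 50 2 1
100 2 1
100 2 1
150 2 1
 75 2 2
125 2 2
120 2 2
 70 2 2
 80 2 2
200 2 3
 30 2 3
100 2 3
end

* one row per segment: cell mean and cell count
collapse (mean) mean = hours (count) n = hours, by(level activity)

* share of each cell mean within its level, as plotted by "asyvars stack percent"
bysort level (activity): egen total = total(mean)
gen pct = 100 * mean / total

* start, end and midpoint of each stacked segment on the 0-100 axis
bysort level (activity): gen upper = sum(pct)
gen lower = upper - pct
gen mid = (lower + upper) / 2

* free-text label placed at the midpoint of each segment
gen text = "n=" + string(n) + "/m=" + string(mean, "%4.1f")

twoway (rbar lower upper level if activity == 1, horizontal barwidth(0.6)) ///
       (rbar lower upper level if activity == 2, horizontal barwidth(0.6)) ///
       (rbar lower upper level if activity == 3, horizontal barwidth(0.6)) ///
       (scatter level mid, msymbol(none) mlabel(text) mlabposition(0) mlabsize(vsmall)), ///
       ylabel(1 "Bachelor" 2 "Master", angle(0) noticks) yscale(reverse) ///
       legend(order(1 "Teaching" 2 "Research" 3 "Other") rows(1)) ///
       xtitle("Percent") ytitle("")
```

With these data the Teaching segment of the Bachelor bar is labelled "n=3/m=116.7". Labels can overflow narrow segments, so the label size or format may need adjusting for real data.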



          • #6
            "[...] instead of saving external files (which accumulate over time), you can just preserve the original dataset, create a temporary holding file for the plot and then restore"
            Thank you for this suggestion, Andrew Musau! I very much agree with you on the matter of temp files, so your code is very handy and I will definitely add it to my pool of useful commands. And thank you so much for helping me with this graph. I've got so many different types of variables and graphs to make, so I know I'll need to ask other Stata users more questions in the future, but for now I'll keep working on the ones I can make from this code :-)

            Best wishes,
            Johanne
