
  • reordering bars in hbar graph in descending order by values of a variable

    Hi Everyone,

    I have been in touch with the Stata technical support team and I'm a little worried I'm bumping up against the limits of Stata's graphing capabilities, but I thought before giving up I should see if I could get any guidance on here. I'll explain a little bit about what they said where it is relevant.

    I have a dataset of 1000 students across 15 classrooms that I created to roughly match my real data (the data I'm working with is confidential). The data is not cross-classified.

    This is how I created it:

    clear
    set obs 1000
    gen item1 = int(uniform() * 4) + 1
    replace item1 = . if item1 == 4
    gen item2 = int(uniform() * 5) + 1
    replace item2 = . if item2 == 4 | item2 == 5
    gen item3 = int(uniform() * 6) + 1
    replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
    gen item4 = int(uniform() * 7) + 1
    replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
    gen classid = int(uniform() * 15) + 1

    -- I know it's a cumbersome way to do it, but I wanted to make sure I had a dataset where the amount of missing data varied across items and respondents (like my real dataset).

    So, I have a series of survey items pertaining to students' experience, and for each item I would like to create a stacked horizontal bar graph showing the percentage of students within each classroom that responded to each of the three options of "No," "Maybe," and "Yes."

    It is important that I am able to arrange the bars by the percentage of students who responded a certain way in each classroom. For example, let's say the item is "Do you think you'll go to a ski resort this weekend?" I would want to sort each bar (representing one classroom) by the percentage of students who responded "Yes" to this item.

    Essentially, I want to replicate the Excel-generated graph below in Stata:


    [Image: Screen Shot 2017-01-10 at 12.08.53 PM.png]



    Here is the code I used to create the graph below:

    label define anslbl 1 "Yes" 2 "No" 3 "Maybe"

    label values item1 anslbl

    graph hbar, over(item1) over(classid) asyvars stack percentages ///
    blabel(bar, position(inside) format(%4.2f)) bar(1, color(blue)) ///
    bar(2, color(red)) bar(3, color(green))

    Here is the graph it generates:

    [Image: Screen Shot 2017-01-10 at 6.33.50 PM.png]


    I want to have the classrooms descend based on the percentage who responded "Yes" to the item.


    The Stata support team said the following when I asked about producing a graph that matches the Excel graph above:
    "There isn't an option that will allow you to order the bars based on one value of the stack. However, you may be able to sort the values first in your dataset and then create a new sort variable."

    I asked a follow-up question, but have yet to hear back. I also find the support to be hit-or-miss.

    In the meantime, I decided to try and create a variable that is equivalent to the percentage of students within each class who responded a certain way to an item.

    This is the code I tried:

    bysort classid: gen class_count = _N

    foreach x of varlist item1 item2 item3 item4 {
    tab `x', missing gen(`x'resp)
    }

    foreach i of numlist 4 1 2 3 {
    foreach x of varlist item1 item2 item3 item4 {
    bysort classid: gen `x'count_resp`i' = sum(`x'resp`i')
    bysort classid: replace `x'count_resp`i' = `x'count_resp`i'[_N]
    bysort classid: gen `x'count_resp`i'pct = ((`x'count_resp`i')/(class_count - `x'count_resp4))
    }
    }

    I ran tab `x', missing gen(`x'resp) so that each item gets an indicator for missing responses (`x'resp4). Subtracting that count from the class size gives a denominator of non-missing responses, which matches the percentages that the Stata graph computes. I checked the above code and it seems to give me what I want, but I am now having trouble getting my graph to sort by one of these variables.

    When I try this:

    graph hbar, over(item1) over(classid, sort(item1count_resp3pct)) asyvars stack percentages ///
    blabel(bar, position(inside) format(%4.2f)) bar(1, color(blue)) ///
    bar(2, color(red)) bar(3, color(green))

    I get the following error:

    [Image: screenshot of the error message, Screen Shot 2017-01-10 at 7.27.18 PM.png]


    I was trying to adapt the syntax from the section of "bar graph" help labeled "Putting the bars in a prespecified order."

    Any guidance would be much appreciated. Apologies if I'm missing something obvious.

    Thanks,
    Jake





  • #2
    You can install the egenmore package from SSC (in Stata, type ssc install egenmore). It includes the egen function axis(), which is designed for exactly this purpose. Here is an adaptation of your example (thanks for the example, that makes it much easier to answer):

    Code:
    clear
    set obs 1000
    gen item1 = int(uniform() * 4) + 1
    replace item1 = . if item1 == 4
    gen item2 = int(uniform() * 5) + 1
    replace item2 = . if item2 == 4 | item2 == 5
    gen item3 = int(uniform() * 6) + 1
    replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
    gen item4 = int(uniform() * 7) + 1
    replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
    gen classid = int(uniform() * 15) + 1
    
    gen item1yes = (item1 == 1) if item1 < .
    bys classid : egen item1propyes = mean(item1yes)
    egen Classid = axis(item1propyes classid), label(classid)
    
    graph hbar, over(item1) over(Classid) asyvars stack percentages ///
    blabel(bar, position(inside) format(%4.2f)) bar(1, color(blue)) ///
    bar(2, color(red)) bar(3, color(green))
    [Image: Graph.png]
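
    The original question asked for classrooms in descending order of the "Yes" share, whereas axis() numbers its groups in ascending order of the first variable. One way to flip that, sketched here under the assumption that egenmore is installed, is to sort on the negated proportion. The variable negpropyes is a hypothetical helper name, the seed is arbitrary, and 1 is taken to mean "Yes" as in the original post.

    ```stata
    * Sketch only; assumes egenmore is installed (ssc install egenmore)
    clear
    * arbitrary seed, used only so that the example is reproducible
    set seed 2803
    set obs 1000
    gen item1 = int(runiform() * 4) + 1
    replace item1 = . if item1 == 4
    gen classid = int(runiform() * 15) + 1

    * within-class proportion of non-missing responses equal to 1
    gen item1yes = (item1 == 1) if item1 < .
    bysort classid : egen item1propyes = mean(item1yes)

    * negate, so the class with the largest proportion gets axis position 1
    * (negpropyes is a hypothetical helper variable)
    gen negpropyes = -item1propyes
    egen Classid = axis(negpropyes classid), label(classid)

    graph hbar, over(item1) over(Classid) asyvars stack percentages
    ```

    Because graph hbar draws the first category at the top, the class with the highest proportion should appear first. The over() option of graph bar also documents a descending suboption, which may be worth checking in help graph bar as an alternative.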
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Maarten kindly drew attention to the axis() function in egenmore (SSC). I tend to emphasise an equivalent solution, creating a variable containing desired category order and then labelling it using labmask (SJ). That's written up within http://www.stata-journal.com/sjpdf.h...iclenum=gr0034

      I have some extra comments beyond the question.

      1. Mixing red and green is a standard graphical no-no, as so many people have difficulty distinguishing those colours. There is much good advice on colour out there, e.g. http://colorbrewer2.org/#type=qualit...heme=Dark2&n=3 yields suggestions of a 3-colour scheme for qualitatively different categories.

      2. Putting the numerical values on the bars when the bars have strong colours makes them hard to read. This is one of several weaknesses of a stacked design. Putting the bars side-by-side is one alternative. See http://www.statalist.org/forums/foru...updated-on-ssc and http://www.stata-journal.com/article...article=gr0066 for the write-up.

      Here's a combined example. My random numbers will differ from those in the original, but note the technique of setting the seed.

      Code:
      clear
      set seed 2803 
      set obs 1000
      gen item1 = int(uniform() * 4) + 1
      replace item1 = . if item1 == 4
      gen item2 = int(uniform() * 5) + 1
      replace item2 = . if item2 == 4 | item2 == 5
      gen item3 = int(uniform() * 6) + 1
      replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
      gen item4 = int(uniform() * 7) + 1
      replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
      gen classid = int(uniform() * 15) + 1
      
      gen item1yes = (item1 == 1) if item1 < .
      bys classid : egen item1propyes = mean(item1yes)
      
      egen Classid = group(item1propyes classid) 
      labmask Classid, values(classid) 
      
      tabplot Classid item1, percent(Classid) showval(format(%2.0f) offset(0.5)) ///
      separate(item1)  bcolor("27 151 119" "217 95 2" "117 112 179") horiz ///
      ytitle(group) subtitle(percent)



      [Image: anothertabplot2.png]
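
      For anyone who wants to keep the stacked design, point 1 can be applied to the original graph hbar call directly. This is a minimal sketch that reuses the RGB triples from the tabplot call above; the seed is arbitrary and the labels are the ones defined in the original post.

      ```stata
      clear
      * arbitrary seed, used only so that the example is reproducible
      set seed 2803
      set obs 1000
      gen item1 = int(runiform() * 4) + 1
      replace item1 = . if item1 == 4
      gen classid = int(runiform() * 15) + 1
      label define anslbl 1 "Yes" 2 "No" 3 "Maybe"
      label values item1 anslbl

      * colours given as quoted "R G B" triples instead of named colours
      graph hbar, over(item1) over(classid) asyvars stack percentages ///
          bar(1, color("27 151 119")) bar(2, color("217 95 2"))       ///
          bar(3, color("117 112 179"))
      ```

      The /// line continuations mean this needs to be run from a do-file or the do-file editor rather than typed interactively.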





      • #4
        I just want to flag the most general point: getting any order you like for bars at worst means that you have to create your own variable defining that order.
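
        As a rough illustration of that general point, here is a sketch that builds the order variable with official Stata commands only, so neither egenmore nor labmask is needed. The names order and orderlbl are hypothetical, the seed is arbitrary, and 1 is assumed to mean "Yes" as in the original post.

        ```stata
        clear
        * arbitrary seed, used only so that the example is reproducible
        set seed 2803
        set obs 1000
        gen item1 = int(runiform() * 4) + 1
        replace item1 = . if item1 == 4
        gen classid = int(runiform() * 15) + 1

        gen item1yes = (item1 == 1) if item1 < .
        bysort classid : egen item1propyes = mean(item1yes)

        * order = 1 for the class with the lowest proportion; ties broken by classid
        egen order = group(item1propyes classid)

        * label each position with the classid it stands for
        levelsof order, local(levels)
        foreach l of local levels {
            summarize classid if order == `l', meanonly
            label define orderlbl `l' "`r(min)'", add
        }
        label values order orderlbl

        graph hbar, over(item1) over(order) asyvars stack percentages
        ```

        The loop does by hand what label() in axis() or labmask does in one line, which is why those commands are convenient.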



        • #5
          Thank you, Maarten & Nick for your helpful input!

          Nick, would it be correct that in my initial run I had created a variable from which I could create the bar order I wanted (`x'count_resp`i'pct), but I fell one step short by trying to sort by this variable instead of creating a sort variable?

          Thanks for the color tip. I was essentially just trying to mimic the example graph, but that is a really good point. I've changed the colors and have gotten rid of the bar labels. I agree that they are a bit tricky to read and it looks better without them.

          Is there a way to have each graph titled with a variable label? I have been working from the title portion of the Stata help graph page, but have yet to figure it out. You'll see what I'm trying to do below.

          Also, is there a way to stack the bars in a different order such that "Yes" appears farthest to the left without changing the values of the variables themselves (I was thinking that maybe I could reverse code things so that 1 is "Yes" and so on, but I would prefer not to)? I think in order to be comfortable with no bar labels it would be helpful to put the response category ("Yes") sorted by on the left so that the percentages on the axis match the bars.

          Also, I adjusted the numeric axis label to go by 10 percentage points. Is there a way in that statement, currently (0(10)100), to have the numbers display as percentages? If not I guess the axis label will have to suffice.

          This is where I am now. Maarten, I've adapted your very helpful code:

          Code:
          clear
          set obs 1000
          gen item1 = int(uniform() * 4) + 1
          replace item1 = . if item1 == 4
          gen item2 = int(uniform() * 5) + 1
          replace item2 = . if item2 == 4 | item2 == 5
          gen item3 = int(uniform() * 6) + 1
          replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
          gen item4 = int(uniform() * 7) + 1
          replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
          gen classid = int(uniform() * 15) + 1
          
          label var item1 "Did you plan on going to a ski-resort this weekend?"
          label var item2 "Do you want to be an astronaut when you grow up?"
          
          foreach x of varlist item1 item2 {
              label define anslbl`x' 3 "Yes" 2 "Maybe" 1 "No"
              label values `x' anslbl`x'
              gen `x'yes = (`x' == 3) if `x' < .
              bys classid: egen `x'propyes = mean(`x'yes)
              egen Classid`x' = axis(`x'propyes classid), label(classid)

              graph hbar, over(`x') over(Classid`x', label(nolabels)) asyvars stack percentages ///
                  bar(1, color(green)) bar(2, color(blue)) bar(3, color(yellow)) ylabel(0(10)100) ///
                  ytitle("Student Response Percentage") title(`x') saving(`x'graph, replace)
          }

          These are the graphs I get:

          [Image: Screen Shot 2017-01-11 at 1.24.18 PM.png]
          [Image: Screen Shot 2017-01-11 at 1.24.45 PM.png]


          I feel that I'm super close to getting what I want thanks to your help, Maarten and Nick!

          Best,
          Jake




          • #6
            You can access the variable labels with extended macro functions, see help extended_fcn. I never remember the name of that help-file, so I always type help macro and click on the link for extended macro functions.
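
            In isolation, the mechanism looks roughly like this. The sketch uses the auto dataset shipped with Stata rather than the survey data, and the fallback to the variable name for unlabelled variables is an added assumption about what would be wanted in a title.

            ```stata
            sysuse auto, clear

            foreach v of varlist mpg weight {
                * copy the variable label of `v' into the local macro lbl
                local lbl : variable label `v'
                * fall back to the variable name if no label is defined
                if `"`lbl'"' == "" local lbl `v'
                display `"`v': `lbl'"'
            }
            ```

            The compound double quotes protect against labels that themselves contain double quotes.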

            Code:
            clear
            set obs 1000
            set seed 123456
            gen item1 = int(uniform() * 4) + 1
            replace item1 = . if item1 == 4
            gen item2 = int(uniform() * 5) + 1
            replace item2 = . if item2 == 4 | item2 == 5
            gen item3 = int(uniform() * 6) + 1
            replace item3 = . if item3 == 4 | item3 == 5 | item3 == 6
            gen item4 = int(uniform() * 7) + 1
            replace item4 = . if item4 == 4 | item4 == 5 | item4 == 6 | item4 == 7
            gen classid = int(uniform() * 15) + 1
            
            label var item1 "Did you plan on going to a ski-resort this weekend?"
            label var item2 "Do you want to be an astronaut when you grow up?"
            
            foreach x of varlist item1 item2 {
                label define anslbl`x'  3 "Yes" 2 "Maybe" 1 "No"
                label values `x' anslbl`x'
                gen `x'yes = (`x' == 3) if `x' < .
                bys classid: egen `x'propyes = mean(`x'yes)
                egen Classid`x' = axis(`x'propyes classid), label(classid)
            
                graph hbar, over(`x') over(Classid`x')             ///
                            asyvars stack percentages              ///
                            bar(1, color(green ))                  ///
                            bar(2, color(blue  ))                  ///
                            bar(3, color(yellow))                  ///
                            ylabel(0(10)100)                       ///
                            ytitle("Student Response Percentage")  ///
                            l1title("Class")                        ///
                            title(`"`: variable label `x''"')      ///
                            name(`x'graph, replace)
            }
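
            Two of the other questions in #5 can be approached along the same lines. This is a sketch only, assuming egenmore is installed: a reversed copy of the item (item1rev and revlbl are hypothetical names) puts "Yes" first in the stack without touching item1, and explicit tick labels add a percent sign to the axis.

            ```stata
            * Sketch only; assumes egenmore is installed (ssc install egenmore)
            clear
            set seed 123456
            set obs 1000
            gen item1 = int(runiform() * 4) + 1
            replace item1 = . if item1 == 4
            gen classid = int(runiform() * 15) + 1
            label var item1 "Did you plan on going to a ski-resort this weekend?"

            * reversed copy for plotting only; item1 keeps 3 = Yes, 2 = Maybe, 1 = No
            gen item1rev = 4 - item1
            label define revlbl 1 "Yes" 2 "Maybe" 3 "No"
            label values item1rev revlbl

            gen item1yes = (item1 == 3) if item1 < .
            bysort classid : egen item1propyes = mean(item1yes)
            egen Classiditem1 = axis(item1propyes classid), label(classid)

            * build the tick labels 0 "0%" 10 "10%" ... 100 "100%"
            local ylabs
            forvalues p = 0(10)100 {
                local ylabs `ylabs' `p' "`p'%"
            }

            graph hbar, over(item1rev) over(Classiditem1) asyvars stack percentages ///
                ylabel(`ylabs') ytitle("Student Response Percentage")               ///
                title(`"`: variable label item1'"')
            ```

            Since the bars are stacked in the order of the values of the first over() variable, recoding a copy is a simple way to control the stack order while leaving the original coding alone.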



            • #7
              I'd add that variable labels tend to get used by default as axis labels in tabplot.



              • #8
                Thanks, Maarten!

                Nick, I didn't previously have tabplot so I installed it and will play with it. Thanks!

                Best,
                Jake
