  • Summing variables for a bar graph

    Question: How do I obtain four- and six-year graduation rates by "start year" (that is, the year the students first entered university)?

    Background: I have a discrete dataset with time period variables (t=1, t=2, t=3) and a variable EventGrad (1=Graduated, 0 otherwise). What I'd like to do is create a bar graph by year of first registration. To check whether I was on the right track, I first ran the following two commands, using the year 2007 as an example:

    tab EventGrad if _period==3 & Year==2007
    tab EventGrad if _period==4 & Year==2007

    I then tried the following graph:

    graph bar EventGrad if _period==3|_period==4, over(Year) asyvars blabel(bar, format(%9.2f)) percent

    However, that didn't seem to work. Basically, I want to add up the "1s" in periods 3 and 4, and express that sum as a percentage of the total number of students who started in 2007.




  • #2
    No-one?



    • #3
      It's essentially impossible to imagine what your dataset looks like from your description. Take a look at the following and see whether you can wrangle your dataset into that structure.
      Code:
      version 13.1
      
      clear *
      set more off
      set seed `=date("2014-05-05", "YMD")'
      
      tempfile tmpfil0
      quietly save `tmpfil0', emptyok
      forvalues i = 2000/2010 {
          drop _all
          quietly set obs 1
          generate int matriculation_yr = `i'
          generate int matriculation_total = 4000 + floor(1000 * runiform())
          quietly expand = 4
          generate byte attendance_yr = 2 + _n
          generate int graduated = ///
              floor(matriculation_total * (0.025 + 0.025 * runiform()))
          quietly replace graduated = ///
              floor((matriculation_total - graduated[1]) * ///
                  (0.4 + 0.4 * runiform())) in 2
          quietly replace graduated = max(0, ///
              floor((matriculation_total - graduated[1] - graduated[2]) * ///
                  (0.2 + 0.2 * runiform()))) in 3
          quietly replace graduated = max(0, ///
              floor((matriculation_total - graduated[1] - graduated[2] - ///
                  graduated[3]) * (0.1 + 0.1 * runiform()))) in 4
          append using `tmpfil0'
          quietly save `tmpfil0', replace
      }
      
      generate double graduation_rate = graduated
      quietly bysort matriculation_yr (attendance_yr): replace graduation_rate = ///
          sum(graduation_rate)
      quietly replace graduation_rate = graduation_rate / matriculation_total
      
      *
      * Dataset should look like this
      *
      list in 1/20, noobs abbreviate(20) sepby(matriculation_yr)
      
      graph bar (asis) graduation_rate, over(attendance_yr) over(matriculation_yr) ///
          bar(1, fcolor(white) lcolor(black)) ///
          ylabel( , angle(horizontal) nogrid) ytitle(Cumulative Graduation Rate)
      
      exit



      • #4
        For reasons why a question is unanswered, see the Advice in the FAQ, section 17. Not being very clear is, in my guess, a major reason in this case. Wording such as "didn't seem to work" never helps. What graph exactly did you get? Why isn't it what you want? If readers can't understand, most will shrug their shoulders and move on.

        See also the advice: If you can, reproduce the error with one of Stata's provided datasets, a small fragment of your dataset, or a simple concocted dataset that you include in your posting.

        graph bar by default shows means. You are averaging EventGrad. But the effect of including more observations is just to take means over more observations, not to convert it into a command that produces sums.
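A minimal sketch of that point, using Stata's shipped auto data as the analogue. The 0/1 variable foreign stands in for EventGrad and rep78 stands in for the start year; both stand-ins are assumptions made for illustration and are not the poster's data.

```stata
* Analogue on Stata's auto data: foreign (0/1) plays the role of EventGrad
sysuse auto, clear

* Numeric check: per-category means, sums and counts of the 0/1 variable
tabstat foreign, by(rep78) statistics(mean sum n)

* Default statistic is the mean, so bar heights are the share of foreign cars
graph bar foreign, over(rep78)

* An -if- restriction still gives means, just over fewer observations
graph bar foreign if rep78 >= 3, over(rep78)

* Sums have to be requested explicitly
graph bar (sum) foreign, over(rep78)
```

The first and last graphs differ only in the requested statistic, which is why widening the if condition never turns the bars into sums.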

        catplot (SSC) may be closer to what you want.

        The best thing you can do is give a worked example with a fragment of your data, or explain an analogue of your problem using e.g. Stata's auto data.



        • #5
          OK, here is more detail. The first part has nothing to do with the graph; it is just a check. (Note: I renamed the variable from 'EventGrad' to 'Graduation', and I haven't expanded the dataset as in my first example, because I realise that I do not need to for what I want to do. Hence the key variable here is 'Duration' and not '_period'.)

          First, I look at the number of people who graduated from the 2007 cohort:


          Code:
          tab Graduation if Year==2007
          
           Graduation |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |        428       25.45       25.45
                    1 |      1,254       74.55      100.00
          ------------+-----------------------------------
                Total |      1,682      100.00
          I then look at the students who started in 2007 and graduated in 3 years or in 4 years. (Remember, I'm looking at a four-year graduation rate, so I include those who graduated within 3 years.)

          Code:
          tab Graduation if Duration==3 & Year==2007
          
           Graduation |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |         72       25.17       25.17
                    1 |        214       74.83      100.00
          ------------+-----------------------------------
                Total |        286      100.00
          
          tab Graduation if Duration==4 & Year==2007
          
           Graduation |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |         43        6.65        6.65
                    1 |        604       93.35      100.00
          ------------+-----------------------------------
                Total |        647      100.00
          So basically, I'm looking at a graduation rate of (214 + 604)/1682 = 818/1682 = 48.63%.
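The same check can be sketched as a count-based calculation. The block below rebuilds the 2007 frequencies from the tabulations above so that it runs on its own; Duration 5 is a hypothetical stand-in for "any duration other than 3 or 4", and the variable freq is introduced only for this illustration.

```stata
clear
* Frequencies taken from the tabulations above.
* Duration 5 is a hypothetical placeholder for all other durations.
input int Year byte Duration byte Graduation int freq
2007 3 0  72
2007 3 1 214
2007 4 0  43
2007 4 1 604
2007 5 0 313
2007 5 1 436
end
expand freq
drop freq

* Denominator: everyone who started in 2007
quietly count if Year == 2007
local total = r(N)

* Numerator: graduates with Duration 3 or 4
quietly count if Graduation == 1 & inlist(Duration, 3, 4) & Year == 2007
local grads = r(N)

display "Four-year rate: " %5.2f 100 * `grads' / `total' "%"
```

On the real dataset only the two count commands and the display line would be needed.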

          I then run the following command. I think Nick has identified my error: it gives a mean rather than a sum. (Note that this includes all years, not just 2007; I was only using 2007 as an example.)

          Code:
          graph bar Graduation if Duration==3|Duration==4, over(Year) asyvars blabel(bar, format(%9.2f)) percent
          which produces the following graph:



          My dataset looks like the following (I haven't expanded it yet - will do that later).

          Attached Files



          • #6
            I'm not quite sure if I understood you correctly, but perhaps you could try something like:
            Code:
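            * Cohort size: number of students per start year
            bys Year: gen total = _N
            * Graduates with Duration 3 or 4 (Stata is case-sensitive: Duration, not duration)
            egen graduated = total(Graduation) if Duration==3 | Duration==4, by(Year)
            * collapse defaults to means, which skip the missing values of graduated
            collapse total graduated, by(Year)
            gen grad_rate = graduated*100/total
            graph bar (asis) grad_rate, over(Year) asyvars blabel(bar, format(%9.2f))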
            Note that collapse replaces the dataset in memory with the collapsed one, so you may want to save your data before using it.
            Hope this helps.
            Carlos
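One way to keep the student-level data intact is to wrap collapse in preserve and restore. The sketch below is an illustration only: the toy data and the names grad4 and grad_rate are hypothetical, and only Year, Duration and Graduation come from the thread. It relies on the fact that the mean of a 0/1 indicator over a whole cohort is the cohort's rate.

```stata
clear
* Hypothetical toy data: two cohorts with four students each
input int Year byte Duration byte Graduation
2007 3 1
2007 4 1
2007 4 0
2007 5 1
2008 3 0
2008 4 1
2008 6 1
2008 6 0
end

* 1 if the student graduated with Duration 3 or 4, 0 otherwise
generate byte grad4 = Graduation == 1 & inlist(Duration, 3, 4)

* Keep a copy of the student-level data before collapsing
preserve
collapse (mean) grad_rate = grad4, by(Year)
replace grad_rate = 100 * grad_rate
list, noobs
graph bar (asis) grad_rate, over(Year) blabel(bar, format(%9.2f))
* Bring the student-level data back
restore
```

Because grad4 is defined for every student, the denominator is the full cohort and no separate total variable is needed.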

