  • Counting values of a variable as a proportion of total observations on the variable

    Dear colleagues and mentors, I have a STATA file in which each row is a person convicted of an offence, and I'm trying to determine, for each court, what proportion of its workload involves a particular type of case. Let Oij be the number of cases of type i in court j, and let Tj be the total number of cases finalised by court j over the same time period. I want to create a variable that, for each court, takes the value Oij/Tj*100. Is there an easy way to do this?
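Written as a formula, the quantity asked for is below. The identity for Tj assumes that each row is one finalised case with exactly one case type, and the numbers in the second line are purely hypothetical, for illustration only.

```latex
P_{ij} = 100 \times \frac{O_{ij}}{T_j}, \qquad T_j = \sum_{i} O_{ij}
% hypothetical example: court j finalised T_j = 200 cases, O_{ij} = 30 of them of type i
P_{ij} = 100 \times \frac{30}{200} = 15
```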


  • #2
    Code:
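    * check that no observation is missing the court or the case type
    assert !missing(court, type)
    * total number of cases finalised by each court (Tj)
    bysort court: gen percent = _N
    * cases of each type within each court (Oij), as a percentage of Tj
    bysort court type: replace percent = (_N / percent) * 100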
    For your future posts, review FAQ Advice #10 for details on how to use the dataex command to provide data examples. Also see https://www.statalist.org/forums/help#spelling on spelling Stata.
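A toy run may help show what the two bysort steps do. The data below are made up for illustration (court and type as numeric codes, one row per case), and the final check, that one percentage per distinct type adds up to 100 within each court, is an addition rather than part of the answer above.

```stata
* Hypothetical toy data: one row per case; court and type are numeric codes
clear
input court type
1 1
1 1
1 2
2 1
2 2
end

assert !missing(court, type)
* Tj: total number of cases in each court
bysort court: gen percent = _N
* Oij / Tj * 100: size of each court-type group divided by the court total
bysort court type: replace percent = (_N / percent) * 100

* Sanity check: taking one value per distinct type, percentages sum to 100 within court
egen byte first = tag(court type)
egen double check = total(percent * first), by(court)
assert reldif(check, 100) < 1e-6
list court type percent, sepby(court)
```

Because percent is repeated on every row of a court-type group, the check uses tag() so that each type is counted only once per court.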



    • #3
      The percentage of a type is 100 times the mean of a (0, 1) indicator, where 1 indicates the type and 0 indicates anything else. That can be calculated on the fly, even if the indicator does not exist as a variable. In the auto dataset, foreign is such an indicator, and price > 5000 is an expression that is either true (1) or false (0).

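      Just watch out for missing values! Stata treats a missing value as larger than any number, so an expression such as price > 5000 is true (1) when price is missing. Also, by() keeps observations with missing rep78 as a group of their own (the . rows below), whereas tab drops them unless you specify the missing option (hence the total of 69, not 74, in the cross-tabulation).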
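In the notation of the original question (Oij, Tj), with k running over the Tj cases of court j, summing the indicator counts the cases of type i, so 100 times its mean is exactly the requested percentage:

```latex
\frac{100}{T_j} \sum_{k=1}^{T_j} \mathbf{1}[\text{type}_k = i] \;=\; 100 \times \frac{O_{ij}}{T_j}
```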

      Code:
       
      . sysuse auto, clear
      (1978 automobile data)
      
      . egen pcforeign = mean(100 * foreign), by(rep78)
      
      . tabdisp rep78, c(pcforeign)
      
      ----------------------
      Repair    |
      record    |
      1978      |  pcforeign
      ----------+-----------
              1 |          0
              2 |          0
              3 |         10
              4 |         50
              5 |   81.81818
              . |         20
      ----------------------
      
      . tab rep78, missing su(foreign)
      
           Repair |        Summary of Car origin
      record 1978 |        Mean   Std. dev.       Freq.
      ------------+------------------------------------
                1 |           0           0           2
                2 |           0           0           8
                3 |          .1   .30512858          30
                4 |          .5   .51449576          18
                5 |   .81818182   .40451992          11
                . |          .2    .4472136           5
      ------------+------------------------------------
            Total |    .2972973   .46018846          74
      
      . tab rep78 foreign, row
      
      +----------------+
      | Key            |
      |----------------|
      |   frequency    |
      | row percentage |
      +----------------+
      
          Repair |
          record |      Car origin
            1978 |  Domestic    Foreign |     Total
      -----------+----------------------+----------
               1 |         2          0 |         2 
                 |    100.00       0.00 |    100.00 
      -----------+----------------------+----------
               2 |         8          0 |         8 
                 |    100.00       0.00 |    100.00 
      -----------+----------------------+----------
               3 |        27          3 |        30 
                 |     90.00      10.00 |    100.00 
      -----------+----------------------+----------
               4 |         9          9 |        18 
                 |     50.00      50.00 |    100.00 
      -----------+----------------------+----------
               5 |         2          9 |        11 
                 |     18.18      81.82 |    100.00 
      -----------+----------------------+----------
           Total |        48         21 |        69 
                 |     69.57      30.43 |    100.00 
      
      . egen hi_price = mean(100 * (price > 5000)), by(rep78)
      
      . tabdisp rep78, c(hi_price)
      
      ----------------------
      Repair    |
      record    |
      1978      |   hi_price
      ----------+-----------
              1 |          0
              2 |         50
              3 |   43.33333
              4 |   66.66666
              5 |   54.54546
              . |         40
      ----------------------
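Applied to the question, the same idea gives each court's percentage without any explicit counting. This is a sketch only: the variable names court and type and the toy data are hypothetical, and the loop assumes type is coded as non-negative integers so that the generated names (pc_1, pc_2, ...) are valid. The second egen line shows one way of excluding cases with a missing type from the denominator; whether they should count towards Tj depends on your data.

```stata
* Hypothetical toy data; the last row has a missing case type
clear
input court type
1 1
1 1
1 2
2 1
2 2
2 .
end

* Percentage of each court's cases that are of type 1;
* a missing type counts as "not type 1" and stays in the denominator
egen pc_type1 = mean(100 * (type == 1)), by(court)

* Same, but cases with a missing type are left out of the calculation
egen pc_type1_nm = mean(cond(missing(type), ., 100 * (type == 1))), by(court)

* One variable per observed type (levelsof skips missing values by default)
levelsof type, local(types)
foreach t of local types {
    egen pc_`t' = mean(100 * (type == `t')), by(court)
}
list, sepby(court)
```

In court 2 of the toy data, pc_type1 is based on three cases and pc_type1_nm on two, which is exactly the kind of difference the warning about missing values is about.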



      • #4
        Note: In post #2, the FAQ Advice on the dataex command should be #12, not #10.
