Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Quickly calculate proportions of sum of every categories to sum of the total

    Dear Stata users,

    Suppose I have one continuous variable and one categorical variable. I want to calculate sum of continuous variable for each category of the categorical variable, and then calculate proportions of them to sum of the continuous variable (i.e. the total). For example, sum of continuous variable for category A equals 2000, for category B equals 3000, and the total equals 5000, so I will get two proportions: 2000/5000=40% and 3000/5000=60%. Steps to do that is very easy, however, I have so many continuous variables that I want to know if there's some commands to use other than by calculating one by one or by using a loop.

    Code:
    sysuse auto
    egen price_foreign=sum(price), by(foreign)
    egen price_total=sum(price)
    gen price_proportion=price_foreign/price_total*100
    tabulate price_proportion

  • #2
    Chen:
    do you mean something along the following lines?
    Code:
    . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
    (1978 automobile data)
     
    . egen wanted=group(foreign)
    
    . bysort wanted: egen check=pc(price)
    
    . bysort wanted: tab check
    
    -> wanted = 1
    
          check |      Freq.     Percent        Cum.
    ------------+-----------------------------------
       1.042228 |          1        1.92        1.92
       1.044761 |          1        1.92        3.85
       1.161303 |          1        1.92        5.77
       1.203106 |          1        1.92        7.69
       1.212607 |          1        1.92        9.62
        1.25251 |          1        1.92       11.54
       1.261694 |          1        1.92       13.46
       1.269928 |          1        1.92       15.38
       1.285762 |          1        1.92       17.31
       1.292729 |          1        1.92       19.23
       1.298113 |          1        1.92       21.15
       1.321232 |          1        1.92       23.08
       1.324082 |          1        1.92       25.00
       1.325982 |          1        1.92       26.92
       1.328515 |          1        1.92       28.85
       1.389953 |          1        1.92       30.77
       1.401037 |          1        1.92       32.69
       1.401354 |          1        1.92       34.62
       1.410221 |          1        1.92       36.54
       1.419405 |          1        1.92       38.46
       1.426373 |          1        1.92       40.38
       1.430173 |          1        1.92       42.31
       1.471659 |          1        1.92       44.23
       1.495728 |          1        1.92       46.15
       1.498895 |          1        1.92       48.08
       1.503962 |          1        1.92       50.00
        1.52518 |          1        1.92       51.92
       1.548615 |          1        1.92       53.85
       1.562549 |          1        1.92       55.77
       1.616387 |          1        1.92       57.69
       1.637922 |          1        1.92       59.62
       1.643306 |          1        1.92       61.54
       1.653756 |          1        1.92       63.46
       1.703477 |          1        1.92       65.38
       1.806718 |          1        1.92       67.31
       1.833003 |          1        1.92       69.23
        1.83617 |          1        1.92       71.15
       1.864039 |          1        1.92       73.08
       1.952395 |          1        1.92       75.00
       1.996098 |          1        1.92       76.92
       2.008449 |          1        1.92       78.85
       2.054053 |          1        1.92       80.77
       2.478734 |          1        1.92       82.69
       2.791307 |          1        1.92       84.62
       3.284394 |          1        1.92       86.54
       3.284711 |          1        1.92       88.46
       3.605518 |          1        1.92       90.38
       3.640987 |          1        1.92       92.31
        4.26455 |          1        1.92       94.23
       4.305087 |          1        1.92       96.15
       4.592008 |          1        1.92       98.08
       5.037274 |          1        1.92      100.00
    ------------+-----------------------------------
          Total |         52      100.00
    
    ----------------------------------------------------------------------------------------------------------------------------------------
    -> wanted = 2
    
          check |      Freq.     Percent        Cum.
    ------------+-----------------------------------
       2.668318 |          1        4.55        4.55
       2.703915 |          1        4.55        9.09
       2.772972 |          1        4.55       13.64
       2.844165 |          1        4.55       18.18
       3.058457 |          1        4.55       22.73
       3.202979 |          1        4.55       27.27
       3.267053 |          1        4.55       31.82
       3.343941 |          1        4.55       36.36
       3.615899 |          1        4.55       40.91
       3.842293 |          1        4.55       45.45
       4.071535 |          1        4.55       50.00
       4.128489 |          1        4.55       54.55
       4.199683 |          1        4.55       59.09
        4.43462 |          1        4.55       63.64
       4.481607 |          1        4.55       68.18
       4.876729 |          1        4.55       72.73
       5.083189 |          1        4.55       77.27
       5.787289 |          1        4.55       81.82
       6.898614 |          1        4.55       86.36
       6.930651 |          1        4.55       90.91
       8.539616 |          1        4.55       95.45
       9.247987 |          1        4.55      100.00
    ------------+-----------------------------------
          Total |         22      100.00
    
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)

    Comment


    • #3
      There could be an egen function for this, but if so I don't know about it. The existing pc() function doesn't help much.

      In any case, it's arguable that practising a loop is better for your Stata growth. And you need a loop any way.

      Here's one approach. There are others.


      Code:
      sysuse auto, clear
      ds
      
      foreach v in price weight length {
          egen double psum = total(`v'), by(foreign)
          egen double sum = total(`v')
          gen double share_`v' = psum/sum
          drop psum sum
      }
      
      tabdisp foreign, c(share_*) format(%4.3f)
      
      ----------------------------------------------------
      Car       |
      origin    |  share_price  share_weight  share_length
      ----------+-----------------------------------------
       Domestic |        0.692         0.772         0.733
        Foreign |        0.308         0.228         0.267
      ----------------------------------------------------

      Comment


      • #4
        Dear Carlo Lazzaro, thank you for your reply. My problem can be solved perfectly by Nick Cox's answer in #3. Nick provided a solution using a loop which is exactly what I did in practice which needs 'two egen, one gen and a loop'. I searched for a ready-made command, and maybe there is none as yet.

        Comment


        • #5
          If you want such a command, you could write it yourself. That is not a put-down, as it's the way most community-contributed commands arose. (The others arose because someone wanted a command, and programming it looked like fun to someone else, not true here for me. The Stata world doesn't, I think, need many more egen functions.)

          Comment


          • #6
            Thank you very much Nick. I just worried about that the pre-conceived command will be seldom used. Although I searched for such a command, cases that it applied to maybe rare. In any case, I will try to write one.

            Comment

            Working...
            X