  • How to use egen's total() function more concisely

    Dear Stata users,
    My question is about the total() function of egen. When I calculate the ratio of total A to total B, I have to call egen's total() function twice. Is there any way to avoid this usual procedure?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte age long pop
      0 129244
      1 129031
      2 129677
      3 131844
      4 134233
      5 136257
      6 138150
      7 138177
      8 138052
      9 138230
     10 138593
     11 137820
     12 136015
     13 135181
     14 134080
     15 132227
     16 130176
     17 128334
     18 126876
     19 125581
     20 124454
     21 123764
     22 123594
     23 122466
     24 120914
     25 120375
     26 120134
     27 120039
     28 119797
     29 119478
     30 119531
     31 120137
     32 122771
     33 123947
     34 121942
     35 120806
     36 119443
     37 116626
     38 113716
     39 110961
     40 109775
     41 108046
     42 104884
     43 102155
     44  98928
     45  96236
     46  95056
     47  94399
     48  94159
     49  94360
     50  93960
     51  93335
     52  92909
     53  91469
     54  89414
     55  86034
     56  83016
     57  82439
     58  81545
     59  81147
     60  77985
     61  69666
     62  64326
     63  62817
     64  62087
     65  62270
     66  60297
     67  57748
     68  55728
     69  52611
     70  50107
     71  47085
     72  43855
     73  40987
     74  37399
     75  34444
     76  31505
     77  27617
     78  24507
     79  22632
     80  20922
     81  19402
     82  17868
     83  16106
     84  14328
     85  12723
     86  11133
     87   9577
     88   8080
     89   6691
     90   5514
     91   4476
     92   3575
     93   2781
     94   2098
     95   1543
     96   1103
     97    776
     98    530
     99    349
    100    550
    end
    Code:
    gen age_specific=age*pop
    egen total_age=total(age_specific)
    egen total_pop=total(pop)
    gen mean_age=total_age/total_pop
    Can I generate the variable I want (mean_age above) with a single expression?

  • #2
    P.S. My question can be restated as: is there any way to write an expression like
    Code:
    (e)gen newvar = total(A)/total(B)


    • #3
      Yes, the largely neglected -ratio- command does this:
      Code:
      ratio  A/B
      gen wanted = _b[_ratio_1]
      -ratio- is an estimation command, so you can access its results in e() and r(table) just as you would after any other Stata estimation command.

      And if you want to do this -by(groupvar)-, it has an -over()- option.

      That said, given that you need to extract the results from _b or e(b) or r(table) to make a variable out of the results, I'm not sure what the advantage is. I suppose in a very large data set the calculation might be noticeably faster than doing two trips through -egen, total()- and a division, but otherwise, ...

      I think the more practical use case for -ratio- is with -svy- estimation.


      • #4
        Thank you, Clyde Schechter, but this is not what I want. There are no variables A and B in my data; I have to calculate the total of A and the total of B first. Thus generating the variable I want takes at least three steps. That is my question.


        • #5
          Isn't this just a weighted mean of age?


          • #6
            I don’t see what the issue is with having to use egen twice. It’s a compact convenience command. That said, as Nick alluded to, there’s a much more efficient way to compute a weighted mean:

            Code:
            mean age [fw=pop]
            This is an estimation command, but if you don't need population estimates, just the weighted mean of age, then -summarize- can be used instead with the same syntax.
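
            A minimal sketch of that -summarize- route, assuming the variables age and pop from #1; the name wmean_age is chosen here only for illustration:
            Code:
            * weighted mean of age, with pop as frequency weights
            summarize age [fw=pop], meanonly
            * copy the returned mean into every observation
            gen double wmean_age = r(mean)
            The meanonly option skips the variance calculations, and r(mean) holds the weighted mean afterwards.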


            • #7
              Further, r(sum) and r(sum_w) are always returned results from summarize -- even with the so-called meanonly option.
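
              As a sketch of how those results could be used with the data in #1 (assuming frequency weights), the ratio of the two totals can be formed directly and compared with r(mean):
              Code:
              summarize age [fw=pop], meanonly
              * weighted sum of age divided by the sum of the weights
              display r(sum) / r(sum_w)
              display r(mean)
              For a ratio of totals that is not a weighted mean, two calls also work. Here A and B are the hypothetical variables from #2, and total_A and wanted are illustrative names:
              Code:
              summarize A, meanonly
              scalar total_A = r(sum)
              summarize B, meanonly
              gen double wanted = total_A / r(sum)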


              • #8
                Thank you very much Nick Cox. Yes I can get what I want using:
                Code:
                sum age [aw=pop]
                I am not sure whether I can do this in every circumstance.


                • #9
                  Thank you Leonardo Guizzetti. #6


                  • #10
                    Note that something like


                    Code:
                    bysort x : gen double wanted = sum(y) / sum(z)
                    by x: replace wanted = wanted[_N]
                    is an example of (a) cumulatively summing two variables at once (b) getting the ratio of the overall sums (c) doing it groupwise.
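
                    Applied to the ungrouped data in #1, the same idea might look like the sketch below. The variable name wanted is illustrative, and sum() here is the running-sum function of -generate-, which accepts an expression such as age*pop:
                    Code:
                    * running ratio of cumulative age*pop to cumulative pop
                    gen double wanted = sum(age*pop) / sum(pop)
                    * the last observation holds the ratio of the overall totals
                    replace wanted = wanted[_N]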


                    • #11
                      Re #4
                      There's no variable A and variable B in my data, I have to calculate total of A and total of B first. Thus I generate my wanted variable taking three steps at least. That is my question.
                      While I still think there is little advantage to using -ratio-, you are misunderstanding what -ratio- does. The command -ratio- calculates the ratio of the total of A to the total of B, so you don't have to explicitly calculate those totals first. For the example you gave in #1, the code would be:
                      Code:
                      gen age_specific=age*pop
                      ratio age_specific/pop
                      gen wanted = _b[_ratio_1]


                      • #12
                        Originally posted by Chen Samulsion
                        Thank you very much Nick Cox. Yes I can get what I want using:
                        Code:
                        sum age [aw=pop]
                        I am not sure whether I can do this in every circumstance.
                        Note, you have written -aweight-s in your code block, whereas I specified -fweight-s. While you will get the same mean, these weights are distinctly different and will produce different standard errors. In this case, frequency weights are the correct type of weights to use because you have aggregate data of the population for each age.
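
                        One way to see the difference is to run both weight types side by side on the data in #1; a sketch:
                        Code:
                        * frequency weights: each row stands for pop identical observations
                        mean age [fw=pop]
                        * analytic weights: the number of observations is the number of rows
                        mean age [aw=pop]
                        The point estimates should agree, while the reported number of observations and the standard errors differ between the two runs.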


                        • #13
                          Dear Nick Cox, Clyde Schechter, and Leonardo Guizzetti, thank you all. I have learned a lot from this.
