Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Compute averages of subsets

    Hi all,

    I have a dataset as follows:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(Level_1 Level_2 Level_3 ln_price)
    1 1  43  -3.179277
    1 1  43 -3.1169856
    1 1  43  -3.128604
    1 1  43  -3.523117
    1 1  43  -3.531426
    1 1  67 -3.4845834
    1 1  67  -3.483262
    1 1  67  -3.500141
    1 1  67  -3.519534
    1 1  90  -3.541179
    1 1  90   -3.57089
    1 1  90  -3.580673
    1 1  90 -3.5837634
    1 1  90  -3.582916
    1 2 120 -2.8664606
    1 2 120   -2.85482
    1 2 120  -2.918243
    1 2 120  -2.880425
    1 2 120  -2.912734
    1 2 120  -2.931389
    1 2 120  -2.834747
    1 2 120  -2.843743
    1 2 120  -2.851529
    1 2 120  -2.933002
    1 2 120 -2.8192906
    1 2 120   -2.83181
    1 2 120  -2.829984
    1 2 120  -2.808753
    1 2 120   -2.82109
    1 2 120  -2.799774
    1 2 120   -2.77962
    1 2 120 -2.7576454
    1 2 120 -2.7634845
    1 2 120 -2.7659855
    1 2 120 -2.8187714
    1 2 120  -2.776918
    1 2 120  -2.785758
    1 2 120 -2.7580175
    1 2 120   -3.17905
    1 2 120   -3.17935
    1 2 120  -3.135324
    1 2 120 -3.1739445
    1 2 120 -3.1793156
    1 2 120  -3.126597
    1 2 120 -2.9974594
    1 2 120  -2.861044
    1 2 120 -2.3699684
    1 2 120 -2.2812521
    1 2 120  -2.941632
    1 2 120  -2.838551
    1 2 120  -2.535694
    1 2 120  -2.874293
    1 2 120 -2.1376004
    1 2 120 -2.2911053
    1 4 131  -2.499207
    1 4 131  -2.464331
    1 4 131  -2.474346
    1 4 131  -2.506128
    1 4 131 -2.5240934
    1 4 131 -2.5362494
    1 4 131   -2.56592
    1 4 131 -2.5699744
    1 4 131  -2.570757
    1 4 156  -2.570462
    1 4 156  -2.570074
    1 4 156  -2.575264
    1 4 156  -2.583664
    1 4 193   -2.59251
    1 4 193 -2.5766606
    1 4 193  -2.586446
    1 4 193  -2.587837
    1 4 193  -2.599673
    end
    Say Level_3 is product and Level_1 and Level_2 are two groupings (being Level_1 the Country of the product). What I would like to obtain from here is the average ln_price of products different from i-th product which belong to the same Leevel_2 and Level_1 of product i. So, to give an example, focusing on Level_1 = 1 and Level_2 = 1, I would like to end up with (see desired variable which I did manually):

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(Level_1 Level_2 Level_3 ln_price desired)
    1 1  43  -3.179277.    -3.1513732
    1 1  43 -3.1169856.   -3.1513732
    1 1  43  -3.128604.    -3.1513732
    1 1  43  -3.523117.    -3.1513732
    1 1  43  -3.531426.    -3.1513732
    1 1  67 -3.4845834.   -3.43387
    1 1  67  -3.483262.    -3.43387
    1 1  67  -3.500141.    -3.43387
    1 1  67  -3.519534.    -3.43387
    1 1  90  -3.541179.    -3.385204
    1 1  90   -3.57089.     -3.385204
    1 1  90  -3.580673.    -3.385204
    1 1  90 -3.5837634.   -3.385204
    1 1  90  -3.582916.    -3.385204
    end
    where -3.1513732 = (-3.483262-3.500141-3.519534-3.541179 -3.57089-3.580673-3.5837634-3.582916)/9 (i.e. the average ln_price of all the products inside Level_1=1 and Level_2 =1 and different from product 43), -3.433875 = (-3.179277 -3.1169 -3.128604 -3.523117 -3.531426 -3.541179 -3.57089 -3.580673 -3.5837634 -3.582916)/10 and -3.385204= (-3.179277 -3.1169 -3.128604 -3.523117 -3.531426-3.4845834-3.483262-3.500141-3.519534)/9.

    Can you please help figure out a way to do it recursively?





  • #2
    I don't understand this is an exercise in recursion. The recipe I use is just

    mean for other groups = (total for all groups MINUS total for this group) / (count for all groups MINUS count for this group)


    Code:
    egen TOTAL = total(ln_price), by(Level_1 Level_2)
    egen total = total(ln_price), by(Level_1 Level_2 Level_3)
    egen COUNT = count(ln_price), by(Level_1 Level_2)
    egen count = count(ln_price), by(Level_1 Level_2 Level_3)
    
    gen wanted = (TOTAL - total) / (COUNT - count)
    
    egen tag = tag(Level_*)
    
    list Level_* wanted if tag, noobs sep(0)
    
     +-----------------------------------------+
      | Level_1   Level_2   Level_3      wanted |
      |-----------------------------------------|
      |       1         1        43   -3.538549 |
      |       1         1        67   -3.433883 |
      |       1         1        90   -3.385215 |
      |       1         2       120           . |
      |       1         4       131    -2.58251 |
      |       1         4       156   -2.546724 |
      |       1         4       193   -2.539267 |
      +-----------------------------------------+

    In your worked example, you are dividing the sum of 8 values by 9. Other small discrepancies I imagine to arise because of precision issues.

    Comment

    Working...
    X