  • Generating new variable which estimates the mean of a group of identifiers

    Dear all,

    I am currently working with a dataset that contains the following variables: id (industry identifier), year, LABSH (labour share of total output) and CAPSH (capital share of total output). Only one country is analysed, so the id variable identifies the different industries within that country. An example is below:


    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id int year float(LABSH CAPSH)
    1 1970 .6649457 .33505425
    2 1970 .8320854 .1679146
    3 1970 .883529 .11647099
    4 1970 .7319602 .26803976
    5 1970 .5540994 .44590065
    6 1970 .56377 .43623
    7 1970 .5137289 .4862711
    8 1970 .7054214 .29457858
    9 1970 .719725 .28027505
    10 1970 .8148143 .18518573
    11 1970 .547351 .452649
    12 1970 .8217311 .17826892
    13 1970 .7935457 .20645434
    14 1970 .8322504 .16774955
    15 1970 .8353265 .16467354
    16 1970 .6881071 .3118929
    17 1970 .8362749 .16372514
    18 1970 .6861746 .3138255
    19 1970 .6176726 .3823274
    20 1970 .03929371 .9607063
    21 1970 .6221473 .3778527
    22 1970 .6551881 .3448119
    23 1970 .4139394 .5860606
    24 1970 .7277795 .2722205
    25 1970 .28088248 .7191175
    26 1970 .8978723 .1021277
    27 1970 .7667561 .23324393
    28 1970 .7906137 .20938635
    29 1970 .6575865 .3424135
    30 1970 .6504946 .3495053
    31 1970 .317168 .682832
    32 1970 .23677467 .7632253
    33 1970 .7566398 .24336024
    34 1970 .7530184 .24698158
    35 1970 .7793154 .22068456
    36 1970 .7456722 .25432774
    37 1970 .707044 .292956
    38 1970 1 0
    39 1970 . .
    40 1970 .6433307 .3566693
    1 1971 .6551088 .3448912
    2 1971 .8298256 .17017433
    3 1971 .8506684 .14933157
    4 1971 .7580634 .2419366
    5 1971 .5831652 .4168348
    6 1971 .5422989 .4577011
    7 1971 .4926429 .5073571
    8 1971 .6536347 .3463653
    9 1971 .6888313 .3111687
    10 1971 .797429 .20257105
    11 1971 .55436313 .4456368
    12 1971 .7926531 .2073469
    13 1971 .7290537 .27094632
    14 1971 .843337 .15666297
    15 1971 .846294 .153706
    16 1971 .6781476 .32185245
    17 1971 .8504494 .14955057
    18 1971 .7019526 .29804745
    19 1971 .6258004 .3741996
    20 1971 .04327277 .9567272
    21 1971 .6251914 .3748086
    22 1971 .6310546 .36894545
    23 1971 .4701412 .52985877
    24 1971 .7109644 .28903553
    25 1971 .28638515 .7136149
    26 1971 .9259716 .0740284
    27 1971 .769419 .230581
    28 1971 .8131477 .18685228
    29 1971 .6702957 .3297043
    30 1971 .6490846 .3509154
    31 1971 .3177648 .6822352
    32 1971 .2364985 .7635015
    33 1971 .7299294 .27007055
    34 1971 .7410805 .25891954
    35 1971 .7751924 .2248076
    36 1971 .7362803 .26371977
    37 1971 .7044449 .29555508
    38 1971 1 0
    39 1971 . .
    40 1971 .6386511 .3613489
    1 1972 .6996611 .30033895
    2 1972 .8293197 .17068033
    3 1972 .842303 .157697
    4 1972 .7473305 .25266945
    5 1972 .56066453 .4393355
    6 1972 .53954387 .4604561
    7 1972 .4857557 .51424426
    8 1972 .6646692 .33533075
    9 1972 .6807057 .3192943
    10 1972 .7881901 .21180987
    11 1972 .54051524 .4594848
    12 1972 .7903811 .20961893
    13 1972 .7504753 .2495247
    14 1972 .8569328 .1430672
    15 1972 .8515332 .14846681
    16 1972 .6595253 .3404747
    17 1972 .8586392 .1413608
    18 1972 .7091877 .29081225
    19 1972 .6304913 .3695087
    20 1972 .04974724 .9502528
    end

    What I would like to do is generate the average aggregate labour share and capital share for a group of industries taken together. For example, I want to estimate the average labour share for one group containing industries 27, 28, 29 and 30, and for another group containing only industries 23 and 25. I am aware of the mean function for estimating averages, but I do not fully understand how to apply it in this situation. Any help would be greatly appreciated.

    I thank you in advance.

    Best,

    Satya

  • #2
    You just need to create a new variable that identifies the groups first.
    Code:
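    // group 1: industries 27-30; group 2: industries 23 and 25
    gen group = 1 if inlist(id, 27, 28, 29, 30)
    replace group = 2 if inlist(id, 23, 25)
    
    // mean labour share within each group in each year (missing for ungrouped industries)
    by group year, sort: egen wanted = mean(LABSH) if !missing(group)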
    I assumed that you meant to create a variable that is the mean for each group of industries separately in each year. If that's not what you wanted, just take the reference to year out of the -by group...- command.
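    If you would rather have a compact table with one row per group and year, covering both LABSH and CAPSH, -collapse- is one possible alternative to -egen-. The sketch below is not part of the original answer: it assumes the same group definitions, and the names mean_LABSH and mean_CAPSH are illustrative only. The input uses the rows for industries 23, 25 and 27-30 in 1970 and 1971, copied from the example data above.

```stata
    clear
    // subset of the -dataex- example: industries 23, 25, 27-30 in 1970 and 1971
    input float id int year float(LABSH CAPSH)
    23 1970 .4139394 .5860606
    25 1970 .28088248 .7191175
    27 1970 .7667561 .23324393
    28 1970 .7906137 .20938635
    29 1970 .6575865 .3424135
    30 1970 .6504946 .3495053
    23 1971 .4701412 .52985877
    25 1971 .28638515 .7136149
    27 1971 .769419 .230581
    28 1971 .8131477 .18685228
    29 1971 .6702957 .3297043
    30 1971 .6490846 .3509154
    end

    // same grouping as in the answer above
    gen group = 1 if inlist(id, 27, 28, 29, 30)
    replace group = 2 if inlist(id, 23, 25)
    drop if missing(group)

    // one row per group-year holding the unweighted mean of each share
    collapse (mean) mean_LABSH = LABSH mean_CAPSH = CAPSH, by(group year)
    list
```

    Note that -collapse- replaces the data in memory, so on the full dataset you would run it after -preserve- or on a saved copy. Like the -egen- version, these are simple unweighted means across industries, not output-weighted aggregate shares.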



    • #3
      Dear Clyde,

      Thank you for your reply. As always, the code works neatly.

      Best,

      Satya
