Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Building variables in terms of groups and categories

    Hi all,

    I have data in terms of country and industry per year. I build some categories for industries (in terms of technological intensity) and regions (groups of countries). I would like to get variables for plots in terms of the share of employment of people employed in each of these categories (region and technological intensity). How can do I it? So far, I could achieve the following: (Is there any ways I can get the shares of people employed by technological intensity in each region?)


    bysort country year: egen total_employment_country= sum(Employment) (total employment in countries)

    gen share_emp_country= Employment/total_employment_country (share of each employment of each industry i in each country j)


    bysort region year tech_intensity : egen total_employment_region= sum(Employment) (total number of people employed per region and tech intensity)

    bysort tech_intensity year: egen total_employment_tech=sum(Employment) (total number of people employed by tech intensity in a year)

    bysort country year tech_intensity: egen total_employment_cnt_tech= sum(Employment) (total number of people in country j by tech intensity)


    dataex country year isic isiccomb sourcecode Establishments Employment OutputINDSTAT4 Wages GrossFixed ValueAdde
    > d

    ----------------------- copy starting from the next line -----------------------
    Code:
     * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(country year) byte(isic isiccomb sourcecode) long(Establishments Employment) double(OutputINDSTAT4 Wages GrossFixed ValueAdded)
    558 1963 17 17 3 .       .   12528571          .         .    8000000
     56 1963 31  . 3 .  104900  781899976  193939658         .  325580007
    280 1963 28 28 3 .  586000 4563249083 1180499597         . 2424999315
    246 1963 36 36 3 .   14500   74999861   24375651         .   37500859
    392 1963 20 20 3 .  603000 2683333333  361111111  97222222  858333333
    558 1963 25 25 3 .       .    1300000          .         .     685714
    400 1963 37 37 0 .       .          .          .         .          .
    218 1963 20 20 3 .    1076    2388889     744444     77778    1333333
    218 1963 25 25 3 .    1139    6500000     755555         .    3277777
    598 1963 34  . 3 .    1358    2271360     925120         .    1352960
    894 1963 22 22 3 .     891    2801120    1330532         .    1764706
    372 1963 26 26 3 .    6900   41999832   11200102   6833849   24358403
    710 1963 30  . 3 .       .          .          .         .          .
     56 1963 19  . 3 .       .          .          .         .          .
    716 1963 28 28 3 .    5617          .          .         .          .
    508 1963 18  . 3 .       .    1947826     417391         .          .
    218 1963 18  . 3 .    1371    4222223     666666    144445    1388889
    566 1963 37 37 0 .       .          .          .         .          .
    388 1963 25 25 3 .     928    6285997    1185800         .    2253999
    300 1963 37 37 0 .       .          .          .         .          .
    590 1963 26 26 3 .    1136   10020000    2000000    649000    5880000
    508 1963 16 16 3 .       .    6539130     452174         .          .
    470 1963 32  . 3 .       .          .          .         .          .
    170 1963 22 22 3 .   11000   51666667   10777778   2333333   27222222
    710 1963 33 33 3 .    3000   16799993    4199998    686000    8399997
    356 1963 28 28 3 .  153000  456120456   57120057         .  121590122
    300 1963 34  . 3 .    8850   46001295    8995809   2362536   23330039
    288 1963 35  . 3 .       .          .          .         .          .
    642 1963 17 17 3 .  168000          .   99120000  26500000          .
    124 1963 35  . 3 .       .          .          .         .          .
    894 1963 21 21 3 .     100    1400560     126050         .     280112
    702 1963 23 23 3 .     447   63449213    1058075         .    9431534
    280 1963 20 20 3 .  158000 1212499902  276999738         .  839999941
    616 1963 26 26 3 .  199000          .          . 1.020e+08  2.270e+08
    300 1963 15 15 3 .   39440  381004305   30429005  15504140   85664634
    372 1963 20 20 3 .    3400   19599628    4480923   1371621    8119468
    266 1963 24 24 3 .      23      36732          .         .          .
    792 1963 24 24 3 .   13930  122222222   14222222   8333333   51111111
    788 1963 32  . 3 .       .          .          .         .          .
    170 1963 36 36 3 .    7000   25888889    5111111         .   15444444
     36 1963 26 26 3 .   45000  478240077  118720019  57120009  239680038
    372 1963 18  . 3 .   24800  106402075   27721301   2494057   46760813
    170 1963 20 20 3 .    5800   21222222    3777778   2111111    9777778
    792 1963 17 17 3 .   95070  466666667   63111111  21888889  162222222
    340 1963 24 24 3 .     933    4630000     946500         .    2200000
    792 1963 35  . 3 .       .          .          .         .          .
    616 1963 15 15 3 .  370000          .          .         .  1.013e+09
    470 1963 27 27 3 .       0          0          0         0          0
    702 1963 28 28 3 .    3100   18293360    2678671    718668    6533343
    710 1963 27 27 3 .   58000  457799817   92399963  38947984  207199917
     76 1963 26 26 3 .  139000          .          .         .          .
    356 1963 35  . 3 .       .          .          .         .          .
    890 1963 35  . 3 .       .          .          .         .          .
    608 1963 30  . 3 .       .          .          .         .          .
    368 1963 37 37 0 .       .          .          .         .          .
    152 1963 20 20 3 .    9950   49321825    6165228         0   24660912
    392 1963 16 16 3 .       .          .          . 177777778          .
    100 1963 22 22 3 .    6400          .    3216000    400000          .
    862 1963 33 33 3 .      10      28060       9851         .      14328
    792 1963 22 22 3 .    5010   27777778    6222222   2666667   15555556
    598 1963 30  . 3 .       .          .          .         .          .
    404 1963 18  . 3 .    1941    6817997     865200         .    1570799
    200 1963 19  . 3 .       .          .          .         .          .
    840 1963 35  . 3 .       .          .          .         .          .
    760 1963 21 21 3 .     200    1047120      54974         .     261780
    840 1963 31  . 3 . 1466000  2.924e+10  9.030e+09 7.000e+08  1.625e+10
     32 1963 35  . 3 .       .          .          .         .          .
    218 1963 35  . 3 .       .          .          .         .          .
    894 1963 17 17 3 .     236    1400560     140056         .     322129
    250 1963 17 17 3 .  494512 3644679466          . 344334020 1308876636
    250 1963 20 20 3 .  139121 1271405036          .         .  515285990
    792 1963 28 28 3 .   20000  122222222   18000000  20777778   45555556
    100 1963 29  . 3 .   78300          .   45022500         .          .
    356 1963 18  . 3 .   34000  119070119   10920011         .   18480018
    400 1963 36 36 3 .    1756    3891998     481600         .    1817199
    218 1963 31  . 3 .     115     500000      77778     38889     166667
    894 1963 34  . 3 .     292    1400560     350140         .     420168
    124 1963 22 22 3 .   75000  899385636  343991826  42651278  593409080
    348 1963 37 37 0 .       .          .          .         .          .
    392 1963 18  . 3 .  340000 1308333333  186111111  36111111  461111111
    470 1963 30  . 3 .       .          .          .         .          .
    368 1963 29  . 3 .     930    2687999    1061200         .    1427999
    250 1963 16 16 3 .   11636  708924467          .         .  625879227
    470 1963 25 25 3 .     425    2018800     310799         0     828801
    590 1963 35  . 3 .       .          .          .         .          .
     36 1963 33 33 3 .    7000   48160008   15680003   6720001   26880004
    840 1963 36 36 3 .  670000  1.057e+10  3.040e+09 2.000e+08  5.630e+09
    788 1963 16 16 3 .     802   27357143    1247619    261905    2371429
    364 1963 17 17 3 .   52343  184313663   20039604  15775578   87619010
    642 1963 15 15 3 .  129000          .   79618889         .          .
    368 1963 31  . 3 .     290    1427999     296800         .     336000
    760 1963 32  . 3 .       .          .          .         .          .
    218 1963 26 26 3 .    1192    7833334     950000    400000    4166667
    890 1963 23 23 3 .   10000          .    7066667         .   76933333
    222 1963 16 16 3 .     248    7200000     436000         .    3884000
    218 1963 30  . 3 .       .          .          .         .          .
    400 1963 15 15 3 .    5258   23948390    1397199         .    5370398
    760 1963 27 27 3 .    1380    1047120     206806         .     418848
    752 1963 32  . 3 .       .          .          .         .          .
    598 1963 27 27 3 .       0          0          0         .          0
    end



  • #2
    Two things:
    • I don't see region or tech_intensity in your data extract
    • the function sum() isn't doing what you think it's doing. You probably want the total() function from the egen command
    Otherwise, to get the shares, don't you just have to divide the appropriate variables? e.g.,

    Code:
    bysort region year tech_intensity : egen total_employment_region= total(Employment) (total number of people employed per region and tech intensity)
    gen share_employment_region = Employment/total_employment_region
    You could also look into the collapse, by() command.

    Comment


    • #3
      Sorry, I generated many variables. The dataex for region and tech_intensity is below. The only I could do is generating the share of employment for each industry in each country. Is there a way of calculating the shares by region and tech_intensity without using collapse (and loosing data points)?


      gen share_emp_country= Employment/total_employment_country


      Code:
       * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(region tech_intensity manufacturing_type share_emp_country)
      0 3 0  .0007760043
      0 1 1   .003450542
      0 3 0   .008659284
      0 3 0    .01443538
      0 3 0            .
      0 3 0            .
      0 1 1    .08242089
      0 3 0            .
      0 3 0            0
      0 3 0            .
      0 1 0   .010188768
      0 2 1  .0082690185
      0 1 1            .
      0 1 1            .
      0 1 0   .010408543
      0 3 0            .
      0 1 1     .0246242
      0 1 1            .
      0 1 1   .028444445
      0 3 0   .022379376
      0 3 0            .
      0 3 0            .
      0 3 0    .05310559
      0 3 0            .
      0 1 1            .
      0 3 0   .013087248
      0 1 1    .07800066
      0 3 0            .
      0 2 1   .013361142
      0 1 1   .005627478
      0 2 1    .01082511
      0 0 0            .
      0 1 1            .
      0 2 1   .027097147
      0 2 1   .007089278
      0 3 0            .
      0 2 1  .0037411505
      0 3 0 .00010231824
      0 3 0            .
      0 1 1    .07228665
      0 3 0            .
      0 1 1    .20255513
      0 1 1            .
      0 3 0   .011393432
      0 3 0   .063522615
      0 2 1   .011060948
      0 2 0    .05798636
      0 1 1            .
      0 1 0            .
      0 3 0   .034570627
      0 2 0    .07201777
      0 3 0    .03927765
      0 1 1            .
      0 2 0    .09110097
      0 1 0            .
      0 1 1   .017237872
      0 3 0    .02198046
      0 3 0   .026413195
      0 3 0            .
      0 3 0            .
      0 3 0   .015873877
      0 3 0            .
      0 3 0            .
      0 2 1            .
      0 2 0    .02791145
      0 2 1   .004978633
      0 1 1   .015232828
      0 2 0            .
      0 3 0     .0116734
      0 1 1   .016164383
      0 1 1    .00844184
      0 1 0    .04182225
      0 3 0            .
      0 3 0 .00026006837
      0 3 0            .
      0 3 0            .
      0 3 0            .
      0 3 0            .
      0 3 0            .
      0 1 1    .02540928
      0 2 0    .02188769
      0 0 0            .
      0 1 1  .0009282486
      0 1 1            .
      0 2 0            .
      0 3 0            .
      0 3 0            .
      0 1 1            .
      0 1 1            .
      0 1 1    .01876353
      0 2 1            .
      0 3 0   .030342324
      0 1 0   .008540695
      0 2 0   .014127466
      0 3 0   .006993696
      0 3 0            .
      0 3 0  .0013055317
      0 1 1    .07455633
      0 3 0   .020794146
      0 1 1            .
      end



      Comment


      • #4
        Originally posted by Ben Jarvis View Post
        Two things:
        • I don't see region or tech_intensity in your data extract
        • the function sum() isn't doing what you think it's doing. You probably want the total() function from the egen command
        Otherwise, to get the shares, don't you just have to divide the appropriate variables? e.g.,

        Code:
        bysort region year tech_intensity : egen total_employment_region= total(Employment) (total number of people employed per region and tech intensity)
        gen share_employment_region = Employment/total_employment_region
        You could also look into the collapse, by() command.
        I tried that already and I get the following plot for region==3 and tech_intensity==3 (twoway line share_employment_region year if tech_intensity==3&region==3) and sorting year before
        Attached Files

        Comment


        • #5
          #2 Within an egen call sum() and total() are exactly equivalent. The two names were used before and after Stata 9.

          The Stata function sum() yields running or cumulative sum and is distinct but it is not invoked by the code in #1.

          Comment


          • #6
            the egen commands you use don't change the unit of observation in the data, which are still at the country-year-industry level (as far as I can surmise). If you want to plot not country-year-industry observations, but instead values for region-year-industry observations, then a collapse might make more sense for what you're doing. After the collapse command, you would still have to apply egen to arrive at shares, but that's just one more line of code.



            That will result in wonky graphs like what you show (also, you need to be sure that your data are sorted properly when plotting)

            Comment


            • #7
              Ok, I will copy the dt file and collapse in the copy. That's something I was avoiding. Thank you very much!

              Comment


              • #8
                I stand corrected on sum() and total() when used with egen. I guess I always use total()with egen because sum() behaves so differently outside of that context. How my mind palace works.

                Comment


                • #9
                  Note that you can use frames to your advantage here. Rather than destroying the original data in the memory, you can copy it over to a new frame, and then collapse within that frame. Might save you some trouble if you are trying to sort things out interactively.

                  Comment


                  • #10
                    How do frames work? That sounds interesting...

                    Comment


                    • #11
                      https://www.stata.com/features/overv...ets-in-memory/

                      Comment


                      • #12
                        Thank you!

                        Comment

                        Working...
                        X