  • Share of a string variable

    Dear Profs and Colleagues,

    I want to generate the share of each nationality (nacio) by firm (i) and year (t).
    The firm ID is NPC_FIC, and the years run from 2010 to 2019.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(year NPC_FIC) str6 nacio
    2010 500000001 "PT"
    2010 500000001 "PT"
    2010 500000001 "PT"
    2010 500000002 "PT"
    2010 500000002 "PT"
    2010 500000002 "PT"
    2010 500000002 "PT"
    2011 500000002 "AF"
    2011 500000002 "PT"
    2011 500000002 "PT"
    2011 500000002 "PT"
    2012 500000002 "PT"
    2012 500000002 "PT"
    2012 500000002 "PT"
    2012 500000002 "AF"
    2013 500000002 "PT"
    2013 500000002 "PT"
    2014 500000002 "PT"
    2015 500000002 "PT"
    2015 500000002 "PT"
    2016 500000002 "PT"
    2016 500000002 "PT"
    2017 500000002 "PT"
    2017 500000002 "PT"
    2010 500000033 "PT"
    2010 500000033 "PT"
    2011 500000033 "PT"
    2011 500000033 "ES"
    2012 500000033 "PT"
    2012 500000033 "PT"
    2013 500000033 "ES"
    2014 500000033 "PT"
    2015 500000033 "PT"
    2015 500000033 "PT"
    2016 500000033 "PT"
    2017 500000033 "PT"
    2018 500000033 "PT"
    2019 500000033 "PT"
    2010 500000050 "ES"
    2010 500000050 "ES"
    2011 500000050 "ES"
    2012 500000050 "PT"
    2013 500000050 "PT"
    2014 500000050 "PT"
    2014 500000050 "PT"
    2015 500000050 "PT"
    2015 500000050 "PT"
    2019 500000073 "PT"
    2010 500000083 "AF"
    2011 500000083 "AF"
    2012 500000083 "AF"
    2013 500000083 "AF"
    2014 500000083 "AF"
    2015 500000083 "AF"
    2016 500000083 "AF"
    2017 500000083 "PT"
    2018 500000083 "PT"
    2018 500000083 "PT"
    2019 500000083 "PT"
    2019 500000083 "PT"
    2015 500000101 "PT"
    2016 500000101 "AF"
    2017 500000101 "AF"
    2018 500000101 "PT"
    2019 500000101 "GB"
    2010 500000104 "UA"
    2011 500000106 "PT"
    2011 500000113 "PT"
    2012 500000113 "PT"
    2013 500000113 "PT"
    2014 500000113 "PT"
    2010 500000119 "PT"
    2010 500000119 "PT"
    2010 500000119 "PT"
    2010 500000119 "PT"
    2011 500000119 "UA"
    end
    For example, I need to find that the share of UA is 0.05, the share of ES is 0.09, and so on.
    A firm ID may appear several times within one year because nacio records the nationality of individual workers, so it is normal for one firm to have several workers with the same or different nationalities.
    Any ideas are appreciated.
    Cheers,
    Paris

  • #2
    I assume that NPC_FIC indicates firm.
    Code:
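    * Workers of this nationality in the firm-year, kept on the first observation of each group only
    by NPC_FIC year nacio, sort: gen this_nationality = _N if _n == 1
    * All workers in the firm-year
    by NPC_FIC year (nacio): gen everyone = _N
    gen proportion_this_nationality = this_nationality/everyone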
    These results are given as proportions, ranging from 0 to 1. If you prefer percentages, just multiply the result by 100.
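    A minimal sketch of the percentage conversion, followed by a listing restricted to the observations that carry a value. The name pct_this_nationality is illustrative and not part of the code above.
    Code:
    * Convert the proportion (0 to 1) to a percentage (0 to 100)
    gen pct_this_nationality = 100 * proportion_this_nationality
    * Show one line per firm-year-nationality, with a separator between firm-years
    list NPC_FIC year nacio pct_this_nationality if !missing(pct_this_nationality), sepby(NPC_FIC year) noobs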

    • #3
      Thank you, Prof. Clyde. You are right, NPC_FIC holds the firm IDs.
      But I need one separate variable for each nationality, because I am going to use those variables as control variables in my regression. That is, I need as many variables as there are distinct values of the variable nacio (e.g., AU, ES, etc.).

      • #4
        Code:
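        * Workers of each nationality within each firm-year
        by NPC_FIC year nacio, sort: gen this_nationality = _N
        * All workers within each firm-year
        by NPC_FIC year (nacio): gen everyone = _N
        gen proportion = this_nationality/everyone
        * Keep one observation per firm-year-nationality
        collapse (first) proportion, by(NPC_FIC year nacio)
        
        * One row per firm-year, one proportion variable per nationality
        reshape wide proportion, i(NPC_FIC year) j(nacio) string
        * A nationality absent from a firm-year is missing after reshape; recode it to 0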
        mvencode proportion*, mv(0)
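        After the reshape each row is one firm-year, so the nationality shares in a row should add up to 1. A quick sanity check along those lines might look like this. It is only a sketch: total_share is an illustrative name, and it assumes no other variable in the data starts with "proportion".
        Code:
        * Sum the per-nationality shares across each row
        egen double total_share = rowtotal(proportion*)
        * Stop with an error if any firm-year does not sum to 1 (small tolerance for float rounding)
        assert abs(total_share - 1) < 1e-5
        drop total_share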

        • #5
          It worked perfectly! Much appreciated.
          Are there any alternatives that generate these proportions without using *collapse*? With *collapse* I lose other variables that I need for the regression.

          • #6
            Yes, there are alternatives. But which alternative to use depends on whether those other variables you need are constant within each combination of NPC_FIC and year, or vary within such combinations.

            If those variables are constant within each combination of NPC_FIC and year, you can just change the collapse command to include them, and otherwise leave the code the same.
            Code:
            collapse (first) proportion list_the_other_variables_here, by(NPC_FIC year nacio)
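            To find out which of the two cases applies, the variable can be tested directly. A minimal sketch, with other_variable standing in for whichever variable needs to be kept:
            Code:
            * Sort each firm-year by the variable; it is constant if its first and last values agree
            * assert stops with an error if any firm-year has more than one distinct value
            by NPC_FIC year (other_variable), sort: assert other_variable[1] == other_variable[_N]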
            If those variables can vary within combinations of NPC_FIC and year, then you have to do something a little more complicated. I created a random other variable to illustrate the approach:
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear*
            input double(year NPC_FIC) str6 nacio float other_variable
            2010 500000001 "PT"   .3488717
            2010 500000001 "PT"   .2668857
            2010 500000001 "PT"   .1366463
            2010 500000002 "PT" .028556867
            2010 500000002 "PT"   .8689333
            2010 500000002 "PT"   .3508549
            2010 500000002 "PT"  .07110509
            2011 500000002 "AF"  .32336795
            2011 500000002 "PT"   .5551032
            2011 500000002 "PT"    .875991
            2011 500000002 "PT"  .20470947
            2012 500000002 "PT"   .8927587
            2012 500000002 "PT"   .5844658
            2012 500000002 "PT"   .3697791
            2012 500000002 "AF"   .8506309
            2013 500000002 "PT"   .3913819
            2013 500000002 "PT"  .11966132
            2014 500000002 "PT"   .7542434
            2015 500000002 "PT"   .6950234
            2015 500000002 "PT"   .6866152
            2016 500000002 "PT"   .9319346
            2016 500000002 "PT"   .4548882
            2017 500000002 "PT"   .0674011
            2017 500000002 "PT"   .3379889
            2010 500000033 "PT"   .9748848
            2010 500000033 "PT"   .7264384
            2011 500000033 "PT"  .04541512
            2011 500000033 "ES"   .7459667
            2012 500000033 "PT"   .4961259
            2012 500000033 "PT"   .7167162
            2013 500000033 "ES"    .859742
            2014 500000033 "PT"  .13407555
            2015 500000033 "PT"  .48844185
            2015 500000033 "PT"   .8712187
            2016 500000033 "PT"   .7664683
            2017 500000033 "PT"  .25125554
            2018 500000033 "PT"  .16636477
            2019 500000033 "PT"   .7437958
            2010 500000050 "ES"   .9805113
            2010 500000050 "ES"   .7295772
            2011 500000050 "ES"   .9011049
            2012 500000050 "PT"  .26436493
            2013 500000050 "PT"   .8856509
            2014 500000050 "PT"    .882112
            2014 500000050 "PT"    .748933
            2015 500000050 "PT"   .9196262
            2015 500000050 "PT"   .6934533
            2019 500000073 "PT"   .2154026
            2010 500000083 "AF"   .8285888
            2011 500000083 "AF"  .04421536
            2012 500000083 "AF"   .8630378
            2013 500000083 "AF"   .3526046
            2014 500000083 "AF"   .7720399
            2015 500000083 "AF"   .5861199
            2016 500000083 "AF"   .3227766
            2017 500000083 "PT"  .17293066
            2018 500000083 "PT"   .8053644
            2018 500000083 "PT"   .3060019
            2019 500000083 "PT"  .21909967
            2019 500000083 "PT"    .724731
            2015 500000101 "PT"   .6964867
            2016 500000101 "AF"   .9119344
            2017 500000101 "AF"   .6795634
            2018 500000101 "PT"   .3549416
            2019 500000101 "GB"     .73897
            2010 500000104 "UA"  .18740167
            2011 500000106 "PT"   .3146128
            2011 500000113 "PT"   .1375693
            2012 500000113 "PT"   .6537739
            2013 500000113 "PT"  .27013195
            2014 500000113 "PT"   .8998394
            2010 500000119 "PT"   .5734232
            2010 500000119 "PT"  .11147037
            2010 500000119 "PT"   .4145227
            2010 500000119 "PT" .003052204
            2011 500000119 "UA"   .6659978
            end
            
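            * Share of each nationality within each firm-year, computed on the full data
            by NPC_FIC year nacio, sort: gen this_nationality = _N
            by NPC_FIC year (nacio): gen everyone = _N
            gen proportion = this_nationality/everyone
            
            * Copy only the needed variables into a separate frame (frames require Stata 16 or later)
            frame put NPC_FIC year nacio proportion, into(working)
            frame working {
                * Reduce to one observation per firm-year-nationality, then go wide
                by NPC_FIC year nacio, sort: keep if _n == 1
                reshape wide proportion, i(NPC_FIC year) j(nacio) string
                mvencode proportion*, mv(0)
            }
            
            * Link each original observation to its firm-year row and copy the wide variables back
            frlink m:1 NPC_FIC year, frame(working)
            frget proportion*, from(working)
            * Drop the link variable created by frlink (named working by default), then the frame itself
            drop working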
            frame drop working
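            Another route that keeps every observation and every other variable is to build one share variable per nationality in a loop. This is an alternative sketch, not the method posted above. It assumes every value of nacio can be used as a variable-name suffix (letters, digits, underscores only), and the share_ prefix is an illustrative choice.
            Code:
            * Collect the distinct nationality codes in a local macro
            levelsof nacio, local(nationalities)
            foreach n of local nationalities {
                * The firm-year mean of a 0/1 indicator is that nationality's share
                bysort NPC_FIC year: egen double share_`n' = mean(nacio == "`n'")
            }
            Each share_ variable is constant within a firm-year, and a nationality that does not occur in a firm-year gets 0 directly, so no mvencode step is needed.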

            • #7
              Dear Prof. Clyde, this is awesome.
              Thank you so much. I am always impressed by your Stata knowledge, and I do not know how it is possible to know this much.
