Code for generating variables

Jordi Imbrechts

Join Date: Apr 2022
Posts: 44

29 Apr 2022, 06:55

Dear all,

As a master's student, I am (almost) finished with my thesis, but I still have a couple of questions about generating variables in Stata (I am using Stata 16).

Information about my data:
- I have unbalanced panel data, period 2011-2020. Around 185 000 observations (Belgian firms).
- My dependent variable is binary: it equals 1 when a firm dismisses >= 10% of its workforce.
- My independent variables are also binary. I am currently struggling to write the right code for these.

Example of my data (50 observations):

Code:

clear
input long ID int Jaar str71 Naam int Leeftijd byte Sector long Aantalwn int Afdankingen float(Productiviteit ROA Assetturnover ROE Bedrijfsresultaatperwn)
1 2011 "TOYOTA MOTOR EUROPE"              21 7 1890  8   9468913  .017703231  2.580798  -.01211103  64952.91
1 2012 "TOYOTA MOTOR EUROPE"              22 7 1985  8   8832648   .02846691  2.517578  .007375848  99873.05
1 2013 "TOYOTA MOTOR EUROPE"              23 7 2025 15   9473724   .01965199 2.7087715 -.007936415  68731.36
1 2014 "TOYOTA MOTOR EUROPE"              24 7 2036 20   9665455  .019871514 2.5071895 -.005685251  76606.58
1 2015 "TOYOTA MOTOR EUROPE"              25 7 2065 24   9582943   .02103217  2.673541 -.002472765  75386.92
1 2016 "TOYOTA MOTOR EUROPE"              26 7 2054 18  11515762   .02446286  2.753162  .006466216 102321.81
1 2017 "TOYOTA MOTOR EUROPE"              27 7 2128 15  11933369  .026412234  3.074022  .003182778 102532.42
1 2018 "TOYOTA MOTOR EUROPE"              28 7 2250 20  11528431  .028746504  2.679638  .025397966 123674.22
1 2019 "TOYOTA MOTOR EUROPE"              29 7 2525 16  11112818   .02707623 2.7859256  .020413205 108004.75
1 2020 "TOYOTA MOTOR EUROPE"              30 7 2571 12   9719027  .025036834 2.1078887   .01099639 115439.52
2 2011 "PFIZER SERVICE COMPANY"            9 7  167  1  158699.6   .05582782  .6294447   .07067245 14075.665
2 2012 "PFIZER SERVICE COMPANY"           10 7  308 10         0   .08712924         0   .12804179  49803.37
2 2013 "PFIZER SERVICE COMPANY"           11 7  276 20         0    .0902117         0    .1361892  52598.84
2 2014 "PFIZER SERVICE COMPANY"           12 7  275 14         0   .08335922         0    .1240549  52297.27
2 2015 "PFIZER SERVICE COMPANY"           13 7  268  3         0   .07175304         0   .12485207  57116.21
2 2016 "PFIZER SERVICE COMPANY"           14 7  236 12  38828412 .0006926063  .7733278   .10714502  34775.42
2 2017 "PFIZER SERVICE COMPANY"           15 7  231  7  51316864 .0010293572 1.1702504   .14413846  45138.53
2 2018 "PFIZER SERVICE COMPANY"           16 7  219 10  56523216 .0009795176 1.2496177   .11487263  44305.94
2 2019 "PFIZER SERVICE COMPANY"           17 7  224  4 101188960 .0012967803  .7962413    .3005882  164799.1
2 2020 "PFIZER SERVICE COMPANY"           18 7  237  1 101186104 .0010189478 1.1919173   .15437415  86502.11
3 2011 "JANSSEN PHARMACEUTICA"            77 3 3677 43  634812.6    .2335892  .3515235   .04423508  421836.3
3 2012 "JANSSEN PHARMACEUTICA"            78 3 3821 33  763892.2   .25126007  .4110364    .0746601 466955.25
3 2013 "JANSSEN PHARMACEUTICA"            79 3 3785 34  875231.2   .26847038  .4111074   .10211326 571562.75
3 2014 "JANSSEN PHARMACEUTICA"            80 3 3893 27  962520.2    .2483825  .4318945   .07041683  553545.3
3 2015 "JANSSEN PHARMACEUTICA"            81 3 4107 27  993217.7    .1690204  .3288512   .10037537 510486.25
3 2016 "JANSSEN PHARMACEUTICA"            82 3 4638 35 1248302.9     .276376  .4183958   .28478423  824580.4
3 2017 "JANSSEN PHARMACEUTICA"            83 3 4598 60 2232884.5    .1948978  .7311364   .10594828  595216.2
3 2018 "JANSSEN PHARMACEUTICA"            84 3 4594 26 2598562.5    .2254114  .8562906   .14229459  684050.1
3 2019 "JANSSEN PHARMACEUTICA"            85 3 4644 28   3146280   .23141466  .9314913     .170169  781644.9
3 2020 "JANSSEN PHARMACEUTICA"            86 3 4780 26 3596431.5   .23680106 1.0004208   .16852586  851280.6
4 2011 "EXXONMOBIL PETROLEUM & CHEMICAL"  34 3 2165  5  14695388   .01654665 1.0425363    .0405766 233238.34
4 2012 "EXXONMOBIL PETROLEUM & CHEMICAL"  35 3 2176  9  15152519  .021001816 1.0488497   .05068856    303409
4 2013 "EXXONMOBIL PETROLEUM & CHEMICAL"  36 3 2170  2  14052597  .018379996 1.0666726   .06182827  242142.4
4 2014 "EXXONMOBIL PETROLEUM & CHEMICAL"  37 3 2175  6  13759912  .012923716   .984891   .04221074 180557.23
4 2015 "EXXONMOBIL PETROLEUM & CHEMICAL"  38 3 2205  8  10691972  .016416688 .44692335    .0766432  392744.7
4 2016 "EXXONMOBIL PETROLEUM & CHEMICAL"  39 3 2197  8   9175407  .015552402  .3692766   .07220461  386430.1
4 2017 "EXXONMOBIL PETROLEUM & CHEMICAL"  40 3 2187  9  10887637    .0142755  .4387161   .06504572  354275.7
4 2018 "EXXONMOBIL PETROLEUM & CHEMICAL"  41 3 2214  6  12088139  .007806151  .5007361   .03043898 188446.25
4 2019 "EXXONMOBIL PETROLEUM & CHEMICAL"  42 3 2304  6   9358546  .005687178   .411369 -.005528804 129381.95
4 2020 "EXXONMOBIL PETROLEUM & CHEMICAL"  43 3 2238  6   5614647 .0026105626   .244941 -.026500564  59840.48
5 2011 "ELECTRABEL"                      106 4 5801 17   2506160  .029713834  .3143761    .0596582 236874.33
5 2012 "ELECTRABEL"                      107 4 5628 12   2446999  .015836613 .24370015  .029047446  159015.8
5 2013 "ELECTRABEL"                      108 4 5390 13 2313214.8  .012486763  .2392646  .019821707 120722.27
5 2014 "ELECTRABEL"                      109 4 5177  9 2397515.3   .01081251  .2395058  .016956229 108236.05
5 2015 "ELECTRABEL"                      110 4 4851  9 2179901.3 .0013057678  .2098209 -.004433884  13566.07
5 2016 "ELECTRABEL"                      111 4 4754  9 2510287.5   .00748435  .2291251  .007621482  81998.32
5 2017 "ELECTRABEL"                      112 4 4659 12 2437122.5 .0040441956  .2171694 -.002519652  45384.85
5 2018 "ELECTRABEL"                      113 4 4653 11 2380639.3  -.00935939 .22802757  -.04235218  -97713.3
5 2019 "ELECTRABEL"                      114 4 4690 11 2267758.8 .0024504496 .20909855 -.011275722  26576.12
5 2020 "ELECTRABEL"                      115 4 4655  5 2345807.8   .01092992 .26049516  .004661544  98425.99
end

Generating dependent variable:
My dependent variable is called "Collectieve ontslagen" (Dutch for "collective dismissals"). I have to create a new dummy variable that indicates whether a firm has dismissed >= 10% of its workforce. There are some points to note:
- The variable should be created from 2 existing variables: "Aantalwn" = total employees in the firm in that year, and "Afdankingen" = the number of dismissals in that year.
- The dismissals (Afdankingen) in year "n" have to be divided by the total employees (Aantalwn) of the year BEFORE the dismissals occur (year "n-1"). So the first observation of each firm should always be a missing value.
- It is important that Stata recognises when the data switches to another firm. "ID" indicates this. Not every firm has an observation for every year in 2011-2020. Some firms were only created in 2018 and therefore have only 3 years of observations.

Currently I am using this code:

Code:

bysort ID (Year): generate Collectiefontslag= Afdankingen[_n+1]/ Aantalwn>=0.10 if !missing( Totaalaantalwn[_n], Afdankingen[_n+1]) & Year[_n+1]==Year+1

But this code always gives me a missing value for the last observation and not for the first. So it should be the other way around.
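
The missing value lands on the last observation because the code above looks one row ahead (`[_n+1]`), so the result is stored in the year before the dismissals. One way to shift it to the year of the dismissals is a lag on the number of employees. This is a minimal, untested sketch. It assumes the variable names from the example data (ID, Jaar, Aantalwn, Afdankingen) and that ID and Jaar together uniquely identify an observation. `ontslagratio` is a hypothetical helper name.

```stata
* Sketch only: names follow the example data (ID, Jaar, Aantalwn, Afdankingen).
* Assumes ID and Jaar together uniquely identify an observation.
xtset ID Jaar

* Dismissals in year n divided by employees in year n-1.
* L.Aantalwn is missing for a firm's first year and after a gap in years,
* so the ratio is missing there as well.
generate double ontslagratio = Afdankingen / L.Aantalwn

* Stata treats missing as larger than any number, so "ontslagratio >= 0.10"
* on its own would code missing ratios as 1. The if condition keeps them missing.
generate byte Collectiefontslag = (ontslagratio >= 0.10) if !missing(ontslagratio)
```

Because the lag operator works on the panel declared with `xtset`, it never reaches across firms, and a firm with a missing year gets a missing value instead of a ratio based on the wrong year.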

Generating independent variables:
I have two independent variables, both binary. Because both rely on the same idea, I will only explain one of them to keep this as simple as possible.

I am investigating whether a decrease in productivity affects whether a firm dismisses >= 10% of its workforce.
Productivity (Dutch: productiviteit) is a variable I created myself by doing:

Code:

generate Productiviteit = Omzet/Aantalwn

(English: revenue/total employees)

I have to measure this variable in 2 ways:
1) If productivity decreases compared to the year before, the dummy should be "1" in that specific year; if there is no decrease, "0". Same remark as above: Stata should recognise when the data switches to another firm, and the first observation of each firm should be a missing value. In other words: if productivity(n+1) - productivity(n) >= 0, there is no decrease, so the value is 0; if productivity(n+1) - productivity(n) < 0, productivity decreased, so the value is 1.
This dummy should be called something like: "Dummy decrease productivity"

2) Instead of just looking at a decrease in productivity, I also have to look at the median of productivity. The median of productivity first has to be calculated (of the whole sample) & a dummy has to be generated that indicates if the specific firm is BELOW THE MEDIAN in each specific year (=1) or above the median (=0).
This dummy should be called something like: "Dummy median productivity"
I tried the following code myself, but I am not sure if it is correct:

Code:

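* Median of productivity over the whole sample (all firms, all years)
egen medianProductiviteit = median(Productiviteit)
* Keep the dummy missing where productivity is missing (otherwise it would be coded 0)
generate byte below_medianPROD = (Productiviteit < medianProductiviteit) if !missing(Productiviteit)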
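
For the first measure (the year-on-year decrease dummy), the comparison with the previous year could be sketched as follows. This is a minimal, untested sketch. It assumes the names from the example data (ID, Jaar, Productiviteit) and that ID and Jaar together uniquely identify an observation. `dummy_decrease_prod` is a hypothetical variable name, since Stata names cannot contain spaces.

```stata
* Sketch only: dummy_decrease_prod is a hypothetical name.
* Assumes ID and Jaar together uniquely identify an observation.
xtset ID Jaar

* 1 if productivity is lower than in the previous year, 0 otherwise.
* Missing for a firm's first year, after a gap in years,
* or when productivity itself is missing.
generate byte dummy_decrease_prod = (Productiviteit < L.Productiviteit) if !missing(Productiviteit, L.Productiviteit)
```

The `if !missing(...)` condition matters here too: without it, a missing previous-year value would make the comparison true and the dummy would be coded 1 for each firm's first year.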

I hope this gives enough context about my data and the variables that should be generated. If any Dutch words need translating, just let me know! I am really bad at writing this code myself, so I really hope someone here can help me out. Thanks in advance!

Kind regards,
Jordi

Last edited by Jordi Imbrechts; 29 Apr 2022, 07:09.


Jordi Imbrechts

#2

29 Apr 2022, 06:55

Using Windows 11.