Dear all,
As a master's student I am (almost) finished with my thesis, but I still have a couple of questions about generating variables in Stata (I am using Stata 16).
Information about my data:
- I have unbalanced panel data, period 2011-2020. Around 185 000 observations (Belgian firms).
- My dependent variable is binary: it equals 1 when a firm dismisses >= 10% of its workforce.
- My independent variables are also binary. I am currently struggling with generating the right code for these.
Example of my data: (count50)
Code:
input long ID int Jaar str71 Naam int Leeftijd byte Sector long Aantalwn int Afdankingen float(Productiviteit ROA Assetturnover ROE Bedrijfsresultaatperwn)
1 2011 "TOYOTA MOTOR EUROPE" 21 7 1890 8 9468913 .017703231 2.580798 -.01211103 64952.91
1 2012 "TOYOTA MOTOR EUROPE" 22 7 1985 8 8832648 .02846691 2.517578 .007375848 99873.05
1 2013 "TOYOTA MOTOR EUROPE" 23 7 2025 15 9473724 .01965199 2.7087715 -.007936415 68731.36
1 2014 "TOYOTA MOTOR EUROPE" 24 7 2036 20 9665455 .019871514 2.5071895 -.005685251 76606.58
1 2015 "TOYOTA MOTOR EUROPE" 25 7 2065 24 9582943 .02103217 2.673541 -.002472765 75386.92
1 2016 "TOYOTA MOTOR EUROPE" 26 7 2054 18 11515762 .02446286 2.753162 .006466216 102321.81
1 2017 "TOYOTA MOTOR EUROPE" 27 7 2128 15 11933369 .026412234 3.074022 .003182778 102532.42
1 2018 "TOYOTA MOTOR EUROPE" 28 7 2250 20 11528431 .028746504 2.679638 .025397966 123674.22
1 2019 "TOYOTA MOTOR EUROPE" 29 7 2525 16 11112818 .02707623 2.7859256 .020413205 108004.75
1 2020 "TOYOTA MOTOR EUROPE" 30 7 2571 12 9719027 .025036834 2.1078887 .01099639 115439.52
2 2011 "PFIZER SERVICE COMPANY" 9 7 167 1 158699.6 .05582782 .6294447 .07067245 14075.665
2 2012 "PFIZER SERVICE COMPANY" 10 7 308 10 0 .08712924 0 .12804179 49803.37
2 2013 "PFIZER SERVICE COMPANY" 11 7 276 20 0 .0902117 0 .1361892 52598.84
2 2014 "PFIZER SERVICE COMPANY" 12 7 275 14 0 .08335922 0 .1240549 52297.27
2 2015 "PFIZER SERVICE COMPANY" 13 7 268 3 0 .07175304 0 .12485207 57116.21
2 2016 "PFIZER SERVICE COMPANY" 14 7 236 12 38828412 .0006926063 .7733278 .10714502 34775.42
2 2017 "PFIZER SERVICE COMPANY" 15 7 231 7 51316864 .0010293572 1.1702504 .14413846 45138.53
2 2018 "PFIZER SERVICE COMPANY" 16 7 219 10 56523216 .0009795176 1.2496177 .11487263 44305.94
2 2019 "PFIZER SERVICE COMPANY" 17 7 224 4 101188960 .0012967803 .7962413 .3005882 164799.1
2 2020 "PFIZER SERVICE COMPANY" 18 7 237 1 101186104 .0010189478 1.1919173 .15437415 86502.11
3 2011 "JANSSEN PHARMACEUTICA" 77 3 3677 43 634812.6 .2335892 .3515235 .04423508 421836.3
3 2012 "JANSSEN PHARMACEUTICA" 78 3 3821 33 763892.2 .25126007 .4110364 .0746601 466955.25
3 2013 "JANSSEN PHARMACEUTICA" 79 3 3785 34 875231.2 .26847038 .4111074 .10211326 571562.75
3 2014 "JANSSEN PHARMACEUTICA" 80 3 3893 27 962520.2 .2483825 .4318945 .07041683 553545.3
3 2015 "JANSSEN PHARMACEUTICA" 81 3 4107 27 993217.7 .1690204 .3288512 .10037537 510486.25
3 2016 "JANSSEN PHARMACEUTICA" 82 3 4638 35 1248302.9 .276376 .4183958 .28478423 824580.4
3 2017 "JANSSEN PHARMACEUTICA" 83 3 4598 60 2232884.5 .1948978 .7311364 .10594828 595216.2
3 2018 "JANSSEN PHARMACEUTICA" 84 3 4594 26 2598562.5 .2254114 .8562906 .14229459 684050.1
3 2019 "JANSSEN PHARMACEUTICA" 85 3 4644 28 3146280 .23141466 .9314913 .170169 781644.9
3 2020 "JANSSEN PHARMACEUTICA" 86 3 4780 26 3596431.5 .23680106 1.0004208 .16852586 851280.6
4 2011 "EXXONMOBIL PETROLEUM & CHEMICAL" 34 3 2165 5 14695388 .01654665 1.0425363 .0405766 233238.34
4 2012 "EXXONMOBIL PETROLEUM & CHEMICAL" 35 3 2176 9 15152519 .021001816 1.0488497 .05068856 303409
4 2013 "EXXONMOBIL PETROLEUM & CHEMICAL" 36 3 2170 2 14052597 .018379996 1.0666726 .06182827 242142.4
4 2014 "EXXONMOBIL PETROLEUM & CHEMICAL" 37 3 2175 6 13759912 .012923716 .984891 .04221074 180557.23
4 2015 "EXXONMOBIL PETROLEUM & CHEMICAL" 38 3 2205 8 10691972 .016416688 .44692335 .0766432 392744.7
4 2016 "EXXONMOBIL PETROLEUM & CHEMICAL" 39 3 2197 8 9175407 .015552402 .3692766 .07220461 386430.1
4 2017 "EXXONMOBIL PETROLEUM & CHEMICAL" 40 3 2187 9 10887637 .0142755 .4387161 .06504572 354275.7
4 2018 "EXXONMOBIL PETROLEUM & CHEMICAL" 41 3 2214 6 12088139 .007806151 .5007361 .03043898 188446.25
4 2019 "EXXONMOBIL PETROLEUM & CHEMICAL" 42 3 2304 6 9358546 .005687178 .411369 -.005528804 129381.95
4 2020 "EXXONMOBIL PETROLEUM & CHEMICAL" 43 3 2238 6 5614647 .0026105626 .244941 -.026500564 59840.48
5 2011 "ELECTRABEL" 106 4 5801 17 2506160 .029713834 .3143761 .0596582 236874.33
5 2012 "ELECTRABEL" 107 4 5628 12 2446999 .015836613 .24370015 .029047446 159015.8
5 2013 "ELECTRABEL" 108 4 5390 13 2313214.8 .012486763 .2392646 .019821707 120722.27
5 2014 "ELECTRABEL" 109 4 5177 9 2397515.3 .01081251 .2395058 .016956229 108236.05
5 2015 "ELECTRABEL" 110 4 4851 9 2179901.3 .0013057678 .2098209 -.004433884 13566.07
5 2016 "ELECTRABEL" 111 4 4754 9 2510287.5 .00748435 .2291251 .007621482 81998.32
5 2017 "ELECTRABEL" 112 4 4659 12 2437122.5 .0040441956 .2171694 -.002519652 45384.85
5 2018 "ELECTRABEL" 113 4 4653 11 2380639.3 -.00935939 .22802757 -.04235218 -97713.3
5 2019 "ELECTRABEL" 114 4 4690 11 2267758.8 .0024504496 .20909855 -.011275722 26576.12
5 2020 "ELECTRABEL" 115 4 4655 5 2345807.8 .01092992 .26049516 .004661544 98425.99
end
Generating dependent variable:
My dependent variable is called "Collectieve ontslagen" (Dutch for "collective dismissals"). I have to create a new dummy variable that indicates whether a firm has dismissed >= 10% of its workforce. There are a few points of attention:
- The variable should be created with 2 existing variables: "Aantalwn" = total employees in the firm in that year & "Afdankingen" = the amount of dismissals in that year.
- The "dismissals" (Afdankingen) (year "n") have to be divided by "total employees" (Aantalwn) of the year BEFORE the dismissals occur (year "n-1"). So the first observation of each firm should always be a missing value.
- It is important that Stata recognises when the data switches to another firm. "ID" indicates this. Not every firm has observations for the whole period 2011-2020. Some firms were only created in 2018 and therefore have only 3 years of observations.
Currently I was using this code:
Code:
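bysort ID (Year): generate Collectiefontslag = Afdankingen[_n+1]/Aantalwn >= 0.10 if !missing(Totaalaantalwn[_n], Afdankingen[_n+1]) & Year[_n+1] == Year + 1
But this code always gives me a missing value for the last observation of each firm and not for the first, so it should be the other way around.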
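For comparison, here is a minimal sketch of the calculation as described in the bullet points, not a verified solution. It assumes the variable names from the data example (Jaar and Aantalwn, rather than Year and Totaalaantalwn), and ontslagratio is a hypothetical helper name. The attempt above uses [_n+1], which looks one year ahead. The lag operator L. looks one year back within the same firm, and it returns missing for a firm's first year and after a gap in years.

```stata
* Declare the panel so time-series operators respect firm boundaries and gaps in years
xtset ID Jaar

* Dismissals in year n divided by total employees in year n-1
* (ontslagratio is a hypothetical helper name)
generate double ontslagratio = Afdankingen / L.Aantalwn

* 1 if at least 10% of last year's workforce was dismissed, 0 otherwise;
* left missing where the ratio is missing (first year of a firm, gaps, zero employees)
generate byte Collectiefontslag = ontslagratio >= 0.10 if !missing(ontslagratio)
```

The `if !missing()` qualifier matters because Stata treats missing as larger than any number, so without it every missing ratio would be coded as 1.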
Generating independent variables:
I have two independent variables, both binary. Because both rely on the same idea, I will explain only one of them to keep this as simple as possible.
I am investigating whether a decrease in productivity affects whether a firm dismisses >= 10% of its workforce.
Productivity (Dutch: productiviteit) is a variable I created myself by doing:
Code:
generate Productiviteit = Omzet/Aantalwn
(English: revenue/total employees)
I have to measure this variable in 2 ways:
1) If there is a decrease in productivity compared to the year before, the dummy should be "1" in that specific year; if there is no decrease, "0". Same remark as above: Stata should recognise when the data switches to another firm, and the first observation of each firm should be a missing value. (When productivity n+1 - productivity n >= 0, there is no decrease, so year n+1 gets value 0. When productivity n+1 - productivity n < 0, productivity decreased, so year n+1 gets value 1.)
This dummy should be called something like: "Dummy decrease productivity"
2) Instead of just looking at a decrease in productivity, I also have to look at the median of productivity. The median of productivity first has to be calculated (of the whole sample) & a dummy has to be generated that indicates if the specific firm is BELOW THE MEDIAN in each specific year (=1) or above the median (=0).
This dummy should be called something like: "Dummy median productivity"
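For the first measure (a decrease compared to the year before), a possible sketch is shown below. It assumes the variable names from the data example (ID, Jaar, Productiviteit), and dummy_decrease_prod is a hypothetical name for the new dummy. I have not checked it against the full dataset.

```stata
* Declare the panel so L. refers to the previous year of the same firm
xtset ID Jaar

* 1 if productivity is lower than in the previous year, 0 if it is equal or higher;
* missing for a firm's first year, after a gap in years, or if productivity is missing
generate byte dummy_decrease_prod = Productiviteit < L.Productiviteit ///
    if !missing(Productiviteit, L.Productiviteit)
```

Because L.Productiviteit is missing for the first observation of each firm, the dummy is automatically missing there and never compares values across firms.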
I tried the following code myself, but I am not sure if it is correct:
Code:
egen medianProductiviteit = median(Productiviteit)
gen below_medianPROD = Productiviteit < medianProductiviteit
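Two things I am unsure about in this attempt, sketched here under assumptions. First, observations with missing Productiviteit would be coded 0 instead of missing, because the comparison is false for missing values. Second, "below the median in each specific year" could also mean a separate median per year rather than one median for the whole sample. The names median_prod_all, below_median_all, median_prod_year and below_median_year are hypothetical.

```stata
* Variant 1: one median over the whole sample
egen double median_prod_all = median(Productiviteit)
generate byte below_median_all = Productiviteit < median_prod_all ///
    if !missing(Productiviteit)

* Variant 2 (assumption: the median is meant per year): one median per value of Jaar
bysort Jaar: egen double median_prod_year = median(Productiviteit)
generate byte below_median_year = Productiviteit < median_prod_year ///
    if !missing(Productiviteit)
```

Which variant is appropriate depends on how the median is meant to be defined in the thesis.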
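I hope this gives enough context about my data and the variables that should be generated. If any Dutch words need translating, just let me know! I am not good at writing this kind of code myself, so I really hope someone here can help me out. Thanks in advance!
Kind regards,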
Jordi