Wrong code for generating variables

Jordi Imbrechts

Join Date: Apr 2022
Posts: 44

29 Apr 2022, 13:10

Dear all,

My data are an unbalanced panel of Belgian firms covering the period 2011-2020.

For one of my independent variables, I have created a dummy that equals 1 when productivity has decreased for two consecutive years. As a stricter version, I have created a second dummy that equals 1 when productivity has decreased for three consecutive years. As you can see below, the first observation of each firm has to be missing, simply because a decrease cannot be determined from a single value of productivity.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID float(Productiviteit_w5 DalingProductiviteit2J DalingProductiviteit3J)
1   4587718 . .
1   4587718 0 0
1   4587718 0 0
1   4587718 0 0
1   4587718 0 0
1   4587718 0 0
1   4587718 0 0
1   4587718 0 0
1   4587718 0 0
1   4587718 0 0
2  158699.6 . .
2  51460.25 0 0
2  51460.25 0 0
2  51460.25 0 0
2  51460.25 0 0
2   4587718 0 0
2   4587718 0 0
2   4587718 0 0
2   4587718 0 0
2   4587718 0 0
3  634812.6 . .
3  763892.2 0 0
3  875231.2 0 0
3  962520.2 0 0
3  993217.7 0 0
3 1248302.9 0 0
3 2232884.5 0 0
3 2598562.5 0 0
3   3146280 0 0
3 3596431.5 0 0
4   4587718 . .
4   4587718 0 0
4   4587718 0 0
4   4587718 0 0
4   4587718 0 0
4   4587718 0 0
4   4587718 0 0
4   4587718 0 0
4   4587718 0 0
4   4587718 0 0
5   2506160 . .
5   2446999 0 0
5 2313214.8 1 0
5 2397515.3 0 0
5 2179901.3 0 0
5 2510287.5 0 0
5 2437122.5 0 0
5 2380639.3 1 0
5 2267758.8 1 1
5 2345807.8 0 0
end

I created these dummies with the following commands:

Code:

gen DalingProductiviteit2J = 0
replace DalingProductiviteit2J = . if Productiviteit_w5 == .
bysort ID (Jaar): replace DalingProductiviteit2J = 1 if Productiviteit_w5 < L.Productiviteit_w5 & L.Productiviteit_w5 < L2.Productiviteit_w5 & ID==L.ID & ID==L2.ID
gen DalingProductiviteit3J = 0
replace DalingProductiviteit3J = . if DalingProductiviteit2J == .
replace DalingProductiviteit3J = 1 if DalingProductiviteit2J == 1 & L.DalingProductiviteit2J == 1

Sorry for the variable names being in Dutch, but the idea behind this dummy should be clear. If not, please let me know.

As a robustness check, I defined "productivity" in a different way and tried to create exactly the same dummies as above. This time I used operating revenue per employee:

Code:

gen DalingBedrijfsresultaatperwn2J = 0
replace DalingBedrijfsresultaatperwn2J = . if Bedrijfsresultaatperwn_w1 == .
bysort ID (Jaar): replace DalingBedrijfsresultaatperwn2J = 1 if Bedrijfsresultaatperwn_w1 < L.Bedrijfsresultaatperwn_w1 & L.Bedrijfsresultaatperwn_w1 < L2.Bedrijfsresultaatperwn_w1 & ID==L.ID & ID==L2.ID
gen DalingBedrijfsresultaatperwn3J = 0
replace DalingBedrijfsresultaatperwn3J = . if DalingBedrijfsresultaatperwn2J == .
replace DalingBedrijfsresultaatperwn3J = 1 if DalingBedrijfsresultaatperwn2J == 1 & L.DalingBedrijfsresultaatperwn2J == 1

Problem: the dummies I generated first correctly leave the first observation of each firm (ID) missing. As you can see below, the second set of dummies (created with exactly the same code) has 0 in the first observation. This is wrong, but I cannot find the cause. Does anyone see the problem?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID float(Bedrijfsresultaatperwn_w1 DalingBedrijfsresultaatperwn2J DalingBedrijfsresultaatperwn3J)
1  64952.91 0 0
1  99873.05 0 0
1  68731.36 0 0
1  76606.58 0 0
1  75386.92 0 0
1 102321.81 0 0
1 102532.42 0 0
1 123674.22 0 0
1 108004.75 0 0
1 115439.52 0 0
2 14075.665 0 0
2  49803.37 0 0
2  52598.84 0 0
2  52297.27 0 0
2  57116.21 0 0
2  34775.42 0 0
2  45138.53 0 0
2  44305.94 0 0
2  164799.1 0 0
2  86502.11 0 0
3  421836.3 0 0
3 466955.25 0 0
3 571562.75 0 0
3  553545.3 0 0
3 510486.25 1 0
3  824580.4 0 0
3  595216.2 0 0
3  684050.1 0 0
3  781644.9 0 0
3  851280.6 0 0
4 233238.34 0 0
4    303409 0 0
4  242142.4 0 0
4 180557.23 1 0
4  392744.7 0 0
4  386430.1 0 0
4  354275.7 1 0
4 188446.25 1 1
4 129381.95 1 1
4  59840.48 1 1
5 236874.33 0 0
5  159015.8 0 0
5 120722.27 1 0
5 108236.05 1 1
5  13566.07 1 1
5  81998.32 0 0
5  45384.85 0 0
5  -97713.3 1 0
5  26576.12 0 0
5  98425.99 0 0
end

Kind regards,
Jordi

Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#2

29 Apr 2022, 13:17

Using Stata 16 (Windows 11)
Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#3

29 Apr 2022, 13:31

Fixed it. Thanks anyway.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

29 Apr 2022, 13:56

I am surprised the code worked in the first instance. When I run the code on your example data (I have to generate Jaar because you neglected to include it, and I add the necessary xtset command so the lag operators work) I get 0 in the first Jaar for each ID.
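The setup step described above might look roughly like the following. This is only a sketch under stated assumptions: the example data do not include Jaar, so it assumes the rows are in chronological order within each firm and that every firm starts in 2011. The helper variable volgorde is a hypothetical name introduced here to preserve the original row order.

```stata
clear
input long ID float Productiviteit_w5
4 4587718
4 4587718
4 4587718
5 2506160
5 2446999
5 2313214.8
end

* Assumption: rows are in chronological order and each firm starts in 2011.
* volgorde (hypothetical helper) records the original row order so that
* sorting by ID cannot reshuffle observations within a firm.
generate long volgorde = _n
bysort ID (volgorde): generate int Jaar = 2010 + _n

* Declare the panel structure; L. and L2. need this to know what "previous year" means.
xtset ID Jaar
```

In the real data, Jaar already exists, so only the xtset line would be needed.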

Nothing in your code will cause the value to be missing in the first Jaar for each ID. So you need to add something like

Code:

bysort ID (Jaar): replace DalingProductiviteit2J = . if _n==1

to create a missing value in the first Jaar for each ID.
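As an illustration, here is a sketch of the full sequence for one firm from the example data, with the extra line placed before the three-year dummy is built so that the missing value carries over to it. The Jaar values are an assumption (2011 to 2020), since the example data do not include them.

```stata
clear
input long ID int Jaar float Productiviteit_w5
5 2011 2506160
5 2012 2446999
5 2013 2313214.8
5 2014 2397515.3
5 2015 2179901.3
5 2016 2510287.5
5 2017 2437122.5
5 2018 2380639.3
5 2019 2267758.8
5 2020 2345807.8
end
* Jaar values above are assumed, not taken from the original post.
xtset ID Jaar

* Two-year dummy, as in the original post
gen DalingProductiviteit2J = 0
replace DalingProductiviteit2J = . if Productiviteit_w5 == .
replace DalingProductiviteit2J = 1 if Productiviteit_w5 < L.Productiviteit_w5 & L.Productiviteit_w5 < L2.Productiviteit_w5 & ID==L.ID & ID==L2.ID

* Added line: set the first year of each firm to missing
bysort ID (Jaar): replace DalingProductiviteit2J = . if _n==1

* Three-year dummy, as in the original post; it inherits the missing first year
gen DalingProductiviteit3J = 0
replace DalingProductiviteit3J = . if DalingProductiviteit2J == .
replace DalingProductiviteit3J = 1 if DalingProductiviteit2J == 1 & L.DalingProductiviteit2J == 1

list ID Jaar Productiviteit_w5 DalingProductiviteit2J DalingProductiviteit3J
```

Without the added line, the first year stays 0 because both lags are missing there, and Stata evaluates a comparison such as `. < .` as false, so the condition for setting the dummy to 1 is never met and nothing sets it to missing either.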
Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#5

01 May 2022, 10:59

Thank you very much William.