Hello everyone,
We are currently writing our master's thesis and want to run a difference-in-differences regression to see whether green bond issuances (the treatment) have an effect on company ESG performance, comparing ESG scores one year before and one year after the green bond issuance. The control group consists of conventional bond issuances.
Here is an example excerpt of our data setup. It shows that we have dummies for both treatment and time, along with the other variables displayed.
ID | Treatment | Revenue | Country | BICS Level 1 | Year | Amount | Time | E Score | S Score | G Score | Certified |
1 | 1 | 7973 | EU | Utilities | 2019 | 448,500,000 | 0 | 90.87 | 90.87 | 64.71 | 0 |
2 | 1 | 8049.80 | EU | Industrials | 2018 | 173,829,400 | 0 | 92.51 | 95.24 | 79.96 | 1 |
3 | 1 | 26698.70 | EU | Financials | 2018 | 1,197,620,000 | 0 | 86.67 | 89.97 | 90.70 | 1 |
4 | 0 | 61125.20 | CN | Financials | 2019 | 1,098,410,000 | 0 | 84.04 | 89.74 | 93.73 | 0 |
5 | 0 | 1997.30 | CN | Others | 2019 | 569,140,000 | 0 | 68.77 | 40.54 | 34.00 | 0 |
1 | 1 | 7973 | EU | Utilities | 2019 | 448,500,000 | 1 | 81.27 | 81.27 | 81.86 | 0 |
2 | 1 | 8049.80 | EU | Industrials | 2018 | 173,829,400 | 1 | 90.26 | 96.13 | 76.46 | 1 |
3 | 1 | 26698.70 | EU | Financials | 2018 | 1,197,620,000 | 1 | 84.86 | 81.67 | 79.17 | 1 |
4 | 0 | 61125.20 | CN | Financials | 2019 | 1,098,410,000 | 1 | 82.17 | 85.92 | 88.13 | 0 |
5 | 0 | 1997.30 | CN | Others | 2019 | 569,140,000 | 1 | 75.08 | 57.41 | 60.08 | 0 |
So far, our code in Stata looks like this (shown here for the effect on the E component of the ESG score):
reg escore time##treatment
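For reference, in this baseline model the coefficient on 1.time#1.treatment is the difference-in-differences estimate, i.e. the before/after change in the E score for green bond issuers minus the before/after change for conventional issuers. A minimal sketch of how this could be checked against the raw cell means, assuming the variable names escore, time and treatment used above:

```stata
* Baseline 2x2 difference-in-differences on the E score
regress escore i.time##i.treatment

* Mean E score in each of the four time-by-treatment cells
tabulate time treatment, summarize(escore) means

* The interaction coefficient equals
* (treated after - treated before) - (control after - control before)
lincom 1.time#1.treatment
```

Without further covariates, the value reported by lincom should match the difference-in-differences computed by hand from the four cell means.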
Now we would also like to add further control variables (for example revenue, country, and BICS Level 1).
However, we are worried that the following does not work properly: each issuance appears twice in the dataset (once before and once after), so the "double" entries would be counted twice when regressing escore on, for instance, revenue.
reg escore time##treatment revenue
Results:
reg escore time##treatment revenue

      Source |       SS           df       MS      Number of obs   =       390
-------------+----------------------------------   F(4, 385)       =      8.56
       Model |  24098.3887         4  6024.59717   Prob > F        =    0.0000
    Residual |   270853.19       385  703.514779   R-squared       =    0.0817
-------------+----------------------------------   Adj R-squared   =    0.0722
       Total |  294951.578       389  758.230279   Root MSE        =    26.524

--------------------------------------------------------------------------------
        escore |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        1.time |   5.628235   3.714082     1.52   0.130    -1.674189    12.93066
   1.treatment |   10.50139   3.803451     2.76   0.006     3.023258    17.97953
               |
time#treatment |
          1 1  |  -1.366514   5.378081    -0.25   0.800     -11.9406    9.207572
               |
       revenue |   .0001543   .0000378     4.08   0.000       .00008    .0002286
         _cons |   50.43344    2.73697    18.43   0.000     45.05216    55.81472
--------------------------------------------------------------------------------
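To make the question concrete, here is a rough sketch of what the fuller specification might look like. The variable names id, country and bics1 are hypothetical stand-ins for our ID, Country and BICS Level 1 columns, and we assume id is numeric while country and bics1 are stored as strings. Clustering the standard errors on id is one common way to account for the same issuance appearing in both the before and the after row; whether that is appropriate for our setup is part of what we are asking.

```stata
* Hypothetical variable names: id, country, bics1 (adjust to the actual dataset)
* Convert string categories to labelled numeric variables for factor notation
encode country, generate(country_id)
encode bics1, generate(bics_id)

* DiD with controls; i. marks categorical and c. continuous covariates
* vce(cluster id) allows the two rows of each issuance to be correlated
regress escore i.time##i.treatment c.revenue i.country_id i.bics_id, vce(cluster id)
```

In the excerpt above, revenue, country and BICS Level 1 take the same value in the before and after rows of each ID, so in this sketch they would only pick up differences in levels across issuers, not changes over time.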
Can anyone help us, please? Can we just add more variables to the (linear) regression (like we did in the example), given how our dataset is constructed?
Thank you!
