Adding additional variables to difference in difference regression

Yasmin Diaz

Join Date: Mar 2022
Posts: 4

23 Mar 2022, 08:26

Hello everyone,

We are currently writing our master's thesis and want to run a difference-in-differences regression to see whether green bond issuance (the treatment) affects company ESG performance, comparing ESG scores one year before and one year after issuance. The control group consists of conventional bond issuances.

Here is an example excerpt of our data, showing the dummies for treatment and time alongside the other variables.

ID	Treatment	Revenue	Country	BICS Level 1	Year	Amount	Time	E Score	S Score	G Score	Certified
1	1	7973	EU	Utilities	2019	448,500,000	0	90.87	90.87	64.71	0
2	1	8049.80	EU	Industrials	2018	173,829,400	0	92.51	95.24	79.96	1
3	1	26698.70	EU	Financials	2018	1,197,620,000	0	86.67	89.97	90.70	1
4	0	61125.20	CN	Financials	2019	1,098,410,000	0	84.04	89.74	93.73	0
5	0	1997.30	CN	Others	2019	569,140,000	0	68.77	40.54	34.00	0
1	1	7973	EU	Utilities	2019	448,500,000	1	81.27	81.27	81.86	0
2	1	8049.80	EU	Industrials	2018	173,829,400	1	90.26	96.13	76.46	1
3	1	26698.70	EU	Financials	2018	1,197,620,000	1	84.86	81.67	79.17	1
4	0	61125.20	CN	Financials	2019	1,098,410,000	1	82.17	85.92	88.13	0
5	0	1997.30	CN	Others	2019	569,140,000	1	75.08	57.41	60.08	0

So far, our Stata code looks like this (shown for the E component of the ESG score):

reg escore time##treatment

Now we would also like to add further variables (for example revenue, country, BICS Level 1, etc.).

However, we are afraid that the following would not work properly, because each issuer appears twice in the dataset (once per period) and so would be counted twice when regressing escore on, for instance, revenue.

reg escore time##treatment revenue

Results:

reg escore time##treatment revenue

      Source |       SS           df       MS      Number of obs   =       390
-------------+----------------------------------   F(4, 385)       =      8.56
       Model |  24098.3887         4  6024.59717   Prob > F        =    0.0000
    Residual |   270853.19       385  703.514779   R-squared       =    0.0817
-------------+----------------------------------   Adj R-squared   =    0.0722
       Total |  294951.578       389  758.230279   Root MSE        =    26.524

--------------------------------------------------------------------------------
        escore |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        1.time |   5.628235   3.714082     1.52   0.130    -1.674189    12.93066
   1.treatment |   10.50139   3.803451     2.76   0.006     3.023258    17.97953
               |
time#treatment |
          1 1  |  -1.366514   5.378081    -0.25   0.800     -11.9406    9.207572
               |
       revenue |   .0001543   .0000378     4.08   0.000       .00008    .0002286
         _cons |   50.43344    2.73697    18.43   0.000     45.05216    55.81472
--------------------------------------------------------------------------------

Can anyone help us, please? Can we just add more variables to the (linear) regression (like we did in the example), given how our dataset is constructed?

Thank you!

George Ford

Join Date: Aug 2014

Posts: 3048
#2

23 Mar 2022, 18:31

Why is time different from year?
Yasmin Diaz

Join Date: Mar 2022

Posts: 4
#3

24 Mar 2022, 03:22

Dear George,

Thank you for your reply!

We actually just noticed that this is a mistake, so thank you for spotting it! The "year" column currently holds the issuance year. If the issuance year was 2019, for example, the year column should instead say 2018 for time = 0 and 2020 for time = 1. We will correct this in our dataset! So far, we have only included the "time" variable in the regression, so hopefully the model was not affected by this mistake. The other variables are all correct, for example the ESG scores for one year before and one year after.
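
A minimal sketch of that correction on toy rows, assuming the issuance year is first copied into a hypothetical variable issue_year (that name is not in our dataset) so the original value is not lost:

```stata
* Toy rows: two issuers, each observed before (time = 0) and after (time = 1)
clear
input id year time
1 2019 0
2 2018 0
1 2019 1
2 2018 1
end

* Keep the issuance year under its own (hypothetical) name
generate issue_year = year

* Observation year: one year before issuance for time = 0, one year after for time = 1
replace year = issue_year - 1 if time == 0
replace year = issue_year + 1 if time == 1

sort id time
list id issue_year year time, sepby(id)
```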

I hope this makes sense.

BR
Yasmin

Last edited by Yasmin Diaz; 24 Mar 2022, 03:33.
Yasmin Diaz

Join Date: Mar 2022

Posts: 4
#4

24 Mar 2022, 07:58

This may also not have been very clear, but we used 2019 revenue as a reference point, which is why revenue is the same for both time = 0 and time = 1. Is that correct, or would we need revenue one year before and one year after issuance to include it as a control variable? And how do we then deal with variables that do not vary over time (like country or industry)? Can we still include these as control variables, and how would this work in practice? Many thanks in advance!
George Ford

Join Date: Aug 2014

Posts: 3048
#5

24 Mar 2022, 08:55

I think you need an industry fixed effect and probably a country fixed effect. Since revenue doesn't change, the FE will eat it (and any other variable that is not temporally changing).

Code:

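* reghdfe is community-contributed: ssc install ftools, then ssc install reghdfe
* BICS and country stand for the industry and country identifiers (these must be numeric)
reghdfe escore treatment, absorb(BICS country year)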
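
To illustrate how a fixed effect absorbs a regressor that does not change over time, here is a rough sketch on the five firms posted in #1. It is not the reghdfe specification above: it assumes a firm-level fixed effect and uses the built-in xtreg. A variable is only absorbed by fixed effects at the level where it is constant, so firm-level revenue is absorbed here by the firm effect.

```stata
* Toy data: the five firms from post #1 (E Score and revenue only)
clear
input id treatment time escore revenue
1 1 0 90.87 7973
2 1 0 92.51 8049.80
3 1 0 86.67 26698.70
4 0 0 84.04 61125.20
5 0 0 68.77 1997.30
1 1 1 81.27 7973
2 1 1 90.26 8049.80
3 1 1 84.86 26698.70
4 0 1 82.17 61125.20
5 0 1 75.08 1997.30
end

* Declare the panel: firm identifier and period
xtset id time

* Firm fixed effects: revenue and the treatment main effect are constant
* within firm, so Stata reports both as omitted because of collinearity
xtreg escore i.time##i.treatment revenue, fe

* The interaction term is the difference-in-differences estimate
display _b[1.time#1.treatment]
```

With only five firms the standard errors mean nothing; the point is just which terms survive the fixed effect.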

Or, use a first difference on ESG score and drop year FE.
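
A rough sketch of the first-difference route, again on the five firms from #1 and assuming one pre and one post observation per firm. The variable d_escore is a made-up name for the change in E Score.

```stata
* Toy data: the five firms from post #1 (E Score only)
clear
input id treatment time escore
1 1 0 90.87
2 1 0 92.51
3 1 0 86.67
4 0 0 84.04
5 0 0 68.77
1 1 1 81.27
2 1 1 90.26
3 1 1 84.86
4 0 1 82.17
5 0 1 75.08
end

* One row per firm: escore0 is the pre-issuance score, escore1 the post-issuance score
reshape wide escore, i(id) j(time)

* Post-minus-pre change in E Score for each firm
generate d_escore = escore1 - escore0

* Coefficient on treatment = mean change (treated) minus mean change (control)
regress d_escore treatment
```

Anything constant within a firm drops out of the difference, so time-invariant controls only matter here if you think they shift the change in the score rather than its level.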

You'd have to re-specify if you think the effect may differ by industry and country.

I'd carefully study the underlying ESG scoring method you're using. There may be a very formulaic approach to green bonds in the scoring, or maybe not.
Yasmin Diaz

Join Date: Mar 2022

Posts: 4
#6

24 Mar 2022, 09:02

Thank you very much for taking the time! We will now look into your suggestions carefully and make sure we understand what works best for us. Have a good day!