
  • Adding additional variables to difference in difference regression

    Hello everyone,

    We are currently writing our master's thesis and want to run a difference-in-differences regression to see whether green bond issuances (the treatment) affect company ESG performance, comparing ESG scores one year before and one year after the green bond issuance. The control group consists of conventional bond issuances.

    Here is a sample of our data setup, showing the dummies for treatment and time alongside the other variables.

    ID  Treatment  Revenue   Country  BICS Level 1  Year  Amount         Time  E Score  S Score  G Score  Certified
    1   1          7973      EU       Utilities     2019  448,500,000    0     90.87    90.87    64.71    0
    2   1          8049.80   EU       Industrials   2018  173,829,400    0     92.51    95.24    79.96    1
    3   1          26698.70  EU       Financials    2018  1,197,620,000  0     86.67    89.97    90.70    1
    4   0          61125.20  CN       Financials    2019  1,098,410,000  0     84.04    89.74    93.73    0
    5   0          1997.30   CN       Others        2019  569,140,000    0     68.77    40.54    34.00    0
    1   1          7973      EU       Utilities     2019  448,500,000    1     81.27    81.27    81.86    0
    2   1          8049.80   EU       Industrials   2018  173,829,400    1     90.26    96.13    76.46    1
    3   1          26698.70  EU       Financials    2018  1,197,620,000  1     84.86    81.67    79.17    1
    4   0          61125.20  CN       Financials    2019  1,098,410,000  1     82.17    85.92    88.13    0
    5   0          1997.30   CN       Others        2019  569,140,000    1     75.08    57.41    60.08    0

    So far, our Stata code looks like this (shown here for the effect on the E component of the ESG score):

    reg escore time##treatment
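
    Written out, we understand this command to estimate the model below. The symbols \beta and \varepsilon and the subscripts i (issuance) and t (period) are our own notation, not variables in the dataset.

    ```latex
    \[
    \text{escore}_{it} = \beta_0
      + \beta_1\,\text{time}_t
      + \beta_2\,\text{treatment}_i
      + \beta_3\,(\text{time}_t \times \text{treatment}_i)
      + \varepsilon_{it}
    \]
    ```

    Here \beta_3, the coefficient on the time#treatment interaction, is the difference-in-differences estimate we are interested in.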

    We would now like to add further variables (for example revenue, country, BICS Level 1, etc.).

    However, we are afraid that the following would not work properly: each issuance appears twice in the dataset (once for time = 0 and once for time = 1), so its entries would be counted twice when regressing escore on, for instance, revenue.

    reg escore time##treatment revenue


    Results:

    reg escore time##treatment revenue

          Source |       SS           df       MS      Number of obs   =       390
    -------------+----------------------------------   F(4, 385)       =      8.56
           Model |  24098.3887         4  6024.59717   Prob > F        =    0.0000
        Residual |   270853.19       385  703.514779   R-squared       =    0.0817
    -------------+----------------------------------   Adj R-squared   =    0.0722
           Total |  294951.578       389  758.230279   Root MSE        =    26.524

    --------------------------------------------------------------------------------
            escore |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
            1.time |   5.628235   3.714082     1.52   0.130    -1.674189    12.93066
       1.treatment |   10.50139   3.803451     2.76   0.006     3.023258    17.97953
                   |
    time#treatment |
              1 1  |  -1.366514   5.378081    -0.25   0.800     -11.9406    9.207572
                   |
           revenue |   .0001543   .0000378     4.08   0.000       .00008    .0002286
             _cons |   50.43344    2.73697    18.43   0.000     45.05216    55.81472
    --------------------------------------------------------------------------------


    Can anyone help us, please? Can we just add more variables to the (linear) regression (like we did in the example), given how our dataset is constructed?


    Thank you!

  • #2
    Why is time different from year?



    • #3
      Dear George,

      Thank you for your reply!

      We actually just noticed that this is a mistake, so thank you for spotting it! The "year" column refers to the issuance year. If that year was 2019 (for example), the year column should have said 2018 for time = 0 and 2020 for time = 1. We will correct this in our dataset! So far, we have only included the "time" variable in the model, so hopefully the results were not affected by this mistake. The other variables are all correct, for example the ESG scores for one year before and one year after.

      I hope this makes sense.

      BR
      Yasmin
      Last edited by Yasmin Diaz; 24 Mar 2022, 03:33.



      • #4
        Maybe this was also not very clear: we used the revenue of 2019 as a reference point, which is why revenue is the same for both time = 0 and time = 1. Is that even correct? Or would we need to use the revenue one year before and one year after issuance to include it as a control variable? And how do we then deal with variables that do not vary over time (like country or industry)? Can we still include these as control variables, and how would this work in practice? Many thanks in advance!



        • #5
          I think you need an industry fixed effect and probably a country fixed effect. Since revenue doesn't change, the FE will eat it (and any other variable that is not temporally changing).

          Code:
          reghdfe escore treatment, absorb(BICS country year)
          Or, use a first difference on ESG score and drop year FE.
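
          A rough sketch of that first-difference route, in case it helps. It assumes exactly two rows per ID (time = 0 and time = 1), an ID variable named id (a guess at the name), and the escore/time/treatment names from the regression in #1. In a balanced two-period panel, the coefficient on treatment here should equal the time#treatment interaction from the pooled regression without controls, and anything that is identical in both rows (the 2019 revenue, country, industry) differences out.

          ```stata
          * Sketch only: assumes exactly two rows per id (time == 0 and time == 1).
          * The variable name id is an assumption; escore, time, treatment follow post #1.
          bysort id (time): gen d_escore = escore - escore[_n-1]   // post minus pre; missing on the time == 0 row
          reg d_escore i.treatment if time == 1, vce(robust)        // one observation per id
          ```

          Because each ID enters this regression only once, the worry about entries being counted twice does not arise in this form.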

          You'd have to re-specify if you think the effect may differ by industry and country.

          I'd study the underlying ESG scoring method you're using carefully. There may be a very formulaic approach to green bonds, or maybe not.



          • #6
            Thank you very much for taking the time! We will now look carefully into your suggestions and make sure we understand what works best for us. Have a good day!

