Reversed Estimates From Simulated Event Study?

Jerome Lyons

Join Date: Mar 2023
Posts: 13

10 Nov 2024, 15:05

Hi all,

I was trying to wrap my head around running event studies in Stata, so I made some fake data (code, data, regression output below). My issue is that I expect to be getting coefficient estimates of roughly +50 after treatment and roughly 0 before treatment (since that's how I've specified the DGP). Instead, I'm getting estimates of roughly -50 before treatment and roughly 0 after treatment. I don't have this issue if I run a standard diff-in-diff. What am I doing wrong here? Thanks in advance for the help!

Code:

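clear all
set seed 333

* Three states, each with its own fixed effect
set obs 3
g state = _n
g state_fe = runiform(-3,3)

* Six rows per state: year is 1 on each state's first row, then counts up
expand 6
sort state
g year = (state != state[_n-1])
replace year = year[_n-1]+1 if year == 0

* One fixed effect per year (the draw for year 0 is unused; the data start at year 1)
g year_fe = 0
forvalues YEAR = 0/6 {
    local fe = runiform(-3,3)
    replace year_fe = `fe' if year == `YEAR'
}

* State 2 is treated from year 4 onward
g treated_state = (state == 2)
g treated_time = (year >= 4)
g treated = treated_state * treated_time

* DGP: treatment adds 50 to the outcome
g outcome = 50 * treated + state_fe + year_fe + rnormal()

* interaction equals year in the treated state and 0 elsewhere
gen interaction = 0
forvalues YEAR = 1/6{
    replace interaction = treated_state * `YEAR' if year == `YEAR'
}

* Standard diff-in-diff
reg outcome treated i.state i.year

* Event-study attempt, with level 4 of interaction as the base
reg outcome ib4.interaction i.state i.year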

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(state state_fe year year_fe treated_state treated_time treated outcome interaction)
1  2.2551877 1 -1.6628886 0 0 0  1.9787244 0
1  2.2551877 2   2.988468 0 0 0   6.868975 0
1  2.2551877 3 -2.3007996 0 0 0 -1.0354497 0
1  2.2551877 4   2.301502 0 1 0   4.602682 0
1  2.2551877 5 -.12248517 0 1 0   .6440572 0
1  2.2551877 6 -1.6121435 0 1 0   .3240579 0
2 -1.4768895 1 -1.6628886 1 0 0 -4.1191525 1
2 -1.4768895 2   2.988468 1 0 0  -.1563899 2
2 -1.4768895 3 -2.3007996 1 0 0 -4.4726853 3
2 -1.4768895 4   2.301502 1 1 1   50.23074 4
2 -1.4768895 5 -.12248517 1 1 1   49.10887 5
2 -1.4768895 6 -1.6121435 1 1 1   46.03123 6
3  -.8539335 1 -1.6628886 0 0 0 -1.5823573 0
3  -.8539335 2   2.988468 0 0 0  2.2859316 0
3  -.8539335 3 -2.3007996 0 0 0 -3.5709186 0
3  -.8539335 4   2.301502 0 1 0  .04213403 0
3  -.8539335 5 -.12248517 0 1 0  -.9253201 0
3  -.8539335 6 -1.6121435 0 1 0  -1.445516 0
end

[Attached image: Screenshot 2024-11-10 155710.png]

Clyde Schechter

Join Date: Apr 2014

Posts: 29453
#2

10 Nov 2024, 15:35

First, your variable, interaction, is wrongly constructed. The interaction you need, between treatment and year, with year treated as a discrete variable is the variable you call treated. And, indeed, -reg outcome i.treated i.state i.year- works correctly and produces the expected result.

But the variable you call interaction is zero in the non-treated states and a copy of year in the treated state. Used in a regression where year is still treated as a discrete variable, it does not represent a treatment#time interaction term in the proper sense. So it is not reasonable to expect the coefficient of this variable to capture the treatment effect in that model.
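A quick way to see what the variable holds is to list it next to year in the treated state. This is only a sketch, and it assumes the simulated data from #1 are still in memory; the -assert- line simply checks the description above against those data.

```stata
* Assumption: the simulated data from #1 are in memory.
* Show interaction alongside year for the treated state (state 2).
list state year treated_state interaction if state == 2, noobs

* Cross-tabulate to see that every non-treated row has interaction == 0.
tabulate interaction treated_state

* Check: interaction is year in the treated state and 0 elsewhere.
assert interaction == cond(treated_state == 1, year, 0)
```

If the -assert- passes silently, the variable is exactly year in the treated state and 0 everywhere else.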

Finally, the best way to set up this regression is considerably simpler:

Code:

reg outcome i.treated_state##i.treated_time

This captures the treatment effect in the coefficient of the interaction term and produces the least superfluous output. There is no need to create the variable treated, nor the variable interaction (which is the wrong variable, anyway). This simplified regression works here because all treated states (well, there is only one in this demonstration) begin treatment at the same time, and there are no missing data.

Jerome Lyons

Join Date: Mar 2023

Posts: 13
#3

10 Nov 2024, 15:51

Hi Clyde,

Thank you for the quick response!

My reason for creating "interaction" is to produce an event study regression and figure:

I'm clearly wrong, but I thought the estimated coefficient on each j.interaction corresponds to each rho_t (TreatmentGroup_s x 1{t=j}). What do you suggest instead?

Last edited by Jerome Lyons; 10 Nov 2024, 15:55.

Clyde Schechter

Join Date: Apr 2014
Posts: 29453
#4

10 Nov 2024, 16:18

My reason for creating "interaction" is to produce an event study regression and figure:

But you didn't do that correctly. You calculated the variable interaction as TreatmentGroup_s x j, not TreatmentGroup_s x 1{t=j}. To reflect the formula with TreatmentGroup_s x 1{t=j} in your code, you would use -regress outcome i.treated_state##i.year-. Note that this formula does not capture a single treatment effect. Instead, it gives a separate treatment effect in each year:

Code:

. regress outcome i.treated_state##i.year

      Source |       SS           df       MS      Number of obs   =        18
-------------+----------------------------------   F(11, 6)        =     98.25
       Model |  5989.73909        11  544.521735   Prob > F        =    0.0000
    Residual |  33.2535621         6  5.54226034   R-squared       =    0.9945
-------------+----------------------------------   Adj R-squared   =    0.9844
       Total |  6022.99265        17  354.293685   Root MSE        =    2.3542

------------------------------------------------------------------------------------
           outcome | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------------+----------------------------------------------------------------
   1.treated_state |  -4.317336   2.883295    -1.50   0.185     -11.3725    2.737833
                   |
              year |
                2  |    4.37927   2.354201     1.86   0.112    -1.381252    10.13979
                3  |  -2.501368   2.354201    -1.06   0.329    -8.261889    3.259154
                4  |   2.124225   2.354201     0.90   0.402    -3.636297    7.884746
                5  |   -.338815   2.354201    -0.14   0.890    -6.099336    5.421706
                6  |  -.7589126   2.354201    -0.32   0.758    -6.519434    5.001609
                   |
treated_state#year |
              1 2  |   -.416507   4.077595    -0.10   0.922    -10.39402    9.561009
              1 3  |   2.147835   4.077595     0.53   0.617    -7.829681    12.12535
              1 4  |   52.22567   4.077595    12.81   0.000     42.24816    62.20319
              1 5  |   53.56684   4.077595    13.14   0.000     43.58932    63.54435
              1 6  |    50.9093   4.077595    12.49   0.000     40.93178    60.88682
                   |
             _cons |   .1981835   1.664671     0.12   0.909     -3.87512    4.271487
------------------------------------------------------------------------------------

Notice that for year = 2 or 3, this treatment effect is, for practical purposes, 0. And for year = 4, 5, or 6, it is close to the hoped-for value of 50, but with some variation. (For year = 1, the effect is in the base category of the interaction, so it is constrained to be exactly 0.)
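The choice of base category determines what each interaction coefficient is measured against. Event studies are often normalized to the last untreated period. The sketch below re-runs the same model with year 3 as the base and then jointly tests the two remaining pre-period terms. Using year 3 is an assumption made for illustration, not something specified earlier in this thread, and the code assumes the data from #1 are in memory. By the same logic, a base level that falls in a treated year (as with ib4. in #1) would make the untreated years read as large negative differences, which may be part of what produced the reversed pattern there.

```stata
* Assumption: the simulated data from #1 are in memory.
* Same model as above, but with year 3 (the last untreated year) as the base,
* so each interaction coefficient is a difference relative to year 3.
regress outcome i.treated_state##ib3.year

* Joint test that the pre-period interaction terms (years 1 and 2) are zero.
test 1.treated_state#1.year 1.treated_state#2.year
```

With this normalization the coefficients for years 4 through 6 are the ones to compare with the 50 built into the DGP.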

Jerome Lyons

Join Date: Mar 2023

Posts: 13
#5

10 Nov 2024, 18:07

Thanks Clyde, this makes a lot of sense and was exactly what I was looking for.

To be clear, I understand the formula does not capture a single treatment effect. In the DGP, there is a treatment effect every year of an additional 50 units to the outcome. The idea being that, if we removed treatment, the treated units would no longer benefit from said treatment. Or at least, that's how I understand it.
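For anyone who wants to compare the per-year estimates with the constant effect of 50 in the DGP, one option is to average the post-treatment interaction coefficients with -lincom-. This is only a sketch: it assumes the data from #1 are in memory, and the equal-weight average over years 4 to 6 is an illustrative choice rather than anything prescribed in this thread.

```stata
* Assumption: the simulated data from #1 are in memory.
* Per-year treatment effects, with year 1 as the base category.
regress outcome i.treated_state##i.year

* Equal-weight average of the three post-treatment interaction coefficients.
lincom (1.treated_state#4.year + 1.treated_state#5.year + 1.treated_state#6.year) / 3
```

The -lincom- output reports the averaged effect with a standard error and confidence interval, which is easier to set against 50 than three separate coefficients.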

Thanks again!