  • Reversed Estimates From Simulated Event Study?

    Hi all,

    I was trying to wrap my head around running event studies in Stata, so I made some fake data (code, data, regression output below). My issue is that I expect to be getting coefficient estimates of roughly +50 after treatment and roughly 0 before treatment (since that's how I've specified the DGP). Instead, I'm getting estimates of roughly -50 before treatment and roughly 0 after treatment. I don't have this issue if I run a standard diff-in-diff. What am I doing wrong here? Thanks in advance for the help!


    Code:
    clear all
    set seed 333
    
    set obs 3
    g state = _n
    g state_fe = runiform(-3,3)
    
    expand 6
    sort state
    g year = (state != state[_n-1])
    replace year = year[_n-1]+1 if year == 0
    
    g year_fe = 0
    forvalues YEAR = 0/6 {
        local fe = runiform(-3,3)
        replace year_fe = `fe' if year == `YEAR'
    }
    
    g treated_state = (state == 2)
    g treated_time = (year >= 4)
    g treated = treated_state * treated_time
    
    g outcome = 50 * treated + state_fe + year_fe + rnormal()
    
    gen interaction = 0
    forvalues YEAR = 1/6{
        replace interaction = treated_state * `YEAR' if year == `YEAR'
    }
    
    reg outcome treated i.state i.year
    
    reg outcome ib4.interaction i.state i.year

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(state state_fe year year_fe treated_state treated_time treated outcome interaction)
    1  2.2551877 1 -1.6628886 0 0 0  1.9787244 0
    1  2.2551877 2   2.988468 0 0 0   6.868975 0
    1  2.2551877 3 -2.3007996 0 0 0 -1.0354497 0
    1  2.2551877 4   2.301502 0 1 0   4.602682 0
    1  2.2551877 5 -.12248517 0 1 0   .6440572 0
    1  2.2551877 6 -1.6121435 0 1 0   .3240579 0
    2 -1.4768895 1 -1.6628886 1 0 0 -4.1191525 1
    2 -1.4768895 2   2.988468 1 0 0  -.1563899 2
    2 -1.4768895 3 -2.3007996 1 0 0 -4.4726853 3
    2 -1.4768895 4   2.301502 1 1 1   50.23074 4
    2 -1.4768895 5 -.12248517 1 1 1   49.10887 5
    2 -1.4768895 6 -1.6121435 1 1 1   46.03123 6
    3  -.8539335 1 -1.6628886 0 0 0 -1.5823573 0
    3  -.8539335 2   2.988468 0 0 0  2.2859316 0
    3  -.8539335 3 -2.3007996 0 0 0 -3.5709186 0
    3  -.8539335 4   2.301502 0 1 0  .04213403 0
    3  -.8539335 5 -.12248517 0 1 0  -.9253201 0
    3  -.8539335 6 -1.6121435 0 1 0  -1.445516 0
    end
    [Image: Screenshot 2024-11-10 160424.png]
    [Image: Screenshot 2024-11-10 155710.png]


  • #2
    First, your variable interaction is wrongly constructed. The interaction you need, between treatment and time with time treated as a discrete variable, is the variable you call treated. Indeed, -reg outcome i.treated i.state i.year- works correctly and produces the expected result.

    But the variable you call interaction is zero in the non-treated states and a copy of year in the treated state. When it is used in a regression in which year is still treated as a discrete variable, it does not represent a treatment#time interaction term in the proper sense. So it is not reasonable to expect the coefficient of this variable to capture the treatment effect in that model.
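
    To make that concrete, here is a minimal sketch that rebuilds only the 3-state by 6-year skeleton of the data. The set obs/ceil()/mod() construction is shorthand of mine rather than the code from #1, and d_year4 is a hypothetical name used only for illustration. It lists the treated state so that interaction can be compared with treated and with a 0/1 indicator for the treated state in one specific year.

    ```stata
    * Rebuild the 3-state x 6-year skeleton (shorthand, not the original code)
    clear
    set obs 18
    gen state = ceil(_n/6)
    gen year  = mod(_n - 1, 6) + 1
    gen treated_state = (state == 2)
    gen treated = treated_state * (year >= 4)

    * The variable from #1: the loop there reduces to treated_state * year
    gen interaction = treated_state * year

    * A 0/1 indicator: treated state in year 4 only (d_year4 is a hypothetical name)
    gen d_year4 = treated_state * (year == 4)

    * In the treated state, interaction just repeats year; the other two are 0/1
    list state year treated interaction d_year4 if state == 2, noobs
    ```

    In the listing, interaction takes the values 1 through 6, whereas treated and d_year4 only ever take the values 0 and 1.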

    Finally, the best way to set up this regression is much simpler:
    Code:
    reg outcome i.treated_state##i.treated_time
    This captures the treatment effect in the coefficient of the interaction term and provides the least amount of additional superfluous output. There is no need to create the variable treated, nor the variable interaction (which is the wrong variable, anyway). This simplified regression works here because all treated states (well, there is only one in this demonstration) begin treatment at the same time, and there are no missing data.
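
    As a rough check on what that interaction coefficient measures, the sketch below compares it with the difference in differences of the four cell means computed by hand. It uses a simplified stand-in for the data in #1 (no state or year effects in the outcome, which is an assumption made for brevity), and the scalar names are hypothetical.

    ```stata
    * Simplified stand-in for the simulated panel (no state or year effects)
    clear
    set seed 333
    set obs 18
    gen state = ceil(_n/6)
    gen year  = mod(_n - 1, 6) + 1
    gen treated_state = (state == 2)
    gen treated_time  = (year >= 4)
    gen outcome = 50 * treated_state * treated_time + rnormal()

    * DiD estimate from the saturated 2x2 regression
    regress outcome i.treated_state##i.treated_time
    scalar did_reg = _b[1.treated_state#1.treated_time]

    * Same quantity by hand: (treated post - treated pre) - (control post - control pre)
    forvalues s = 0/1 {
        forvalues t = 0/1 {
            summarize outcome if treated_state == `s' & treated_time == `t', meanonly
            scalar m`s'`t' = r(mean)
        }
    }
    scalar did_hand = (m11 - m10) - (m01 - m00)

    display "regression: " did_reg "   by hand: " did_hand
    ```

    Because the 2x2 model is saturated, the two numbers should agree up to rounding.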



    • #3
      Hi Clyde,

      Thank you for the quick response!

      My reason for creating "interaction" is to produce an event study regression and figure:
      [Image: Screenshot 2024-11-10 165458.png]

      I'm clearly wrong, but I thought the estimated coefficient for each j.interaction corresponds to each rho_t (TreatmentGroup_s x 1{t=j}). What do you suggest instead?
      Last edited by Jerome Lyons; 10 Nov 2024, 15:55.



      • #4
        My reason for creating "interaction" is to produce an event study regression and figure:
        But you didn't do that correctly. You calculated the variable interaction as TreatmentGroup_s x j, not TreatmentGroup_s x 1{t=j}. To reflect the formula with TreatmentGroup_s x 1{t=j} in your code, you would use -regress outcome i.treated_state##i.year-. Note that this formula does not capture a single treatment effect. Instead, it gives a separate treatment effect in each year:
        Code:
        . regress outcome i.treated_state##i.year
        
              Source |       SS           df       MS      Number of obs   =        18
        -------------+----------------------------------   F(11, 6)        =     98.25
               Model |  5989.73909        11  544.521735   Prob > F        =    0.0000
            Residual |  33.2535621         6  5.54226034   R-squared       =    0.9945
        -------------+----------------------------------   Adj R-squared   =    0.9844
               Total |  6022.99265        17  354.293685   Root MSE        =    2.3542
        
        ------------------------------------------------------------------------------------
                   outcome | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------------+----------------------------------------------------------------
           1.treated_state |  -4.317336   2.883295    -1.50   0.185     -11.3725    2.737833
                           |
                      year |
                        2  |    4.37927   2.354201     1.86   0.112    -1.381252    10.13979
                        3  |  -2.501368   2.354201    -1.06   0.329    -8.261889    3.259154
                        4  |   2.124225   2.354201     0.90   0.402    -3.636297    7.884746
                        5  |   -.338815   2.354201    -0.14   0.890    -6.099336    5.421706
                        6  |  -.7589126   2.354201    -0.32   0.758    -6.519434    5.001609
                           |
        treated_state#year |
                      1 2  |   -.416507   4.077595    -0.10   0.922    -10.39402    9.561009
                      1 3  |   2.147835   4.077595     0.53   0.617    -7.829681    12.12535
                      1 4  |   52.22567   4.077595    12.81   0.000     42.24816    62.20319
                      1 5  |   53.56684   4.077595    13.14   0.000     43.58932    63.54435
                      1 6  |    50.9093   4.077595    12.49   0.000     40.93178    60.88682
                           |
                     _cons |   .1981835   1.664671     0.12   0.909     -3.87512    4.271487
        ------------------------------------------------------------------------------------
        Notice that for year = 2 or 3, this treatment effect is, for practical purposes, 0. And for year = 4, 5, or 6, it is close to the hoped-for value of 50, but with some variation. (For t = 1, it is in the base category of the interaction, so constrained to be exactly 0.)
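
        If you would rather measure every coefficient relative to the last pre-treatment year (a common event-study convention, and an assumption on my part about what you want to plot), a minimal sketch is to move the base level of year to 3 with the ib3. operator and then test the remaining pre-treatment interactions jointly. The data step is a simplified stand-in for the simulation in #1, with no state or year effects in the outcome.

        ```stata
        * Simplified stand-in for the simulated panel (no state or year effects)
        clear
        set seed 333
        set obs 18
        gen state = ceil(_n/6)
        gen year  = mod(_n - 1, 6) + 1
        gen treated_state = (state == 2)
        gen outcome = 50 * treated_state * (year >= 4) + rnormal()

        * Year 3 (last pre-treatment year) as the omitted category
        regress outcome i.treated_state##ib3.year

        * Joint test that the pre-treatment interactions (years 1 and 2) are zero
        test 1.treated_state#1.year 1.treated_state#2.year
        ```

        With this base, the year 4, 5, and 6 interaction coefficients are read as treated-versus-control differences relative to year 3, and the joint test is one informal check on pre-treatment differences.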



        • #5
          Thanks Clyde, this makes a lot of sense and was exactly what I was looking for.

          To be clear, I understand the formula does not capture a single treatment effect. In the DGP, there is a treatment effect every year of an additional 50 units to the outcome. The idea being that, if we removed treatment, the treated units would no longer benefit from said treatment. Or at least, that's how I understand it.

          Thanks again!
