
  • Difference-in-Difference: different moments of treatment & treatment periods

    Hi everybody,

    After spending a lot of time exploring the forum and other places online, I decided to make my first post here. There have been other topics here and elsewhere on similar problems, yet I could not manage to implement or adapt the explanations given there myself - hence this post. If something is lacking in the way I have written my post, please let me know and I will adjust accordingly in future posts/replies.

    The issue I am facing is the following. My goal is to study the treatment effect in the post-treatment period. Below I have provided an example of my dataset (with the other variables dropped beforehand for the sake of posting it).


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(year outcome_var eventdate) int tau float(eventid treated) long firmid float active_treated
    2016  66.94362    .  . . 0  1 0
    2015  97.75673    .  . . 0  2 0
    2016  72.11331    .  . . 0  2 0
    2017  39.52795    .  . . 0  2 0
    2018 23.114513    .  . . 0  2 0
    2019  7.986643    .  . . 0  2 0
    2020    13.991    .  . . 0  2 0
    2007 3.5584955    .  . . 0  3 0
    2008  5.740957    .  . . 0  3 0
    2007 138.97435    .  . . 0  4 0
    2009 134.36954    .  . . 0  5 0
    2010  29.01804    .  . . 0  5 0
    2011 28.438017    .  . . 0  5 0
    2012 16.668228 2014 -2 1 1  5 0
    2013 14.306916 2014 -1 1 1  5 0
    2014  8.754341 2014  0 1 1  5 0
    2015 3.3422906 2014  1 1 1  5 1
    2016  2.941602 2014  2 1 1  5 1
    2017 23.094175    .  . . 0  5 0
    2018 16.317652    .  . . 0  5 0
    2019  43.08735    .  . . 0  5 0
    2012  5.684552    .  . . 0  6 0
    2013  5.691111    .  . . 0  6 0
    2014  4.784706    .  . . 0  6 0
    2015  3.765625    .  . . 0  6 0
    2016 2.2593336    .  . . 0  6 0
    2017  5.128872    .  . . 0  6 0
    2018  7.633845    .  . . 0  6 0
    2019  9.589101    .  . . 0  6 0
    2020  7.675261    .  . . 0  6 0
    2007  7.182402    .  . . 0  7 0
    2007    22.276    .  . . 0  8 0
    2008  12.48082    .  . . 0  8 0
    2009 11.706511    .  . . 0  8 0
    2007  91.02238    .  . . 0  9 0
    2007         0    .  . . 0 10 0
    2008         0    .  . . 0 10 0
    2009         0    .  . . 0 10 0
    2010         0    .  . . 0 10 0
    2011         0    .  . . 0 10 0
    2007  7.167628    .  . . 0 11 0
    2008  6.927828    .  . . 0 11 0
    2009  7.920464    .  . . 0 11 0
    2010  8.580811    .  . . 0 11 0
    2011  8.333334    .  . . 0 11 0
    2012  7.820124    .  . . 0 11 0
    2013  7.298481    .  . . 0 11 0
    2014  7.940303    .  . . 0 11 0
    2015  8.059015    .  . . 0 11 0
    2016  8.400145    .  . . 0 11 0
    2017  8.711821    .  . . 0 11 0
    2018  9.534941    .  . . 0 11 0
    2007  9.669005    .  . . 0 12 0
    2008  9.435482    .  . . 0 12 0
    2009 9.4710245    .  . . 0 12 0
    2010 11.481378    .  . . 0 12 0
    2011 12.359738    .  . . 0 12 0
    2012   11.5619    .  . . 0 12 0
    2013  6.645917    .  . . 0 12 0
    2014  6.642959    .  . . 0 12 0
    2015  6.885567 2017 -2 4 1 12 0
    2016  6.819162 2017 -1 4 1 12 0
    2017  8.159912 2017  0 4 1 12 0
    2018  7.521748 2017  1 4 1 12 1
    2019  7.647943 2017  2 4 1 12 1
    2020  6.992603    .  . . 0 12 0
    2010 17.955278    .  . . 0 13 0
    2011  18.86044    .  . . 0 13 0
    2012 16.681175    .  . . 0 13 0
    2013 16.993082 2015 -2 5 1 13 0
    2014 18.281563 2015 -1 5 1 13 0
    2015  19.40155 2015  0 5 1 13 0
    2016 17.809502 2015  1 5 1 13 1
    2017 18.815565 2015  2 5 1 13 1
    2018  32.83058    .  . . 0 13 0
    2019 20.417244    .  . . 0 13 0
    2020 16.938232    .  . . 0 13 0
    2018  65.71169    .  . . 0 14 0
    2019  87.09094    .  . . 0 14 0
    2020 12.606636    .  . . 0 14 0
    2012  45.64033    .  . . 0 15 0
    2013  43.29089    .  . . 0 15 0
    2014        36    .  . . 0 15 0
    2007  45.25854    .  . . 0 16 0
    2008  41.71629    .  . . 0 16 0
    2009 33.942085    .  . . 0 16 0
    2010 29.673445    .  . . 0 16 0
    2011  25.44216    .  . . 0 16 0
    2012  20.34817    .  . . 0 16 0
    2013  16.21955    .  . . 0 16 0
    2014  16.72103    .  . . 0 16 0
    2015 15.619315    .  . . 0 16 0
    2016 15.099396    .  . . 0 16 0
    2017 14.908017    .  . . 0 16 0
    2018 12.681622    .  . . 0 16 0
    2019  12.15221    .  . . 0 16 0
    2020 11.744678    .  . . 0 16 0
    2021 14.380157    .  . . 0 16 0
    2007  26.58697    .  . . 0 17 0
    2008   34.0246    .  . . 0 17 0
    end
    format %ty year
    format %ty eventdate
    label values firmid firmid
    label def firmid 1 "000850", modify
    label def firmid 2 "000899", modify
    label def firmid 3 "001058", modify
    label def firmid 4 "001630", modify
    label def firmid 5 "00163U", modify
    label def firmid 6 "00182C", modify
    label def firmid 7 "00202H", modify
    label def firmid 8 "002083", modify
    label def firmid 9 "00211Y", modify
    label def firmid 10 "002564", modify
    label def firmid 11 "002567", modify
    label def firmid 12 "002824", modify
    label def firmid 13 "00287Y", modify
    label def firmid 14 "00288U", modify
    label def firmid 15 "00289Y", modify
    label def firmid 16 "003654", modify
    label def firmid 17 "00383Y", modify

    I have tried numerous methods multiple times but cannot find a way to the results I am looking for. My data analysis skills are not the best, as you will notice, but with some advice and guidance I hope to implement a better method and estimate a working model.

    The variable year is the time variable I use, whereas firmid is the panel variable. tau gives the number of years an observation is from the event, with tau = 0 marking the year of treatment for that specific event; hence, if eventdate = year, tau = 0.
    The variable treated takes the value 1 for firms that are or will be subject to treatment, and 0 for firms that are not and will not be treated. The variable active_treated = 1 when treated = 1 and year > eventdate, since this marks the post-treatment period.

    Each value of firmid either has exactly one treatment in the dataset (treated = 1) or is never treated at all (treated = 0).
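
    For clarity, this is roughly how the indicators relate to eventdate (a sketch only, assuming, as in the excerpt above, that eventdate is non-missing only within a treated firm's event window):

    Code:
    gen tau = year - eventdate                               // years relative to the event
    gen byte treated = !missing(eventdate)                   // 1 inside a treated firm's event window
    gen byte active_treated = treated & (year > eventdate)   // 1 in the post-treatment years only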

    The difficulty I face is how to estimate a regression consisting of the dummy variable active_treated and some sort of post-treatment dummy. The active_treated dummy could normally be seen as the interaction dummy generated by treated * post_dummy, but with multiple and different moments of treatment, this post-treatment dummy would not be consistent for the control group (treated = 0). The control group's observations would be tied to the eventdates of the different treatments, so the post-treatment dummy would have to fluctuate for the non-treated firms in the dataset.

    I tried the command tvdiff yesterday and today considered using loops instead, looping a regression over every year in which an event occurred and then somehow aggregating all of those results. In which way, I do not know.

    In the end, I would like to obtain a window from t-2 to t+2 and see the effect that a moment of treatment has on average.

    Hopefully someone can help me out and suggest an approach I should consider. After spending (too) much time on this myself, I have lost track of the possible ways I could build a sensible model. If something is missing from my explanation of the problem, or if I have made an error in the manner of posting on this forum, I am more than happy to hear that as well.

    Thanks a lot in advance for your help/tips/advice!

  • #2
    With different firms beginning treatment in different time periods you cannot, as you have noticed, do classical DID estimation, for precisely the reasons you explain. You can do generalized DID estimation instead. https://www.annualreviews.org/doi/pd...-040617-013507 explains the underlying theory. In terms of code it's actually pretty simple, and you already have the variables you need:

    Code:
    xtset firmid year
    xtreg outcome_var i.active_treated i.year, fe
    The coefficient of active_treated is then your generalized DID estimate of the treatment effect.
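
    If you then want to pull out just that estimate with its confidence interval after running the model, something like this (a minimal sketch) will do it:

    Code:
    lincom 1.active_treated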



    • #3
      Hi Clyde,

      First of all, many thanks for your quick answer! You can hardly imagine how glad I am to finally be making some progress here - it's a huge relief of stress, thank you.

      I started right away with what you said and made use of your code (and added the control variables to my regression).

      The output I got is as follows:

      Code:
      . xtreg outcome_var i.active_treated i.year ln_control1 ln_control2 ln_control3 control4, fe
      
      Fixed-effects (within) regression               Number of obs     =      2,766
      Group variable: firmid                          Number of groups  =        613
      
      R-sq:                                           Obs per group:
           within  = 0.3855                                         min =          1
           between = 0.3497                                         avg =        4.5
           overall = 0.4286                                         max =         13
      
                                                      F(17,2136)        =      78.82
      corr(u_i, Xb)  = 0.2554                         Prob > F          =     0.0000
      
      ----------------------------------------------------------------------------------
           outcome_var |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -----------------+----------------------------------------------------------------
      1.active_treated |  -.2165202   .0728855    -2.97   0.003    -.3594542   -.0735862
                       |
                  year |
                 2008  |   .1029032   .0636585     1.62   0.106    -.0219359    .2277423
                 2009  |   .2924817    .062812     4.66   0.000     .1693026    .4156608
                 2010  |   .3844749   .0628748     6.11   0.000     .2611727    .5077772
                 2011  |   .4445901   .0642398     6.92   0.000      .318611    .5705692
                 2012  |   .4517826   .0639523     7.06   0.000     .3263674    .5771977
                 2013  |   .5625543   .0645987     8.71   0.000     .4358714    .6892373
                 2014  |   .5070997   .0648636     7.82   0.000     .3798972    .6343021
                 2015  |   .2927856   .0671322     4.36   0.000     .1611343    .4244369
                 2016  |   .2198023   .0696381     3.16   0.002     .0832368    .3563678
                 2017  |  -.0864471   .0716996    -1.21   0.228    -.2270553    .0541612
                 2018  |  -.8771411   .0789881   -11.10   0.000    -1.032043   -.7222395
                 2019  |  -2.586333   .1162921   -22.24   0.000    -2.814391   -2.358276
                       |
           ln_control1 |   .2155984   .0334276     6.45   0.000     .1500443    .2811524
           ln_control2 |   .0558043    .031344     1.78   0.075    -.0056636    .1172723
           ln_control3 |   .0564354   .0150453     3.75   0.000     .0269305    .0859403
              control4 |  -3.65e-06   1.81e-06    -2.02   0.043    -7.20e-06   -1.12e-07
                 _cons |   1.254286   .1258668     9.97   0.000     1.007452    1.501121
      -----------------+----------------------------------------------------------------
               sigma_u |  .91145661
               sigma_e |  .59931583
                   rho |  .69815143   (fraction of variance due to u_i)
      ----------------------------------------------------------------------------------
      F test that all u_i=0: F(612, 2136) = 9.06                   Prob > F = 0.0000
      
      .
      Please correct me if I am wrong: can I say with these results that the coefficient for active_treated of -.2165 implies that the average treatment effect on the treated (ATT) is about -21.65% - meaning that outcome_var (also denoted as ln(yvar)) is on average 21.65% lower after treatment for the treated than for the untreated firms?

      And while I am responding anyway: how should I, for instance, present such results graphically if I want to show a time window of t-2 up to t+2? And what is the best way to provide some graphical output showing parallel trends of some sort? Is this needed, or should the assumption be tested the same way as it normally is?

      Thanks for the link to the article from Annual Reviews as well - I noticed I actually already had it open in another tab, haha. Again, thank you!
      Last edited by Jan Mooiweer; 02 Jul 2021, 16:35. Reason: EDIT: added note that outcome_var is denoted as ln(yvar)



      • #4
        Please correct me if I am wrong: can I say with these results that the coefficient for active_treated of -.2165 implies that the average treatment effect on the treated (ATT) is about -21.65% - meaning that outcome_var (also denoted as ln(yvar)) is on average 21.65% lower after treatment for the treated than for the untreated firms?
        Not quite. You can say that the ATT is a reduction of .2165 in outcome_var. It is an absolute reduction, not a percentage.

        Now, if outcome_var is itself the natural log of an outcome variable that you are actually interested in, then you can say something like that, but it is an approximation, and an approximation that is only good for small effects. So if ln y decreases by .2165, that means that y itself decreases by a factor of exp(-.2165) = 0.8053 (to 4 decimal places). That corresponds to a reduction of 100*(1-.8053) = 19.47%.
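
        A quick sketch of that arithmetic in Stata (assuming the -xtreg- model from #3 is the most recent estimation in memory):

        Code:
        display exp(-.2165)                        // ≈ .8053
        display 100*(1 - exp(-.2165))              // ≈ 19.47 (percent reduction)
        * or directly from the stored coefficient, with a delta-method standard error:
        nlcom 100*(1 - exp(_b[1.active_treated]))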

        It is difficult to do a convincing graph for the parallel trends assumption when the treatment onset varies. First, you can't define t-2 up to t+2 for the untreated firms. Even if you could, your model adjusts for additional covariates, so simply plotting the mean values in each group in each time period would fail to adjust for those covariates and might give a misleading impression (potentially wrong in either direction). I think your best bet is to do a formal test. Add i.treated##i.year (N.B. treated, not active_treated, in the interaction, but keep the i.active_treated term as well) to your model, re-run the regression, and then run -testparm i.treated#i.year- to see whether these interaction terms (which estimate the differences in time trends between the treated and untreated in each year) are jointly statistically significant. If they are, that would suggest that time trends, after appropriate adjustments and accounting for the effect of the treatment itself, are not the same in the treated and untreated firms.
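
        In code, the test I am describing would look something like this (a sketch using the variable names from your earlier posts):

        Code:
        xtreg outcome_var i.active_treated i.treated##i.year ln_control1 ln_control2 ln_control3 control4, fe
        testparm i.treated#i.year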



        • #5
          Thanks again, Clyde, for the clear explanation.


          Now, if outcome_var is itself the natural log of an outcome variable that you are actually interested in, then you can say something like that, but it is an approximation, and an approximation that is only good for small effects.
          I made an edit to #3 shortly after posting it to make clear that it is indeed the natural log of outcome_var (ln(outcome_var)) rather than outcome_var itself. As you specified in a more appropriate manner, the coefficient is indeed better interpreted via exp(-.2165) than by simply multiplying it by 100 right away.

          Also, great advice on making use of -testparm- in that way! I tried it out yesterday and obtained the following output:

          Code:
          . xtreg ln_outcome_var i.treated##i.year i.active_treated i.year ln_control1 ln_control2 ln_control3 control4, fe 
          
          Fixed-effects (within) regression               Number of obs     =      2,766
          Group variable: firmid                          Number of groups  =        613
          
          R-sq:                                           Obs per group:
               within  = 0.3874                                         min =          1
               between = 0.3468                                         avg =        4.5
               overall = 0.4224                                         max =         13
          
                                                          F(30,2123)        =      44.75
          corr(u_i, Xb)  = 0.2572                         Prob > F          =     0.0000
          
          ----------------------------------------------------------------------------------
            ln_outcome_var |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -----------------+----------------------------------------------------------------
                 1.treated |  -.3865719   .2350597    -1.64   0.100    -.8475432    .0743994
                           |
                      year |
                     2008  |   .0919567   .0656177     1.40   0.161    -.0367251    .2206384
                     2009  |   .2666542   .0650532     4.10   0.000     .1390795    .3942288
                     2010  |   .3610557   .0653973     5.52   0.000     .2328062    .4893052
                     2011  |   .4245305   .0666931     6.37   0.000     .2937397    .5553212
                     2012  |   .4280092   .0665343     6.43   0.000       .29753    .5584885
                     2013  |   .5445454    .067545     8.06   0.000     .4120842    .6770066
                     2014  |   .4882169   .0672031     7.26   0.000     .3564262    .6200076
                     2015  |   .2675295   .0693862     3.86   0.000     .1314574    .4036016
                     2016  |   .1865509   .0724757     2.57   0.010     .0444202    .3286816
                     2017  |  -.0905795   .0742565    -1.22   0.223    -.2362026    .0550437
                     2018  |   -.873203   .0817364   -10.68   0.000    -1.033495   -.7129112
                     2019  |  -2.578366   .1230701   -20.95   0.000    -2.819717   -2.337015
                           |
              treated#year |
                   1 2008  |   .2694578   .2848692     0.95   0.344    -.2891941    .8281097
                   1 2009  |    .474084   .2699166     1.76   0.079    -.0552446    1.003413
                   1 2010  |   .4436675   .2662062     1.67   0.096    -.0783848    .9657198
                   1 2011  |   .4161684   .2705918     1.54   0.124    -.1144843    .9468212
                   1 2012  |   .4582066   .2786386     1.64   0.100    -.0882266     1.00464
                   1 2013  |    .408836   .2756113     1.48   0.138    -.1316604    .9493324
                   1 2014  |   .4136756   .2830883     1.46   0.144    -.1414839     .968835
                   1 2015  |   .5162193   .2877601     1.79   0.073    -.0481018     1.08054
                   1 2016  |   .5568508   .2872122     1.94   0.053    -.0063959    1.120097
                   1 2017  |   .3029237    .299937     1.01   0.313    -.2852774    .8911248
                   1 2018  |   .2065058   .3338022     0.62   0.536    -.4481077    .8611193
                   1 2019  |   .2301391   .4312924     0.53   0.594    -.6156608    1.075939
                           |
          1.active_treated |  -.2409407    .094579    -2.55   0.011    -.4264179   -.0554636
               ln_control1 |   .2115592   .0337668     6.27   0.000     .1453398    .2777785
               ln_control2 |   .0605951   .0316874     1.91   0.056    -.0015465    .1227366
               ln_control3 |   .0566132   .0151514     3.74   0.000     .0269001    .0863263
                  control4 |  -4.04e-06   1.88e-06    -2.16   0.031    -7.72e-06   -3.64e-07
                     _cons |   1.258207   .1274147     9.87   0.000     1.008337    1.508078
          -----------------+----------------------------------------------------------------
                   sigma_u |  .91442113
                   sigma_e |  .60021835
                       rho |  .69888531   (fraction of variance due to u_i)
          ----------------------------------------------------------------------------------
          F test that all u_i=0: F(612, 2123) = 8.97                   Prob > F = 0.0000
          
          . 
          . testparm i.treated#i.year // Clyde #3 - treated dummy for each year
          
           ( 1)  1.treated#2008.year = 0
           ( 2)  1.treated#2009.year = 0
           ( 3)  1.treated#2010.year = 0
           ( 4)  1.treated#2011.year = 0
           ( 5)  1.treated#2012.year = 0
           ( 6)  1.treated#2013.year = 0
           ( 7)  1.treated#2014.year = 0
           ( 8)  1.treated#2015.year = 0
           ( 9)  1.treated#2016.year = 0
           (10)  1.treated#2017.year = 0
           (11)  1.treated#2018.year = 0
           (12)  1.treated#2019.year = 0
          
                 F( 12,  2123) =    0.62
                      Prob > F =    0.8296
          If they are, that would suggest that time trends, after appropriate adjustments and accounting for the effect of the treatment itself, are not the same in the treated and untreated firms.
          With the result from -testparm- not being statistically significant, can I therefore state that the treated and untreated firms do not follow different time trends? And would that, in turn, make it reasonable to say that the parallel trends assumption (most likely) holds for this model?


          Another thing I asked myself at some point is whether/why I should or should not use vce(cl firmid) as an option in -xtreg-, or would this merely create downward-biased standard errors in this case?

          Sorry for getting back to you a little later - it seems the Stata forum experienced some issues (at least for me) with a web certificate that had expired.

          Again, many thanks for the quick and easy-to-grasp explanations you have given me so far. Your help has been of great use in making progress again and, more importantly, has given me back the motivation (and even joy!) that I sorely needed to continue my research!

          Best,
          Jan



          • #6
            With the result from -testparm- not being statistically significant, can I therefore state that the treated and untreated firms do not follow different time trends? And would that, in turn, make it reasonable to say that the parallel trends assumption (most likely) holds for this model?
            Sort of. It would be more accurate to say that the data are compatible with parallel trends. What you really have established is the absence of evidence in the data for violations of parallel trends. While I suspect your sample size is large enough that this also corresponds to affirming the parallel trends assumption, without a formal power analysis you are better off confining yourself to saying the data are compatible with parallel trends.

            Another thing I asked myself at some point is whether/why I should or should not use vce(cl firmid) as an option in -xtreg-, or would this merely create downward-biased standard errors in this case?
            Your sample size (number of clusters) is large enough for cluster robust standard errors to be valid. It is my understanding that in economics (which seems to be the domain of your project) the use of cluster robust standard errors is strongly preferred where, as here, they are permissible.
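
            Concretely, that would just mean adding the vce() option to the model you already ran (a sketch with the same variable names):

            Code:
            xtreg outcome_var i.active_treated i.year ln_control1 ln_control2 ln_control3 control4, fe vce(cluster firmid)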



            • #7
              Thanks a lot once again, Clyde. The reasoning is clearer to me than it has ever been, and I will take it into account in my report's methodology.

              Now that I have applied this method for estimating the treatment effects in my panel data, one last thing has remained unclear to me after a few attempts to investigate it. It is most likely - and hopefully - the last issue on which I could use some of your advice.


              After having set up different equations for various dependent variables I study, I want to make use of a split sample or another dummy variable (whichever is the correct method) to conduct additional analysis.

              For example, in the equation I posted - the one below - I aim to study the following:

              Code:
               
               xtreg ln_outcome_var i.active_treated i.year ln_control1 ln_control2 ln_control3 control4, fe vce(cl firmid)
              For the active_treated group (i.e. firmids that are treated and in the post-treatment period), I want to observe the difference in treatment effect within that group, based on the median of another variable (varname).

              Code:
              sum varname, de
              local p50_varname = r(p50)
              gen Above_median_dummy = 0
              replace Above_median_dummy = 1 if varname >= `p50_varname'
              Herewith, I generate a new dummy variable (Above_median_dummy) that takes the value 1 for observations where varname is equal to or greater than the median, and 0 for those below the median.
              Based on this median, I want to see the difference in the effect of i.active_treated on ln_outcome_var by making use of this newly generated dummy.

              What should my regression equation look like in order to observe the difference in treatment effect for the treated, based on whether they are above or below the median of varname?


              Code:
               
               xtreg ln_outcome_var i.active_treated##i.Above_median_dummy i.year ln_control1 ln_control2 ln_control3 control4, fe vce(cl firmid)

              Would something like this be a correct approach? Or would it suffice/be better to construct two separate equations, limiting one by
              Code:
              if varname >= `p50_varname'
              and the other by
              Code:
              if varname < `p50_varname'
              I suppose that restricting it to a single equation rather than the latter method would be more representative, but I am not certain whether it gives me the results I need to determine this difference in treatment effect based on being above or below the median. In case I'm completely off, I would like to hear that as well!

              Thanks a lot in advance, Clyde. For now, this is the last and only uncertainty I am facing that I could use your advice for!



              • #8
                First, a comment about your variable for the median split. If varname has missing values, you will get wrong results, because in Stata a missing value is treated as greater than any number: varname >= `p50_varname' will be true in any observation where varname is missing. So you need to change the last line of that code to:
                Code:
                replace Above_median_dummy = 1 if varname >= `p50_varname' & !missing(varname)
                unless you are certain that varname never has missing values. Actually, that whole -gen- -replace- routine is lengthier than it needs to be. Creating indicator variables for conditions can always be done as a one-liner:

                Code:
                gen median_dummy = varname >= `p50_varname' if !missing(varname)
                As for the two regressions vs one regression with interaction, both approaches are valid. They do not give exactly identical answers, however. Another consideration is that the method with interaction always enables you to estimate not only the difference in treatment effect in the two groups, but also to estimate its standard error (and therefore calculate a confidence interval for it). The method of two separate equations does that only if the regression itself is supported by the -suest- command. But -xtreg, fe- is not supported by -suest-, so you would be stuck at that point. Another difference between the methods is that the interaction approach draws on the sample size of the full sample, whereas the two separate regressions are each done on half the sample, and are therefore less powered. For that reason, I usually prefer the interaction approach.

                One other thing to consider when going this route is which variables to interact with Above_median_dummy. If you interact it only with the active treatment variable, then you are, in effect, stipulating that the effects of all other model variables, i.e. all the years and the three ln_control variables, are the same for both groups. This is often a reasonable assumption, and when that is the case, the simple regression you show for the interaction analysis is fine. However, if you have reason to think that some of the other variables' effects might also differ between these two groups, then you need to interact the median-split indicator with those variables as well. Sometimes that means you end up interacting the median-split indicator with all of the predictors in the model. (By the way, if you do interact the median-split variable with all of the model predictors, then the effect estimates you get from this method will be identical to those you would get from the two-regressions approach, but the standard errors will still differ.)
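
                For illustration, a sketch of what the fully interacted version might look like, using the names from your posts (note that the main effect of Above_median_dummy will be absorbed by the firm fixed effects if it does not vary within firm):

                Code:
                xtreg ln_outcome_var i.Above_median_dummy##(i.active_treated i.year c.ln_control1 c.ln_control2 c.ln_control3 c.control4), fe vce(cluster firmid)
                testparm i.Above_median_dummy#i.active_treated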



                • #9
                  One should also consider that a DiD design with more than 2 groups and 2 periods, staggered treatment timing and/or a non-monotonic treatment regime (i.e. units exiting treatment in a consequential way) means that a weighted average of treatment effects is identified - not necessarily a useful one, and unless adjustments are made, some effects might be given weights with important consequences. See Hull (2018), de Chaisemartin and D'Haultfœuille (2020), Callaway and Sant'Anna (2020), Imai and Kim (2020), Athey and Imbens (2021), Goodman-Bacon (2021).



                  I'm using StataNow/MP 18.5



                  • #10
                    Hi Matteo,

                    Thanks for your additional heads-up, definitely an important point I have to take into consideration!

                    Originally posted by Matteo Pinna Pintor View Post
                    means that a weighted average of treatment effects is identified - not necessarily a useful one,
                    I think a weighted average would to some extent suit the aim of my study, but I am now trying to construct an approach where the treatment effect can be balanced out instead of the weighted average I now obtain as a result.

                    I am now considering the command -bacondecomp- and am currently trying to make it work. Unfortunately, I'm a bit stuck on this error that Stata gives me:

                    Code:
                    .
                    . xtset firmid year
                           panel variable:  firmid (strongly balanced)
                            time variable:  year, 2009 to 2020
                                    delta:  1 year
                    
                    .
                    . bacondecomp outcome_var treated_dummy control_var1 control_var1 control_var2  control_var3
                    
                    Treatment variable active_treated does not weakly increase (0->1) over time periods
                    something that should be true of your data is not
                    r(459);
                    Are you by any chance familiar with this command and with what this error specifically means? I have tried my best looking for explanations/solutions online for some time now, but unfortunately have not found anything useful yet. I consulted the actual SSC package as well, where I found the following part that seemed relevant:

                    Code:
                    tempvar negt first last jump
                    qui {
                       gen `negt'=-`t'
                       bys `touse' `i' (`negt'): g `first'=`tr'[_N]
                       bys `touse' `i' (`t'): g `last'=`tr'[_N]
                       bys `touse' `i' (`t'): g `jump'=`tr'-`tr'[_n-1] if `touse'
                    }
                    su `last', mean
                    if (r(max)!=1) | !inlist(r(min),0,1) {
                       di as err "Treatment variable `tr' does not have a maximum of one in last period"
                       di as err "or has a minimum not either one or zero in last period"
                       err 459
                    }
                    su `first', mean
                    if (r(min)!=0) | !inlist(r(max),0,1) {
                       di as err "Treatment variable `tr' does not have a minimum of zero in first period"
                       di as err "or has a maximum not either one or zero in first period"
                       err 459
                    }
                    su `jump', mean
                    if (r(min)!=0) | (r(max)!=1) {
                       di as err "Treatment variable `tr' does not weakly increase (0->1) over time periods"
                       err 459
                    }
                    http://fmwww.bc.edu/repec/bocode/b/bacondecomp.ado

                    Sorry, I could not figure out how to post that part of the code without having to scroll to the right - I have been trying to fix it for 15 minutes now. In case it remains unclear, I kindly ask you to see it via the link above.

                    Again my appreciation for your input, Matteo. Definitely something that I want to add to my current methodology!



                    • #11
                      Sorry, I noticed a typo. In #9 I meant to write "some effects might be given negative weights with important consequences".

                      Never tried that command. I have not had time to learn all this recent material since I first noted it last year, so I can't be much more helpful than pointing out that it exists. But I can suggest that, if this is new to you, there's no need to rush. Do your previous analysis, then in parallel sit down, read carefully for a month, and see whether you need to change the analysis. Good luck!

                      [edit]

                      The error message from that command likely refers to an important aspect of this new body of theory - the properties of the "treatment histories" of each unit. In that command, apparently, it is assumed that once a unit enters treatment it does not leave, so that overall treatment exposure in the panel can only increase or remain the same (i.e. weakly increase). Some of those papers restrict their analysis and derivations to these cases; others (Hull, if I remember correctly) are more general. For example, and in addition, Imai et al. (2020) - but using matching.
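
                      For what it's worth (I have not used the command, so treat this as a guess to check against its help file), an absorbing-state version of the indicator - one that switches to 1 at a firm's first post-treatment year and never switches back - would satisfy the "weakly increasing" condition the error refers to, given that each firm has at most one event:

                      Code:
                      bysort firmid (year): gen byte ever_post = sum(active_treated) > 0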
                      Last edited by Matteo Pinna Pintor; 22 Jul 2021, 11:16.
                      I'm using StataNow/MP 18.5
