Diff-in-diff approach when implementation of policy is "staggered"

Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#31

15 Apr 2021, 12:52

Interesting paper. Thank you for calling my attention to it.

You will not face this problem if you are using matched pairs, because in the matched-pair design only untreated entities ever serve as controls in the analysis. The important thing, however, is that your analysis must reflect the matched pairing as a level in the model. Whereas in the ordinary two-way fixed effects analysis you have repeated observations nested within firms (or jurisdictions, or whatever the entities are), you now have a three-level model in which those entities are themselves nested in matched pairs. This means that you cannot use a fixed-effects estimator; you must use a random-effects estimator to capture this three-level structure. You will have to consider whether the problems associated with using a random-effects estimator are a reasonable trade-off in your circumstances compared to the problems of not having matched pairs and coping with the problems of the generalized DID estimator pointed out in the paper you cite.

Dominik Mueller

Join Date: Apr 2021
Posts: 8

#32

16 Apr 2021, 08:24

You're welcome!

So I have a follow-up question then. In my dataset I investigate the effects of the introduction of the "TPD" (a regulation, which is the treatment) on the spread of the companies. However, the treatment dates vary across companies. Even though all companies are treated, not all are affected by it (some had already adopted the requirements, so for them it is just a "formality"); those unaffected companies serve as my control group.

This is an example of my dataset; I do have unbalanced firm-year panel data:

Code:

* Example generated by -dataex-. For more info, type help dataex
* SEMI represents an indicator variable that equals one when the firm is treated. SICCODE represents the industry code in which the company is active; MVEEuro represents the size of the firm; spread is the outcome variable I will investigate; TPDImplDate represents the year after which the treatment is active; and POST is a dummy variable that equals 1 when the treatment is active.
clear
input str8 ID int year float SEMI int SICCODE float(MVEEuro spread RTRNVLT SHRTRNVER TPDImplDate POST)
"id_00014" 2005 0 3569    68932752   .001554353   .0166777    .02567348 17282 0
"id_00014" 2006 0 3569   120267000  -.009670787 .028024163   .027277956 17282 0
"id_00014" 2008 0 3569   905433088   -.04200022  .04257916   .009359553 17282 1
"id_00014" 2009 0 3569   856998016  .0006897881  .02905001   .005881219 17282 1
"id_00014" 2010 0 3569   779019776  .0025383455  .01767642  .0044230768 17282 1
"id_00014" 2011 0 3569  1286756864  .0019972555  .02444741   .005446565 17282 1
"id_00014" 2012 0 3554  1715971968   .002439975  .01697833  .0021662957 17282 1
"id_00014" 2013 0 3554  3727854080  .0016876453 .017834961  .0015230265 17282 1
"id_00014" 2014 0 3554  4747133952   .001586811 .014218923   .001466494 17282 1
"id_00036" 2008 1 2869   128974328    .02985849  .04066654 .00022902184 17186 1
"id_00036" 2009 1 2869    43831540   .030995583 .035337657 .00015840407 17186 1
"id_00036" 2010 1 2869    40447140   .025922013  .02504252  .0004027027 17186 1
"id_00036" 2011 1 2869    55772208    .02712461 .029361796  .0003617544 17186 1
"id_00036" 2012 1 1629    61363420   .027664075  .02827287 .00026909183 17186 1
"id_00036" 2013 1 1629    40980892   .025084613 .029163113  .0005518307 17186 1
"id_00036" 2014 1 1629    27790522    .02749496 .033477318  .0009447147 17186 1
"id_00049" 2005 0 7999   136692656   -.04278145 .033896316   .012041847 17282 0
"id_00049" 2006 0 7999  1098928000   -.02891513  .05632991   .010202055 17282 0
"id_00049" 2008 0 7999   788159424   -.03833329  .03494931    .00310709 17282 1
"id_00049" 2009 0 7999   602287680  .0020673934  .02719652   .003337772 17282 1
"id_00050" 2005 0 3559   287406464  .0024382854 .018030537  .0019835255 17282 0
"id_00050" 2006 0 3559   417544000  -.006886094 .023459373  .0016332874 17282 0
"id_00050" 2008 0 3589   774391936  -.022483956 .034297273  .0009556353 17282 1
"id_00050" 2009 0 3589   419015328   .011857878  .03212929  .0008789165 17282 1
"id_00050" 2010 0 3589   274785728   .008920171 .016108448  .0006744709 17282 1
"id_00050" 2011 0 3589   351631808   .012407872  .02273858  .0006320611 17282 1
"id_00050" 2012 0 3589   309290240   .012939607 .020770475  .0006611808 17282 1
"id_00050" 2013 0 3589   255654144    .01475777 .016248103 .00021307614 17282 1
"id_00050" 2014 0 3589   259542096   .009545056 .009727353  .0002012765 17282 1
"id_00056" 2006 0 3589   163034768  .0016636546  .02594535  .0015226384 17282 0
"id_00056" 2008 0 3589   242727008   -.00638822  .04795364  .0013905322 17282 1
"id_00056" 2009 0 3589   151699680   .015587342  .05352763    .00599081 17282 1
"id_00061" 2005 0 2656   316467744   .013896713  .02041272    .00033008 17282 0
"id_00061" 2006 0 2656   495812608   .015240978 .015833495 .00020154467 17282 0
"id_00061" 2008 0 2656   832140032   .009273218   .0385232 .00014480027 17282 1
"id_00061" 2009 0 2656   695773952    .03149717 .029091856 .00020981515 17282 1
"id_00080" 2005 0 3264     2667080   .007987809 .022149164    .00651144 17282 0
"id_00080" 2006 0 3264     9174712    .01262551  .02529957   .004832503 17282 0
"id_00080" 2008 0 3264   167708032   .031091897  .03830487  .0003276214 17282 1
"id_00080" 2009 0 5074   112600144   .035769444   .0267833  .0003759072 17282 1
"id_00080" 2010 0 5074    49825652   .021041086  .01959802  .0004239534 17282 1
"id_00080" 2011 0 5074    78369064   .024742244  .02314992 .00033269785 17282 1
"id_00080" 2012 0 5074    98782288    .02418742 .019188194  .0002570736 17282 1
"id_00080" 2013 0 5074    83640504   .028068194 .014612177  .0002152888 17282 1
"id_00080" 2014 0 5074    83103912   .030374026 .017278254 .00015980937 17282 1
"id_00093" 2006 0 3089   206572176    .02622132  .02114137 .00005993211 17282 0
"id_00093" 2008 0 3089   644188672    .03762868 .035819687 .00005181009 17282 1
"id_00093" 2009 0 3086   561651392  -.008512885  .12705249 .00006356939 17282 1
"id_00093" 2010 0 3086   2.807e+08   -.02609855  .03601439 .00003674825 17282 1
"id_00093" 2011 0 3086   414585376    .01348773  .03186177 .00003856165 17282 1
"id_00093" 2012 0 3465   630120704   .018667286 .036539003 .00004089285 17282 1
"id_00093" 2013 0 3465   379989952            0   .0497588 .00003221143 17282 1
"id_00103" 2006 0 2836   226461408  -.005637278  .02024567   .003111616 17282 0
"id_00103" 2008 0 2836   942407104  -.031372428 .036509994   .003900399 17282 1
"id_00103" 2009 0 2836  1211408128  .0009203761  .02439049   .003491862 17282 1
"id_00103" 2010 0 2836  1154392448   .003678932 .035140697  .0046487395 17282 1
"id_00103" 2011 0 2836   885623872   .007982162  .04298878   .005047633 17282 1
"id_00103" 2012 0 2836   233058448    .01120722 .027628934   .002271331 17282 1
"id_00119" 2005 0 2823    84648576     .0143054 .017361403  .0010554838 17282 0
"id_00119" 2006 0 2823    99213976   .008339566 .016545933  .0018694928 17282 0
"id_00119" 2008 0 2823   183488816   -.01823538 .019353237  .0007401588 17282 1
"id_00119" 2009 0 2823   157040960   .016912408  .02204966   .001156575 17282 1
"id_00119" 2010 0 2823    97504792   .013302117 .018230963  .0010388609 17282 1
"id_00119" 2011 0 2823   199760464   .007266851  .02252157  .0008127256 17282 1
"id_00119" 2012 0 2823  2131527808   .004774294  .01778004  .0007644884 17282 1
"id_00119" 2013 0 2823  1899249664    .00302965 .013836432  .0009152694 17282 1
"id_00119" 2014 0 2823  1579382272   .003592132 .017851545  .0010071746 17282 1
"id_00122" 2005 0 2281    46614380    .02826007  .01559195  .0002108262 17282 0
"id_00122" 2006 0 2281    44206732    .04744977 .032993216 .00025382263 17282 0
"id_00122" 2008 0 2281    53825380    .15598068  .04742756 .00020567376 17282 1
"id_00122" 2009 0 2281    47936356    .06116773   .0402959  .0004391534 17282 1
"id_00122" 2010 0 2281    31968438            0  .02812117  .0005402299 17282 1
"id_00122" 2011 0 2281    73098832            0  .02551264 .00031526104 17282 1
"id_00122" 2012 0 2281   127293344   .015010345 .025070975 .00015384615 17282 1
"id_00122" 2013 0 2281   146744768            0  .02129554 .00004761905 17282 1
"id_00122" 2014 0 2281   155560176     .0873577 .015909867  .0001226415 17282 1
"id_00126" 2008 1 3541     5006366    .27333698  .04934133 .00042661195 17282 1
"id_00126" 2009 1 3541     6616169    .07087873  .04777957   .000571066 17282 1
"id_00126" 2011 1 3541     6802463            0  .03388054 .00013467316 17282 1
"id_00126" 2012 1 3541     6520094    .13768259  .04045403 .00024619288 17282 1
"id_00126" 2013 1 3541     6104283     .3184079   .0556478 .00022736887 17282 1
"id_00126" 2014 1 3541     9131440    .13751495  .05543078 .00006140494 17282 1
"id_00128" 2005 0 2631   642412672   .003329287 .014274475   .002408165 17282 0
"id_00128" 2006 0 2631   717536768  -.008048323 .013988613  .0026716124 17282 0
"id_00128" 2008 0 2631  1332791040  -.023042407 .020184103  .0011856727 17282 1
"id_00128" 2009 0 2631  1304849664    .00218392  .01935763  .0008529326 17282 1
"id_00128" 2010 0 2631  1332988416  .0046279076 .013125693  .0005848886 17282 1
"id_00128" 2011 0 2631  1551837824   .004790683  .01356936  .0006338508 17282 1
"id_00128" 2012 0 2631  1505666816   .007632469  .01343274  .0003173279 17282 1
"id_00128" 2013 0 2657  1495406848  .0032238446 .012256898  .0003628629 17282 1
"id_00128" 2014 0 2657  1684874368  .0018146912  .01074967  .0004430364 17282 1
"id_00143" 2005 0 2911   458438016   .002331408 .019748045    .01115539 17282 0
"id_00143" 2006 0 2911  6756505600 -.0001751939  .02406472   .003328213 17282 0
"id_00143" 2008 0 2911 13927083008   -.03255114  .03742211   .002696676 17282 1
"id_00143" 2009 0 2911 11994588160   .001617242 .029150063   .002207031 17282 1
"id_00143" 2010 0 2911  7723643392  .0017768323 .016915994    .00188681 17282 1
"id_00143" 2011 0 2911  8207965696  .0018787972  .02037382  .0016668555 17282 1
"id_00143" 2012 0 2911  8792957952  .0022448776 .015769344  .0009531767 17282 1
"id_00143" 2013 0 2899  8492946944  .0016732815 .015833834  .0008618964 17282 1
"id_00143" 2014 0 2899 11272925184   .001505706 .013577214   .001131736 17282 1
end
format %td TPDImplDate

In the above example, control firms as well as treated firms are listed. I matched them by year, industry (SICCODE), and closest in size with the following code:

Code:

preserve

 // only controls

 keep if SEMI == 0
 ds year SICCODE, not
 rename (`r(varlist)') semi0_=
 tempfile semi0
  save `semi0'


 restore
 // only treatments
 
 keep if SEMI == 1
 ds   year SICCODE, not
 rename (`r(varlist)') semi1_=
 tempfile semi1
 save `semi1'

 joinby year SICCODE using `semi0'
 gen size_diff = abs(semi1_MVEEuro - semi0_MVEEuro)
 by semi1_ID year (size_diff), sort: keep if _n == 1

That works until here, but I have no idea how I can implement the diff in diff analysis from this point on based on the matched sample.

I thought about generating a dataset for the controls (SEMI==0) & one for the treatments (SEMI ==1 ) and then append those two datasets and doing something like this:

Code:

xtset ID year
xtreg spread i.SEMI##i.POST MVEEuro SHRTRNVER RTRNVLT i.year, fe

However, I'm not sure whether this is the right way to handle a matched sample, as I do not control for the matched pairs as you suggested. Could you please help me with the code for the diff in diff for the matched sample?

Thank you in advance!
Kind regards


Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#33

18 Apr 2021, 12:40

It will make life a little simpler if instead of naming your variables semi0_* and semi1_* when you create the matched pairs, name them *0 and *1 respectively. Also, the variable SEMI itself should be dropped in the semi0 and semi1 tempfiles.

Anyway, you need then to go back to a long layout, with a variable marking the pairs. Then, you have three level data, so you need a multi-level model:

Code:

preserve

// only controls
keep if SEMI == 0
drop SEMI
ds year SICCODE, not
rename (`r(varlist)') =0
tempfile semi0
save `semi0'

restore

// only treatments
keep if SEMI == 1
drop SEMI
ds year SICCODE, not
rename (`r(varlist)') =1
tempfile semi1
save `semi1'

joinby year SICCODE using `semi0'
gen size_diff = abs(MVEEuro1 - MVEEuro0)
by ID1 year (size_diff), sort: keep if _n == 1
gen long pair_num = _n

reshape long MVEEuro spread RTRNVLT SHRTRNVER TPDImplDate POST ID, ///
    i(pair_num) j(SEMI)

mixed spread i.SEMI##i.POST MVEEuro SHRTRNVER RTRNVLT i.year || pair_num: || ID:

Note: This code could not be run and tested on your example data, because the example does not contain any SEMI = 0 observations that agree with any SEMI = 1 observation on year and SICCODE. Consequently the result of the -joinby- command is an empty data set and nothing works from that point on. So beware of typos or other errors here, but this is the gist of how to proceed.

Dominik Mueller

Join Date: Apr 2021
Posts: 8

#34

19 Apr 2021, 01:34

I implemented your code and it works up to the multi-level "mixed" model. Before running the "mixed" command, I had this data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str8 ID long pair_num byte SEMI int year float(MVEEuro SHRTRNVER RTRNVLT POST spread)
"id_01330"  1 0 2008  1240832000   .006198581  .03431406 1   .003143964
"id_00036"  1 1 2008   128974328 .00022902184  .04066654 1    .02985849
"id_01330"  2 0 2009   639045184  .0039388845  .02955024 1   .003727492
"id_00036"  2 1 2009    43831540 .00015840407 .035337657 1   .030995583
"id_01330"  3 0 2010   627811648  .0033015674  .02269542 1  .0028965354
"id_00036"  3 1 2010    40447140  .0004027027  .02504252 1   .025922013
"id_01330"  4 0 2011  1450855040   .002773742  .02731557 1  .0015746313
"id_00036"  4 1 2011    55772208  .0003617544 .029361796 1    .02712461
"id_06707"  5 0 2012 18830837760    .00325221  .01934889 1   .004799885
"id_00036"  5 1 2012    61363420 .00026909183  .02827287 1   .027664075
"id_06104"  6 0 2013   503291520   .002419468 .016241016 1  .0038217856
"id_00036"  6 1 2013    40980892  .0005518307 .029163113 1   .025084613
"id_06104"  7 0 2014   584689728   .003368442  .01991038 1   .003782598
"id_00036"  7 1 2014    27790522  .0009447147 .033477318 1    .02749496
"id_03965"  8 0 2008   624483520  .0010056949  .04921107 1   .008454365
"id_00126"  8 1 2008     5006366 .00042661195  .04934133 1    .27333698
"id_03965"  9 0 2009   588252736  .0005279564  .03738844 1   .009212082
"id_00126"  9 1 2009     6616169   .000571066  .04777957 1    .07087873
"id_03965" 10 0 2011   482211104 .00030009515 .031054256 1    .00821246
"id_00126" 10 1 2011     6802463 .00013467316  .03388054 1            0
"id_07939" 11 0 2012    21301248 .00057254423  .03552626 1    .10261687
"id_00126" 11 1 2012     6520094 .00024619288  .04045403 1    .13768259
"id_07939" 12 0 2013    10346570  .0046255635  .03969612 1    .06816037
"id_00126" 12 1 2013     6104283 .00022736887   .0556478 1     .3184079
"id_07939" 13 0 2014    11748119   .003664194 .029712977 1    .03326927
"id_00126" 13 1 2014     9131440 .00006140494  .05543078 1    .13751495
"id_04219" 14 0 2008   216393744 .00024268353 .032082506 1   .027414085
"id_00155" 14 1 2008   227073856  .0019657144  .03906628 1   -.02260862
"id_04219" 15 0 2009   147710064  .0002294315  .04429052 1     .0611857
"id_00155" 15 1 2009   182528000   .002044335  .04613097 1   .017565839
"id_04846" 16 0 2010    43129340 .00009518518 .022353055 1    .02757953
"id_00155" 16 1 2010    49585432   .001348215 .035168096 1   .015656551
"id_04846" 17 0 2011    69164304  .0001587879  .02711308 1   .031334464
"id_00155" 17 1 2011    73015680   .002997577 .031412493 1    .01102845
"id_08620" 18 0 2012   161620160    .00519308  .02596362 1   .004938239
"id_00155" 18 1 2012   146000400  .0015806516 .018618498 1   .009517466
"id_06399" 19 0 2013   111358352   .015720367 .019201174 1  .0031410526
"id_00155" 19 1 2013   137432176  .0010915267 .015002783 1    .00932133
"id_06532" 20 0 2014   137697632  .0015128078  .01907831 1   .008035841
"id_00155" 20 1 2014   144852400  .0010103871 .018729305 1    .00951512
"id_05651" 21 0 2009    12306118  .0006635195  .03923098 1    .09791174
"id_00160" 21 1 2009    22926866  .0002109589  .04591163 1    .05252363
"id_05651" 22 0 2010    13923182 .00004386677 .036968723 1    .11399193
"id_00160" 22 1 2010     9793679 .00026575342  .02335245 1  .0031518966
"id_05651" 23 0 2011    16503891  .0001545526  .05267449 1    .09802325
"id_00160" 23 1 2011    12217241  .0001920635  .03078736 1            0
"id_07590" 24 0 2005   139030192   .010753858  .01868957 0   .004484589
"id_00225" 24 1 2005     6188281    .02219392  .01831214 0   .011163672
"id_07590" 25 0 2006   210779968     .0228293 .028791176 0  .0026473426
"id_00225" 25 1 2006    11306774    .03462114 .018883863 0    .00947767
"id_06707" 26 0 2007  9944017920   .007179112 .018063923 0 -.0006449268
"id_00225" 26 1 2007    23672744   .030812753 .021674056 0   .007041991
"id_07947" 27 0 2009   916536256   .012682172 .027097477 1  .0017636938
"id_00225" 27 1 2009   735041344  .0018287174 .030712824 1   .006715424
"id_07590" 28 0 2010   322135008   .026162695 .026408574 1   .001252184
"id_00225" 28 1 2010   393329824  .0013284342 .020403063 1   .004842348
"id_07590" 29 0 2011   378651360   .029643806  .02710081 1  .0016570878
"id_00225" 29 1 2011   531849184   .001168623  .02314209 1    .00471527
"id_07590" 30 0 2012   379749824    .02403754   .0330931 1   .020081304
"id_00225" 30 1 2012   624386176  .0008909604 .017101506 1   .005221949
"id_01359" 31 0 2013   438671936   .011123857 .020225925 1  .0012321492
"id_00225" 31 1 2013   545230656  .0011104521 .016631598 1   .004958729
"id_07590" 32 0 2014   182421648   .007943503  .03941109 1   .021559386
"id_00225" 32 1 2014   647177664  .0007302606  .01678111 1  .0041859904
"id_04003" 33 0 2006     8275613  .0020290525 .024034357 0    .02308953
"id_00298" 33 1 2006    14712883  .0007636364  .02040359 0    .01587711
"id_04220" 34 0 2009    50522512 .00023458006  .02087883 1   .026805406
"id_00298" 34 1 2009    34657328 .00027333334  .02741971 1    .04790221
"id_04003" 35 0 2010    12546232  .0021524597   .0227295 1    .01332592
"id_00298" 35 1 2010    18904712  .0006562162  .02272107 1    .03079886
"id_04003" 36 0 2011    30414930  .0010699616 .027067004 1   .010569789
"id_00298" 36 1 2011    24203736  .0003417103 .021451317 1   .029891124
"id_05418" 37 0 2009    60783636   .000738411  .02600346 1   .016921964
"id_00318" 37 1 2009    12075436   .001256062   .0473181 1     .0301326
"id_05418" 38 0 2010    25476138  .0002501067  .02910354 1   .025622847
"id_00318" 38 1 2010     5859451  .0010069364 .024271064 1   .032701954
"id_05418" 39 0 2011    16563455  .0002353975  .04274403 1   .034131546
"id_00318" 39 1 2011     5980634  .0011802778 .027919523 1   .027001036
"id_01377" 40 0 2009    92080704   .002080811   .0231513 1   .012828815
"id_00347" 40 1 2009    33592924   .000887814  .15405644 1     .0984689
"id_00147" 41 0 2010    20411640 .00020388483  .02248791 1  .0005875912
"id_00347" 41 1 2010     5174852 .00034246955  .07965926 1     .2323855
"id_06352" 42 0 2007    33609156    .03843362  .04863128 0  -.003106825
"id_00353" 42 1 2007    33979512 .00019620746   .0446488 0     .0852191
"id_05233" 43 0 2009    30417442    .00015736 .033910368 1    .04748995
"id_00353" 43 1 2009    29339316 .00007498958  .06476927 1    .14043526
"id_05233" 44 0 2010    28265794  .0002111158  .02591096 1    .04201683
"id_00353" 44 1 2010    27709890 .00004775031 .064104885 1     .1634806
"id_05233" 45 0 2011    37459488  .0002740335  .02685072 1    .03157586
"id_00353" 45 1 2011    32856614 .00007221321  .06204874 1     .2103459
"id_06203" 46 0 2012    32986992  .0009591527 .025513565 1   .035930675
"id_00353" 46 1 2012    30022732 .00007023088  .05298381 1    .09817515
"id_08476" 47 0 2013    29665318  .0016736076 .017926183 1   .011521393
"id_00353" 47 1 2013    30319990 .00009217804   .0404327 1     .0868798
"id_01253" 48 0 2014    29185784  .0017903092  .01645547 1   .013341093
"id_00353" 48 1 2014    30689944 .00006833275  .01580923 1    .06574496
"id_05621" 49 0 2009    36656984  .0007434972 .022710556 1    .01615178
"id_00360" 49 1 2009   481247.75   .004702764  .13265552 1    .15103322
"id_03591" 50 0 2012    10425734  .0003842442 .024410035 1   .035392098
"id_00360" 50 1 2012   107012.88   .018206974  .05271334 1    .19541366
end

However, if I run this code:

Code:

mixed spread i.SEMI##i.POST MVEEuro SHRTRNVER RTRNVLT i.year || pair_num: || ID:

Many iterations are run, but it seems to me that this may never converge (I stopped it after 80 iterations, as they all looked like Iteration 1). Here is what I got:

Code:

 mixed spread i.SEMI##i.POST MVEEuro SHRTRNVER RTRNVLT i.year || pair_num: ||ID:

Performing EM optimization:

Performing gradient-based optimization:
Iteration 0:   log likelihood =  5732.6143  (not concave)
Iteration 1:   log likelihood =  5749.4949  (not concave)
Iteration 2:   log likelihood =  5749.4949  (not concave)
numerical derivatives are approximate
nearby values are missing
Iteration 3:   log likelihood =  5749.4949  (not concave)
numerical derivatives are approximate
nearby values are missing

and so on...

I have no clue what this is telling me..


Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#35

20 Apr 2021, 19:15

OK, you have a non-convergence problem here. In suggesting the three level model I had overlooked the fact that you did your matching with replacement. Since the same control can be matched to multiple cases, that means that id's are no longer nested in pairs (some id's belong to multiple pairs)--and that makes the -mixed- code I wrote incompatible with the data. I'm sorry about that.
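
A quick way to see the non-nesting directly is to count how many distinct pairs each firm belongs to. The following is only a minimal sketch: it assumes the long layout produced by the -reshape- above (variables ID and pair_num), the toy rows are made up for illustration, and first_in_pair and n_pairs are hypothetical helper names.

```stata
* Toy data (made up): control id_01330 is matched in pairs 1 and 2
clear
input str8 ID long pair_num byte SEMI
"id_01330" 1 0
"id_00036" 1 1
"id_01330" 2 0
"id_00126" 2 1
end

* Flag one row per ID-pair combination, then count distinct pairs per firm
bysort ID pair_num: gen byte first_in_pair = (_n == 1)
bysort ID: egen n_pairs = total(first_in_pair)

* Firms with n_pairs > 1 are not nested within a single pair
list ID pair_num n_pairs if n_pairs > 1, sepby(ID)
count if n_pairs > 1
```

If the final count is zero, ID is nested in pair_num and the three-level syntax is at least structurally compatible with the data; any positive count means it is not.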

There are two ways to proceed. The -mixed- code can be rewritten as a multiple membership model.

Code:

mixed spread i.SEMI##i.POST MVEEuro SHRTRNVER RTRNVLT i.year || _all: R.pair_num || ID:

This will probably converge, although the calculations may be painfully slow if you have a large data set.

But, because of the non-nesting, you also have the option of doing this as a fixed effects model:

Code:

encode ID, gen(id) // BECAUSE ID IS A STRING VARIABLE AND -xtset- REQUIRES A NUMERIC IDENTIFIER
xtset id
xtreg spread i.SEMI##i.POST MVEEuro SHRTRNVER RTRNVLT i.year i.pair_num, fe

Some of the pair_num indicators will be omitted due to collinearity (any pair_num which doesn't share a control with any other pair_num), but that is OK here because the information in that case is carried by the fixed effects for that case and control themselves.

Last edited by Clyde Schechter; 20 Apr 2021, 19:18.
Dominik Mueller

Join Date: Apr 2021

Posts: 8
#36

22 Apr 2021, 01:15

Thanks a lot, it works!
For the diff-in-diff estimator SEMIPOST I get a coefficient of 0.0140671, which is significant (P>|t| = 0.0000).
However, many studies that take similar approaches say that they use the natural logarithm of the spread, MVEEuro, SHRTRNVER and RTRNVLT. If I follow them and use the natural logarithm as well, the diff-in-diff estimator changes its sign from plus to minus, i.e. the SEMIPOST coefficient is now -0.1244301, which is also significant (P>|t| = 0.0000). I wonder which coefficient I should trust and how I should interpret them. I would interpret them as follows: the first one says that an increase in reporting frequency will increase the spread, whereas the second one says that an increase in reporting frequency will reduce the spread?
I also read something about the ln being used to control for outliers. Might that be why the sign changes from positive to negative?
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#37

22 Apr 2021, 12:08

The use of log transformation as a way of dealing with outliers is a misguided statistical practice. Variables should be log transformed for regression analysis when, and only when, the relationships in question are log-linear rather than linear. Now, if you have two variables, one of which has a fairly narrow range and another which extends over many orders of magnitude, it is more likely that the relationship will be log-linear rather than linear. But there is no guarantee of that, and it certainly should not be relied on routinely. You need to look at scatterplots of your predictors against the spread variable, both as is and log-transformed, to figure out which relationships are linear and which become linear after log transformation. (Ideally, you should have done this even before running any regression commands.) You may also find that some of the variables should be log transformed and others not.
Dominik Mueller

Join Date: Apr 2021

Posts: 8
#38

23 Apr 2021, 02:13

Thank you!
I did the scatterplots for the control variables SHRTRNVER, MVEEuro & RTRNVLT and came to the conclusion that it is best to use the logarithm of the spread as well as the logarithm of the variables.
I implemented it as follows for each control variable:

Code:

gen logspread = ln(spread)
gen logSHRTRNVER = ln(SHRTRNVER)
scatter spread SHRTRNVER
scatter logspread SHRTRNVER
scatter logspread logSHRTRNVER

However, what I'm interested in is the difference-in-differences estimator SEMI#POST. This is either zero or one, so I don't think a scatterplot makes any sense for it.

Can I rely on the other variables to conclude that I need the logspread as the dependent variable? Or do I have to do anything else?
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#39

23 Apr 2021, 23:23

Sorry, I wasn't clearer. Of course SEMI#POST is a 0/1 variable and you will not log-transform that no matter what. My concern is that you distinguish linear vs log-linear relationships between spread and the continuous predictors. So if you have done scatter plots of spread and log-spread against the other continuous variables and their logarithms and the log-log plots look best, then go with that.
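
One thing worth checking alongside the plots, since the posted example contains zero and negative values of spread: Stata's ln() returns missing for any argument that is not strictly positive, so a log-spread model would silently drop those observations. The sketch below uses made-up toy rows and only illustrates the check and the two plots; it is not a recommendation for either specification.

```stata
* Toy data (made up), including a zero and a negative spread
clear
input float(spread SHRTRNVER)
 .0299  .00023
 .0031  .0062
 0      .00013
-.0226  .0020
 .1377  .00025
end

* ln() is missing for zero or negative arguments
gen logspread    = ln(spread)
gen logSHRTRNVER = ln(SHRTRNVER)

* Compare the raw and the log-log relationships
scatter spread SHRTRNVER, name(raw, replace)
scatter logspread logSHRTRNVER, name(loglog, replace)

* How many observations would a log-spread model lose?
count if missing(logspread) & !missing(spread)
```

If that count is not small relative to the sample, the estimation samples of the linear and log models differ, which is one more thing to keep in mind when comparing their coefficients.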
Rafael Acevedo

Join Date: Apr 2019

Posts: 17
#40

06 Nov 2021, 08:47

Originally posted by Clyde Schechter

See https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf for a lucid explanation of generalized difference-in-differences modeling, which applies to your situation.

You need a variable, call it treat, which is 1 in the group that receives the treatment (and is 1 in those observations at all times, including before treatment started) and 0 in all observations for the untreated group, another which we can call active_treatment which is 1 in the treatment group after treatment begins, but is 0 in the treatment group before treatment begins and is 0 in all observations in the control group. Then you do a fixed effects regression that looks more or less like this:

Code:

xtset region
xtreg outcome i.treat i.activetreatment i.year, fe // OR RE AS THE CASE MAY BE

The coefficient of activetreatment is the DID estimator of the effect of treatment.

Hi Clyde, I have a similar question. Mine differs in that I'm trying to measure the treatment effect in the year after the treatment. My individuals (19 countries) can be treated or not in different years (I have 20 time periods). For example, individual A can be treated in years 2, 4, 6, 7, 8, 10; individual B can be treated in years 3 to 8, 10, 12 to 14, 18, 20; and so on. So there is no pattern in the treatment: no country is fully untreated, all countries have been treated at some point, and there is no year in which all countries are treated (in every year I have both treated and untreated units). I'm trying to figure out what the best specification would be in my case. I couldn't apply the treatment variable that you specified in the model suggested to Oskar, because it would equal 1 for all individuals (all of them have been treated in different years). Thank you very much for your help. I did not open a new question because I think mine is related to this post.
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#41

06 Nov 2021, 10:48

So, what you have calls for a slightly different approach, generalized difference-in-differences. It's actually fairly simple. You just create a variable, let's call it under_treatment, that is 1 in any observation where that country is treated in that year, and 0 otherwise. Then the regression is just

Code:

xtset country
xtreg outcome i.under_treatment, fe

If appropriate to your situation you can also include i.year in your regression, or add other covariates. The coefficient of 1.under_treatment will be your effect estimate.
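
To make the construction of under_treatment concrete, here is a minimal sketch on a made-up toy panel. The treated years, the outcome values, and the numeric coding of country are all invented for illustration; in real data the indicator would come from whatever records the treatment episodes.

```stata
* Toy panel (made up): 2 countries, 6 years, arbitrary outcome
clear
set obs 12
gen country = ceil(_n/6)
bysort country: gen year = _n
set seed 12345
gen outcome = rnormal()

* 1 in any country-year under treatment, 0 otherwise
* (the treated years below are invented)
gen byte under_treatment = 0
replace under_treatment = 1 if country == 1 & inlist(year, 2, 4, 6)
replace under_treatment = 1 if country == 2 & inrange(year, 3, 5)

* Inspect how treatment is distributed across years
tab year under_treatment

xtset country year
xtreg outcome i.under_treatment i.year, fe
```

Note that treatment switches on and off within each country here, which is exactly the situation described in #40; no separate treat variable is needed.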

Now, you have another complication here in that you say you are trying to measure the effect the year after the treatment. In that case, the regression changes to

Code:

xtset country year
xtreg outcome i.L1.under_treatment, fe
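
To see what the lag operator is doing, it can help to materialize the lagged indicator once and look at it. This is only an illustrative sketch on made-up rows; lag_treat is a hypothetical name, and the regression itself does not need this variable.

```stata
* Toy panel (made up): one country, treated in years 2 and 3 only
clear
input byte country int year byte under_treatment
1 1 0
1 2 1
1 3 1
1 4 0
1 5 0
end

* The L. operator needs both the panel and the time variable declared
xtset country year

* Materialize the lag purely for inspection
gen byte lag_treat = L1.under_treatment
list country year under_treatment lag_treat, noobs
```

The first observed year of each country has a missing lag and therefore drops out of the estimation sample, as does any year that follows a gap in the data.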

Now, that said, I wonder if the modeling may be too simple here. This approach assumes that the effect of starting treatment the second, third, or other, time is the same as the effect of starting it the first time. But in the real world, often those effects are different. You have not explained the full context, and even if you had, unless the problem here is an epidemiologic one, I probably would not be able to advise you about how to build a more realistic model. But you should think seriously whether a simple single effect is realistic in your context, and if it is not, seek advice from the literature or colleagues in your discipline about alternatives.
Sanjana Ravi

Join Date: Nov 2021

Posts: 16
#42

23 Dec 2021, 14:34

Originally posted by Clyde Schechter

See https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf for a lucid explanation of generalized difference-in-differences modeling, which applies to your situation.

You need a variable, call it treat, which is 1 in the group that receives the treatment (and is 1 in those observations at all times, including before treatment started) and 0 in all observations for the untreated group, another which we can call active_treatment which is 1 in the treatment group after treatment begins, but is 0 in the treatment group before treatment begins and is 0 in all observations in the control group. Then you do a fixed effects regression that looks more or less like this:

Code:

xtset region
xtreg outcome i.treat i.activetreatment i.year, fe // OR RE AS THE CASE MAY BE

The coefficient of activetreatment is the DID estimator of the effect of treatment.

Hi Clyde - thanks so much for this solution. A follow up question -- could you please explain why the coefficient for "active_treatment" provides the DiD estimator of the effect of treatment? My understanding is that there should be an interaction term in the model, the coefficient of which is the estimator.

Thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#43

23 Dec 2021, 14:41

The variable activetreatment is an interaction term--it's just not named that way. If you had a classical DID setup, with distinct treatment and control groups, and every unit in the treatment group adopting the intervention at the same time, then the interaction term treat#pre_post would be identical to the variable activetreatment as defined here. Now, in this more generalized DID situation there is no pre-post variable, and sometimes there aren't even distinct treatment and control groups. But the variable activetreatment is the functional equivalent of an interaction term between the (non-existent) pre-post and (possibly non-existent) treatment variables.
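
The equivalence in the classical case can be checked mechanically. The sketch below builds a made-up toy panel in which units 1 and 2 adopt treatment in year 3 and units 3 and 4 never do; all names other than treat, pre_post and activetreatment are invented for the illustration.

```stata
* Toy classical DID (made up): units 1-2 treated from year 3 on, units 3-4 never treated
clear
set obs 16
gen unit = ceil(_n/4)
bysort unit: gen year = _n
gen byte treat    = unit <= 2
gen byte pre_post = year >= 3

* activetreatment as defined in the quoted post
gen byte activetreatment = treat == 1 & year >= 3

* In the classical setup it coincides with the product term treat#pre_post
assert activetreatment == treat * pre_post
count if activetreatment == 1
```

Once adoption dates differ across units, no single pre_post variable exists, but activetreatment can still be built unit by unit in the same way.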

The post you are quoting from in #42 is an old one, and the link in your quote no longer works. But there is a very nice explanation of generalized DID at https://www.annualreviews.org/doi/pd...-040617-013507. I suggest you read it for more details.
Ijeoma Ugwuanyi

Join Date: May 2021

Posts: 8
#44

11 Oct 2022, 02:33

Originally posted by Clyde Schechter

See https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf for a lucid explanation of generalized difference-in-differences modeling, which applies to your situation.

You need a variable, call it treat, which is 1 in the group that receives the treatment (in all of that group's observations, including those before treatment started) and 0 in all observations for the untreated group. You need another, which we can call active_treatment, which is 1 in the treatment group after treatment begins, 0 in the treatment group before treatment begins, and 0 in all observations in the control group. Then you do a fixed effects regression that looks more or less like this:

Code:

xtset region
xtreg outcome i.treat i.activetreatment i.year, fe // OR RE AS THE CASE MAY BE

The coefficient of activetreatment is the DID estimator of the effect of treatment.

Good day, Clyde,

I am still learning Stata, currently using Stata 14.2.

Thank you very much for the solutions you provide on this platform, such as the one above. I have a slightly different problem.

Here is my design: I want to examine the effect of the death of a company director on innovation, comparing companies that experienced the death of a director with those that did not. The challenge is that the directors died in different years within my sample range of 2002 to 2020 (panel data). In the example below, the company "AEP" experienced the death of a director in 2008, while the company "AGCO" experienced one in 2002. The other companies in this hypothetical sample did not experience a death within the sample range of 2002 to 2020.

As this is just an example, some companies in the main dataset experienced death in 2003, 2004, 2005, 2006, 2008, 2009, 2010, 2011, and 2012.

My questions are:

1. I would like to do propensity score matching (PSM) and test for parallel trends using a graph, since control firms far outnumber treated firms, in addition to the other benefits of PSM. How can I accomplish this given the staggered timing of the treatment?

2. How do I run a DiD effectively with this design?

Other notes:

Ticker = company identifier

fyear = fiscal year (sample year)

Treat = dummy equal to 1 if the firm experienced the death of a director, 0 otherwise

deathyear = the year the director died.

My setup for the DiD period (post): the window runs from 4 years before the death, through the death year, to 4 years after the death.

Some steps I have taken

1. Obtained some code for PSM and prepared a time variable taking the values -4, -3, -2, -1, 0, 1, 2, 3, 4 for the pre and post periods. But I realized that this time variable covers only the treatment group. How can I include a counterpart control group for each set of firms with a time range of -4 to 4? The aim of this time variable is to put all firms on one common time scale so that I can conduct a PSM and a parallel trend test.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 Ticker int fyear float Treat int deathyear
"AEP" 2002 1 2008
"AEP" 2003 1 2008
"AEP" 2004 1 2008
"AEP" 2005 1 2008
"AEP" 2006 1 2008
"AEP" 2007 1 2008
"AEP" 2008 1 2008
"AEP" 2009 1 2008
"AEP" 2010 1 2008
"AEP" 2011 1 2008
"AEP" 2012 1 2008
"AEP" 2013 1 2008
"AEP" 2014 1 2008
"AEP" 2015 1 2008
"AEP" 2016 1 2008
"AEP" 2017 1 2008
"AEP" 2018 1 2008
"AEP" 2019 1 2008
"AEP" 2020 1 2008
"AES" 2002 0 .
"AES" 2003 0 .
"AES" 2004 0 .
"AES" 2005 0 .
"AES" 2006 0 .
"AES" 2007 0 .
"AES" 2008 0 .
"AES" 2009 0 .
"AES" 2010 0 .
"AES" 2011 0 .
"AES" 2012 0 .
"AES" 2013 0 .
"AES" 2014 0 .
"AES" 2015 0 .
"AES" 2016 0 .
"AES" 2017 0 .
"AES" 2018 0 .
"AES" 2019 0 .
"AES" 2020 0 .
"AFG" 2002 0 .
"AFG" 2003 0 .
"AFG" 2004 0 .
"AFG" 2005 0 .
"AFG" 2006 0 .
"AFG" 2007 0 .
"AFG" 2008 0 .
"AFG" 2009 0 .
"AFG" 2010 0 .
"AFG" 2011 0 .
"AFG" 2012 0 .
"AFG" 2013 0 .
"AFG" 2014 0 .
"AFG" 2015 0 .
"AFG" 2016 0 .
"AFG" 2017 0 .
"AFG" 2018 0 .
"AFG" 2019 0 .
"AFG" 2020 0 .
"AFL" 2002 0 .
"AFL" 2003 0 .
"AFL" 2004 0 .
"AFL" 2005 0 .
"AFL" 2006 0 .
"AFL" 2007 0 .
"AFL" 2008 0 .
"AFL" 2009 0 .
"AFL" 2010 0 .
"AFL" 2011 0 .
"AFL" 2012 0 .
"AFL" 2013 0 .
"AFL" 2014 0 .
"AFL" 2015 0 .
"AFL" 2016 0 .
"AFL" 2017 0 .
"AFL" 2018 0 .
"AFL" 2019 0 .
"AFL" 2020 0 .
"AGCO" 2002 1 2002
"AGCO" 2003 1 2002
"AGCO" 2004 1 2002
"AGCO" 2005 1 2002
"AGCO" 2006 1 2002
"AGCO" 2007 1 2002
"AGCO" 2008 1 2002
"AGCO" 2009 1 2002
"AGCO" 2010 1 2002
"AGCO" 2011 1 2002
"AGCO" 2012 1 2002
"AGCO" 2013 1 2002
"AGCO" 2014 1 2002
"AGCO" 2015 1 2002
"AGCO" 2016 1 2002
"AGCO" 2017 1 2002
"AGCO" 2018 1 2002
"AGCO" 2019 1 2002
"AGCO" 2020 1 2002
"AGO" 2002 0 .
"AGO" 2003 0 .
"AGO" 2004 0 .
"AGO" 2005 0 .
"AGO" 2006 0 .
end
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#45

12 Oct 2022, 11:10

Well, if you were going to pursue this just from a (generalized) DID perspective, you would create a variable that is 1 during the post-death period for those firms that experienced one, and 0 in all other observations (including both the pre-death years of those firms and all years for firms not experiencing a death). Let's call that variable experienced. Then you would set up a model like:

Code:

xtset firmid
xtreg outcome i.experienced i.year, fe
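A minimal sketch of how experienced might be built from variables shaped like Treat, deathyear and fyear in the example data follows. The toy panel and the random outcome are placeholders invented so the block runs on its own, the clustered standard errors are an added choice rather than part of the model above, and treating the death year itself as post-death is an assumption.

```stata
* Toy panel mimicking the example: 6 firms, fiscal years 2002-2020 (simulated)
clear
set seed 12345
set obs 6
gen firmid = _n      // with the real data: encode Ticker, gen(firmid)
gen byte Treat = inlist(firmid, 1, 5)
gen int deathyear = cond(firmid == 1, 2008, cond(firmid == 5, 2002, .))
expand 19
bysort firmid: gen int fyear = 2001 + _n
gen outcome = rnormal()               // placeholder for the innovation measure

* Assumption: the death year counts as post-death; use > instead of >= if not
gen byte experienced = Treat == 1 & fyear >= deathyear

xtset firmid fyear
* vce(cluster firmid) is an addition here, not part of the model shown above
xtreg outcome i.experienced i.fyear, fe vce(cluster firmid)
```

A firm whose death occurs in the first sample year (like firm 5 here) has experienced equal to 1 throughout, so it contributes no within-firm variation to that coefficient.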

You have, however, properly recognized that this approach might fare poorly, and might fail the parallel trends test. So you're thinking about propensity score matching. I don't think I would go that route, however. First, propensity score matching is difficult in longitudinal data because the variables you match on sometimes change from one year to the next, and you often end up matching the same firm with different control firms in different years, which makes the analysis a bit confusing to interpret. Also, I think there are better ways to use propensity scores than through matching, such as using them as weights or even just including them as covariates. Suffice it to say that I am just not a big fan of propensity score matching--other reasonable people may disagree.
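One way the weighting alternative could look is sketched below, on simulated firm-level data. The covariates x1 and x2, the logit specification, and the ATT-style weighting scheme are all assumptions for illustration, not a recommendation for this particular dataset.

```stata
* One row per firm with baseline covariates (everything here is simulated)
clear
set seed 2022
set obs 500
gen firmid = _n
gen x1 = rnormal()
gen x2 = rnormal()
gen byte Treat = runiform() < invlogit(-1 + 0.5*x1 - 0.3*x2)

* Estimate the propensity score from the baseline covariates
logit Treat x1 x2
predict double pscore, pr

* ATT-style weights: treated firms get 1, controls get the odds of treatment
gen double ipw_att = cond(Treat == 1, 1, pscore/(1 - pscore))
```

Because the weight is computed once per firm, it is constant within panel after being merged back onto the firm-year data, which is what -xtreg, fe- requires of pweights.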

There are a few traits of a firm that I think are particularly important to get an exact match on here. The first is the number of members on the board of directors. A firm that has, say, 20 directors evidently has something like twice the chance of experiencing a director death as one that has only 10 directors. And size of board of directors is probably also related to some financial outcomes, if only because, for example, larger businesses (by number of employees or revenues) will tend to have larger boards, I would think. The other thing that is relevant is the age of the directors. Older ones are more likely to die. And the age of the board members is probably different according to sector, business size and other attributes relevant to financial outcome. At least that's how it looks to me: remember, I'm an epidemiologist, I know next to nothing about finance.

Anyway, I'd be more inclined to get an exact match on number of board members, and at least a reasonably close match on average director age. Then if you still have enough control firms (i.e. those with no director death) to go around and you want to further match on another variable or two, go ahead. Everything else that's relevant can just be a covariate in the model.
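A rough sketch of that kind of matching, using -joinby- on simulated firm-level data, is given below. The variable names board_size and avg_age, the simulated values, and the 2-year age caliper are all hypothetical; real data would need one baseline row per firm first.

```stata
* One row per firm (simulated; board_size and avg_age are hypothetical names)
clear
set seed 2022
set obs 200
gen firmid = _n
gen byte Treat = _n <= 20
gen byte board_size = runiformint(8, 12)
gen avg_age = rnormal(60, 5)

* Put the control firms in a temporary file under different names
preserve
keep if Treat == 0
rename (firmid avg_age) (control_id control_age)
keep control_id control_age board_size
tempfile controls
save `controls'
restore

* Pair each treated firm with every control having the same board size
keep if Treat == 1
joinby board_size using `controls'

* Keep only pairs that are close on average director age (caliper assumed)
keep if abs(avg_age - control_age) <= 2
```

This keeps every eligible control for each treated firm and lets the same control serve more than one treated firm; restricting to one control per treated firm, or matching without replacement, would take further steps.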

Lacking expertise in this area, I can't say anything more specific than that.
Comment
