  • That is the theory behind it. One should control only for time-constant factors to avoid contamination from colliders (covariates that are themselves affected by the treatment).
    Pedro has a longer explanation in the Sant'Anna and Zhao paper.



    • Dear Fernando,

      Thank you so much for all your responses.
      I would like to ask a follow-up question regarding the different methods available to incorporate covariates when using CSDID.
      I have been running estimations with the different methods, in particular DRIPW and IPW.
      Do you have any general suggestion on which method should be preferred?
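
      For reference, here is a minimal sketch of how the two specifications could be run side by side; the variable names (y, x1, x2, id, year, gvar) are placeholders rather than ones from this thread:

      Code:
      * hedged sketch: same model estimated with the doubly robust IPW and the plain IPW estimators
      csdid y x1 x2, ivar(id) time(year) gvar(gvar) method(dripw)   // doubly robust: IPW weights plus an outcome regression
      estat simple
      csdid y x1 x2, ivar(id) time(year) gvar(gvar) method(ipw)     // pure inverse probability weighting
      estat simple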



      • Dear Fernando,

        I have a question about a group that is giving me omitted values, and I would like to know the reason. In this 2019 group there is only one treated individual.
        Code:
        -------------+----------------------------------------------------------------
        g2018        |
         t_2010_2011 |    .417272   1.043513     0.40   0.689    -1.627975    2.462519
         t_2011_2012 |  -2.109536   1.391063    -1.52   0.129    -4.835968    .6168969
         t_2012_2013 |  -.3995892   1.061389    -0.38   0.707    -2.479873    1.680695
         t_2013_2014 |  -.7406029   2.144077    -0.35   0.730    -4.942916    3.461711
         t_2014_2015 |    .909678   .8945845     1.02   0.309    -.8436754    2.663031
         t_2015_2016 |  -.0230287   1.381287    -0.02   0.987    -2.730301    2.684244
         t_2016_2017 |   -.712677   .8111132    -0.88   0.380     -2.30243    .8770757
         t_2017_2018 |  -.2012837   1.040814    -0.19   0.847    -2.241243    1.838675
         t_2017_2019 |  -.3479439   1.548543    -0.22   0.822    -3.383032    2.687144
         t_2017_2020 |  -1.541148   1.471187    -1.05   0.295    -4.424622    1.342325
        -------------+----------------------------------------------------------------
        g2019        |
         t_2010_2011 |          0  (omitted)
         t_2011_2012 |          0  (omitted)
         t_2012_2013 |          0  (omitted)
         t_2013_2014 |          0  (omitted)
         t_2014_2015 |          0  (omitted)
         t_2015_2016 |          0  (omitted)
         t_2016_2017 |          0  (omitted)
         t_2017_2018 |          0  (omitted)
         t_2018_2019 |          0  (omitted)
         t_2018_2020 |          0  (omitted)
        ------------------------------------------------------------------------------
        Control: Never Treated
        
        See Callaway and Sant'Anna (2021) for details
        Last edited by Carlos Avilan; 20 Aug 2022, 10:03.



        • That may be the reason.
          If it gives you omitted values, it means the command couldn't estimate the ATT(g,t) for that group-time combination.
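
          A quick way to see whether a cohort is too thin is to count the distinct units in each treatment cohort. This is only a sketch with placeholder names (id for the panel identifier, gvar for the first-treatment year):

          Code:
          * hedged sketch: count distinct units per treatment cohort
          egen onerow = tag(id)
          tab gvar if onerow & gvar > 0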



          • Thank you Fernando,

            I have a question about the
            Code:
            estat pretrend
            command. If you reject the null hypothesis, can one already infer that parallel trends do not hold?

            Is there any other test for this? These are the results:

            Code:
            . estat pretrend
            Pretrend Test. H0 All Pre-treatment are equal to 0
            chi2(21) =    33.9942
            p-value  =     0.0363
            And this is the result of the event study:

            Code:
            ATT by Periods Before and After treatment
            Event Study:Dynamic effects
            ------------------------------------------------------------------------------
                         | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                 Pre_avg |  -.0336797          .        .       .            .           .
                Post_avg |   .1617237          .        .       .            .           .
                     Tm7 |          0  (omitted)
                     Tm6 |  -.1388413   .3765135    -0.37   0.712    -.8767941    .5991116
                     Tm5 |   .2031394   .3298479     0.62   0.538    -.4433505    .8496293
                     Tm4 |   .7527949   .7728274     0.97   0.330    -.7619191    2.267509
                     Tm3 |  -.3318175   .3230042    -1.03   0.304     -.964894    .3012591
                     Tm2 |  -.2430798   .6018274    -0.40   0.686     -1.42264    .9364802
                     Tm1 |  -.4779539   .3281523    -1.46   0.145     -1.12112    .1652128
                     Tp0 |   .0803358   .3802481     0.21   0.833    -.6649367    .8256083
                     Tp1 |   .0801527   .4688021     0.17   0.864    -.8386825     .998988
                     Tp2 |   .1339095   .6637535     0.20   0.840    -1.167023    1.434843
                     Tp3 |  -.0960796   .7172673    -0.13   0.893    -1.501898    1.309738
                     Tp4 |   .7720239    .809383     0.95   0.340    -.8143376    2.358386
                     Tp5 |          0  (omitted)
            ------------------------------------------------------------------------------



            • Mingyu Qi, have you been able to use CSDID with repeated cross-sectional data?



              • Hi Doug,
                You can use csdid with repeated cross-sections. You just need two changes:
                1) drop "ivar()" from the syntax;
                2) define "gvar" correctly, meaning it needs to identify when an observation would have been treated if it had indeed been observed across time.
                For example, say that you have pooled cross-sections for the US and a treatment that was implemented at the state level.
                If the treatment in, say, GA occurred in 2000, then all observations in the pooled cross-sections who live in GA will have gvar = 2000 (see the sketch below).

                This variable is a bit harder to define when treatment is very individual specific, but that would be the case with any of the alternative DID estimators.
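
                A minimal sketch of that GA example (the variable names y, x1, state, and year are placeholders, and the adoption year is only illustrative):

                Code:
                * hedged sketch: pooled cross-sections with treatment adopted at the state level
                gen gvar = 0                               // never-treated observations keep gvar = 0
                replace gvar = 2000 if state == "GA"       // everyone living in GA gets the state's adoption year
                csdid y x1, time(year) gvar(gvar) notyet   // no ivar(), since there is no panel identifier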
                HTH
                Fernando



                • Thanks FernandoRios.
                  In my case, treatment (which is a continuous intensity measure) is individual specific, based on a distance radius (occurrence of events within an x km radius of individuals during ages 6 to 16). For observations in the 2 waves, being treated or not depends on an individual's year of birth. I can set this up under REGHDFE but am not quite sure about CSDID.
                  Any more pointers you can provide will be greatly appreciated.



                  • Csdid doesn't handle continuous treatments.
                    As for the treatment, you seem to know who is treated;
                    now you need to know when they are treated.



                    • Thanks for the feedback



                      • Originally posted by Chunxiao Geng:
                        Dear Fernando,
                        Thanks for your great contribution to the CSDID package!!
                        My dataset is at the loan level, which means there are multiple loans for one firm in one year. I focus on the effect of a legislation (LEG), which is issued by different states in a staggered fashion, on the loan interest rate (INT). I have some questions:
                        I use the following code for a TWFE regression:
                        reghdfe INT LEG MATURITY SIZE GDP, absorb(i.year i.firm i.guarantee) cluster(firm)

                        where MATURITY is the maturity of the loan, which is a loan-specific variable; SIZE is the asset scale of firm i in year t, which is a firm-year specific variable; and GDP is the economic development of a state in year t, which is a state-year specific variable. In other words, all the controls are time-varying. guarantee is the type of collateral of the loan, and I include it in the model as a kind of fixed effect.
                        1. Is the above regression valid as a staggered DiD, given that the sample is loan-level rather than firm-year level panel data?
                        I use CSDID to improve the model with following code:
                        csdid INT MATURITY SIZE GDP i.guarantee, time(year) gvar(first_treat) notyet cluster(firmcd)

                        where first_treat is the first time when a state issues the legislation.
                        1. Is CSDID applicable to this data, given that the sample is loan-level rather than firm-year level panel data? In my understanding, CSDID is applicable because the sample is a kind of repeated cross-section.
                        2. I have read other posts in the forum and understand that fixed effects can be included in CSDID, but when I include them, all the output is blank and a "conformability error" appears (please find the attachment). Could you shed light on this?
                        3. I noticed in the help file that, for the controls, "Only base period values are used in the model estimation". Does this mean that including time-varying covariates does not help improve the model? If so, how does CSDID deal with parallel trends conditional on observed time-varying covariates?
                        My questions may be silly. I greatly appreciate any helpful answers!

                        Hi Chunxiao Geng,

                        1) I'm not sure what guarantee is, nor what your treatment variable is.
                        In any case, if you are trying to use the TWFE approach, you may still want to have a variable for the year of first treatment and the periods interacted in your model.
                        You do not need to control for firm fixed effects in that case, only for the cohort (or year of first treatment), or at least that is how I understand Wooldridge's approach.

                        2) For your use of first_treat, I'm still unsure which variable in the TWFE model relates to first_treat. I do think you can use CSDID as a kind of repeated cross-section, not a panel, because you do not have a formal panel dataset, unless you assume that each firm/loan is a different panel unit.

                        3) You can add fixed effects (as dummies), but with caveats.
                        - You need both already-treated and not-yet-treated (or never-treated) units for each subgroup within each panel. In your case, you need both treated and untreated units for each value of guarantee; otherwise, you are failing the overlap assumption.
                        - Even if you have treated/untreated observations per level of guarantee, you also need enough observations to identify all other variables, especially for the drimp and dripw methods. The reason is that each time CSDID runs a specific 2x2 DID, only a fraction of the observations enters each model. Thus, you may simply not have enough data to identify coefficients for all variables in your model. This bites a lot when you use logit or inverse probability tilting specifications.
                        - Because of the above, you are getting NO results, so there is nothing to summarize. That is, I believe, why you are getting the reported error.

                        4) If you are using panel data estimators, only base-period values are used in the estimation. In fact, based on CS and the example they provide and use, they start with the assumption that all controls are time fixed. So it doesn't matter which period's data you use; it will have the same impact.

                        Empirically, unless you transform your data, the next best thing is to use the base-period values for the regressions when you look forward (estimate post-treatment ATT's). All data after that is simply not used, because it may be contaminated with effects from the treatment (which you want to avoid).

                        When you have repeated cross-sections, you cannot do that, because you have no data for any period other than the current one. Thus CS simply impose the assumption that the data are either stationary or as good as fixed.

                        You can try using data from after the treatment was introduced, and in fact a later paper by Callaway and coauthors (it came out this year, but I do not recall the title) suggests doing this, with the caveat that it may have large consequences for estimating ATT's.
                        Dear Fernando,

                        Thanks a lot for your reply. I have some follow-up questions and further explanations about my data for your questions mentioned in your last reply.
                        First, guarantee denotes the type of guarantee of each loan, e.g., collateral in the form of real estate, inventory, or accounts receivable. As there are many types of collateral, I include guarantee as a fixed effect in the regression model.
                        Second, first_treat denotes the first time a state issues the legislation. For example, if state s issued the legislation in 2009, first_treat equals 2009 for all loans in state s (no matter whether the loan was obtained before or after 2009). Correspondingly, LEG in the TWFE model equals 1 for loans obtained after 2009 in state s. Therefore, LEG is the variable in the TWFE model relating to first_treat. If a state never issues the legislation during the sample period, first_treat equals 0. To my understanding, first_treat is defined correctly.
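
                        For what it is worth, a sketch of how that definition could be coded, assuming LEG, year, and state are as described above and that loans are observed in every state-year (otherwise first_treat should come directly from the legislation dates); this is only an illustration, not the code actually used:

                        Code:
                        * hedged sketch: first_treat = earliest year with LEG == 1 within each state, 0 if never treated
                        bysort state: egen first_treat = min(cond(LEG == 1, year, .))
                        replace first_treat = 0 if missing(first_treat)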

                        My follow-up questions are:
                        1. Although I only listed three control variables in my last post (post #129), I actually have more control variables in the model, including loan-level, borrowing-firm-level, and state-level variables. When I run CSDID without any control variables, it reports results generally consistent with the TWFE model. But when I gradually add control variables to the CSDID regression, the results become weird. With some control variables it reports an extremely large estimate (e.g., ATT=528560!), which is simply not in line with the actual situation. With some other control variables it reports blank output and a "conformability error". Could you kindly shed light on this?
                        2. Given the weird results with control variables, I tend to report a CSDID estimate without control variables. Is it reasonable to do so? To my understanding, if we include covariates in CSDID, we obtain the ATT estimate under the parallel trends assumption conditional on these covariates. If instead we don't include covariates, the estimate is obtained under the unconditional parallel trends assumption. I think the latter is a stricter and more convincing specification than the former. Any misunderstanding above?
                        3. How do I judge whether the parallel trends assumption is valid? When running "estat all", there is a chi2 statistic for the "Pretrend Test. H0 All Pre-treatment are equal to 0". Additionally, "Pre_avg" could also be used to judge the validity of parallel trends. Importantly, sometimes the chi2 statistic is large, seemingly indicating a violation of parallel trends, while Pre_avg is insignificant, seemingly indicating that parallel trends hold. Which should I believe?
                        My questions may be silly, and I apologize in advance. Looking forward to any helpful reply. Thank you very much!

                        Chunxiao Geng



                        • Hi Chunxiao
                          1) I suspect that the extremely high ATT is localized in one or two time periods. Would you say that is the case (this would be observed in the RAW output)? If that is happening, I wonder whether the extremely high effect is due to overfitting (too many control variables relative to observations) or lack of overlap (some of the pscores are too close to 1 or 0). This could definitely create the problem you mention.
                          Thus, for every single treated group, I would check whether the overlap assumption holds for every control variable.
                          2) If the unconditional PTA holds, then the conditional one should hold as well, I would say, but there is no way to be sure about it. I do think the problem you may be having is related to overfitting and lack of common support. Without common support, you cannot rely on the conditional PTA.
                          3) The tests you mention are all based on different assumptions.
                          estat pretrend tests whether ANY of the pre-treatment effects is different from zero, so I would say it is the strictest one, but also the most sensitive to small violations.
                          The pre-trend average is just a proxy suggesting whether the average pre-treatment ATT's are significant, but that can be misleading if you have, say, one positive and one negative ATT before treatment.
                          You could also run a -test- on all aggregated pretrend effects.
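
                          A sketch of that last suggestion, assuming the event-study aggregation posts its coefficients in e(b) under the names shown earlier in the thread (Tm1, Tm2, ...); the variable names are placeholders and the exact syntax may differ across versions:

                          Code:
                          * hedged sketch: post the event-study aggregation, then jointly test the pre-treatment effects
                          csdid y x1 x2, ivar(id) time(year) gvar(gvar) agg(event)
                          test Tm3 Tm2 Tm1   // list the pre-treatment coefficients your own output reports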

                          HTH
                          Fernando





                          • Dear Fernando,

                            Thanks a lot for your comments. Very helpful.

                            I still have a question, though, for the case of a very long and unbalanced panel: is it the same to estimate using all the information and then use estat event, window(-5 5)?

                            Or would I need to drop all information outside this window? If so, would I drop it for both treated and control firms?

                            Antonio



                            • Hi Fernando,

                              Can you help explain the difference between the DRIMP and DRIPW estimators?

                              The csdid documentation and Sant'Anna & Zhao (2020) explain that DRIPW uses stabilized inverse probability weighting and ordinary least squares, while DRIMP uses inverse probability tilting and weighted least squares. I am under the impression that the DRIPW method also uses weights (IPWs) in its estimation. Can you clarify the difference between these approaches?



                              • Hi Antonio
                                Sorry for the delay in answering.
                                You do not need to drop the information before estimating the event effects within the "window"; the only advantage of doing so would be to reduce the number of estimations done in the background.
                                So, just use estat event, window().
                                F

