I've encountered an issue with the CSDID estimation not using all available observations in my unbalanced panel dataset. To troubleshoot, I've taken several steps:
1. Verified there are no singleton observations by ensuring each ID appears in at least two sample years.
2. Checked for and addressed any missing values across all variables.
3. Experimented with both including and excluding covariates, but the number of observations used in the estimation remained smaller than the actual sample size.
4. Conducted a TWFE estimation using reghdfe, which did use the full number of observations, aligning with the actual sample size.
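The checks listed above can be sketched roughly as follows. All variable names here (id, year, y, x1, x2, treat_post) are hypothetical placeholders, and the sketch assumes one row per id-year and that reghdfe is installed.

```stata
* Hypothetical names: id (unit), year (time), y (outcome), x1 x2 (covariates),
* treat_post (treatment indicator). Replace with your own variables.

* Step 1: count observations from units that appear in only one sample year
bysort id: gen n_years = _N
count if n_years == 1

* Step 2: count observations with a missing value in any model variable
egen n_miss = rowmiss(y x1 x2)
count if n_miss > 0

* Step 4: TWFE benchmark; e(N) is the number of observations actually used
reghdfe y treat_post x1 x2, absorb(id year) vce(cluster id)
display "Observations used by reghdfe: " e(N)
```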
An earlier reply in this thread (#439) discusses a drop in the number of observations. However, I use dripw, which already seems to be less restrictive. Do you have any suggestions on what might be causing this discrepancy with the CSDID estimation? If the exact cause cannot be identified, is it fine to say that "some 2x2 estimations are not feasible"?
If you are concerned about why a particular attgt cannot be estimated, the best thing you can do is estimate that attgt directly with drdid.
In other words, keep only the control group and the cohort of interest, and only the two years used for that attgt (the pre- and post-treatment periods).
Once you have that sample, start by checking whether you could run a TWFE regression on it, and whether all the data are available, with the covariates (the Xs) overlapping across the control group and the treated cohort.
hth
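A minimal sketch of that check, under assumptions: the variable names (y, x1, x2, id, year, gvar) and the cohort and years (2006, 2005, 2007) are hypothetical, and never-treated units (gvar == 0) are used as the only controls for simplicity. With the notyet option the comparison group would also include cohorts not yet treated.

```stata
* Hypothetical names: y (outcome), x1 x2 (covariates), id (unit), year (time),
* gvar (first treatment year, 0 = never treated).
* Illustration: the ATT for cohort 2006 in 2007, with 2005 as the pre-treatment year.
preserve
keep if inlist(gvar, 0, 2006) & inlist(year, 2005, 2007)
gen byte treated = (gvar == 2006)

* How many units are observed in both years, by group?
bysort id: gen n_periods = _N
tab n_periods treated

* Do the covariates overlap across the control group and the treated cohort?
tabstat x1 x2, by(treated) statistics(n mean min max)

* The single 2x2 estimate
drdid y x1 x2, ivar(id) time(year) tr(treated) dripw
restore
```

In an unbalanced panel, units with n_periods equal to 1 are observed in only one of the two years; with ivar() they presumably cannot contribute to this 2x2 comparison, which is one plausible reason for a smaller number of observations.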
Unfortunately not. I didn't really like that design, so I changed it to something different.
Dear @FernandoRios,
I also encountered the same problem when using csdid, and I really need the coefficients of the controls. Would you mind sharing the alternative approach you used to replace csdid? Many thanks!
Dear FernandoRios, I am trying to use DRDID with cross-sectional data. I have been following the Stata package example and am confused about the two commands below. What is the difference between code 1 and code 2, and when should I use each?
code1: drdid re age educ black married nodegree hisp re74 if treated==0 | sample==2, time(year) tr(experimental) all
code2: drdid re age educ black married nodegree hisp re74, time(year) tr(experimental) all
PS: I have been using code2 until I came across an issue on this platform where the questioner at some point used code1. Note that if I use code1, I get this error (you do not have a 2X2 design)
First, please update drdid. I think that may be the problem.
Second, code2 considers the full sample. Code1, just as an exercise, compares only two groups: the untreated and those in the experimental sample. The idea was to check whether the estimated effect is also zero (since neither group was actually treated).
HTH
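A short sketch of both suggestions, assuming drdid was installed from SSC and that the example dataset from the package help file is already loaded (the variables year, experimental, treated and sample are the ones used in code1 above):

```stata
* Update drdid to the latest version on SSC
ssc install drdid, replace

* Inspect the 2x2 structure of the code1 subsample:
* all four group-by-period cells should be non-empty
tab year experimental if treated==0 | sample==2
```

If one of the four cells is empty in the restricted sample, that would be consistent with the "you do not have a 2X2 design" message.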
FernandoRios Thank you very much. This is appreciated.
Dear FernandoRios , I find your answers in this thread very clear, thank you.
I have a question regarding the conditional Parallel Trend Assumption (PTA) in CSDID.
According to the latest Roth et al. (2023) paper, under specific assumptions, it is possible to condition DID estimation on pretreatment outcomes, that is using lagged values of dependent variable Y as controls in X(i), something like this:
csdid Y Y_lag , ivar(i) time(t) gvar(g) notyet
1) In CSDID, since Y_lag (lag of dependent variable) would be time-varying, the estimation will condition on the latest value before the treatment assignment, right?
2) I want to replicate this conditional PTA using alternative estimators (e.g., TWFE). How can I incorporate it? As far as I understand, I should include the following interaction as a control: c.Y(i,g-1)#i.t, where g is the period in which unit i is treated and t is the time indicator of the panel. However, how can I define Y(i,g-1) for never-treated units, since they have no reference period relative to treatment?
Hi Francesco
1) If you use Y_lag, the value used would always be the earliest one.
If you are comparing T to T-1, the Y_lag used would correspond to T-2. If you just add Y, it would be T-1.
If you compare T-5 to T-1 (the long2 option for PTA), the Y_lag used would be from T-6, and so on.
2) I'm not sure how you would go about replicating this with TWFE, precisely because of the point you raise.
F
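As an illustration of point 1), a minimal sketch of how the lagged outcome could be built and passed to csdid. It assumes a panel with the same names as in the question (i, t, Y, g) and consecutive time periods.

```stata
* Names follow the question: i (unit), t (period), Y (outcome), g (first treatment period)
xtset i t
gen Y_lag = L.Y    // Y from the previous period; missing in each unit's first period

* Short comparisons (T vs T-1): the Y_lag used is the one recorded at T-1, i.e. Y at T-2
csdid Y Y_lag, ivar(i) time(t) gvar(g) notyet

* Long pre-treatment comparisons (e.g. T-5 vs T-1): the Y_lag used comes from the earlier period
csdid Y Y_lag, ivar(i) time(t) gvar(g) notyet long2
```

Note that rows with a missing Y_lag (the first observed period of each unit, or periods following a gap) cannot be used as the base period, so the number of observations may fall.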
Hi FernandoRios , many thanks for your kind reply and the clarification.
Concerning 2): so it is not possible to exactly replicate the specification csdid Y(i,t) Y(i,t-1), ivar(i) time(t) gvar(g) notyet with other estimators such as Borusyak et al. (2023) (did_imputation)?
Maybe it's my lack of knowledge, but after running CSDID I can't find the constant term. matrix list r(table) doesn't help either. Is there a way to identify the constant term?
Thank you so much in advance.
Yucel Gunaydin