Diff in Diff: DRDID and CSDID

FernandoRios

Join Date: Apr 2014

Posts: 2362
#526

11 Jul 2024, 10:48

Originally posted by Mathieu Simoens

Dear FernandoRios,

I have two questions regarding the use of clustering in csdid.
I am estimating a staggered DiD, investigating the impact of a staggered series of events on firm profitability, and I am running the following code:
csdid firm_profit, ivar(firm_id) time(year_id) gvar(year_treatment) long2

My two questions are the following:

1) I would like to cluster at state level (instead of the automatic clustering at firm level), because the intervention/treatment I am investigating happened at state level. If I understand the help file correctly, adding the option "cluster(state_id)" will result in double-clustering, right? However, because my firm_id variable is nested within my state_id variable (every firm is located in only one state), does this actually correspond with the simple clustering at state level that I require?

2) The help file describes the default "robust and asymptotic" standard errors as well as the option to compute "wild bootstrap" standard errors (both are possible with clustering). Could you provide some guidance on why one might choose one option over the other?

Apologies if you already answered these questions at a certain point, but I could not find the answers on the forum.
Thank you in advance!

Mathieu

1) It is double clustering, but because the clusters are nested, only the larger one (states) is relevant for your analysis and description.
2) There is not much guidance on choosing one over the other.
Asymptotic standard errors work for most cases, but the wild bootstrap provides you with uniform confidence intervals. They are a bit wider, to reduce problems of multiple testing.
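As a minimal sketch of the two options, reusing the variable names from the question (it assumes state_id is constant within firm_id, and the seed value is arbitrary):

```stata
* State-level clustering: firm_id is nested in state_id, so the
* state is the relevant cluster (asymptotic standard errors).
csdid firm_profit, ivar(firm_id) time(year_id) gvar(year_treatment) long2 cluster(state_id)

* Same specification with wild-bootstrap standard errors;
* rseed() fixes the seed so the bootstrap draws are reproducible.
csdid firm_profit, ivar(firm_id) time(year_id) gvar(year_treatment) long2 cluster(state_id) wboot rseed(12345)
```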

HTH
Comment

Tiziana Giuliani

Join Date: Jan 2024
Posts: 13

#527

20 Jul 2024, 09:41

Hello FernandoRios
I checked through your answers on how to interpret the pre-trend tests. If I understood correctly, given a p-value of 0.15 (I reported all my results below), the null hypothesis is rejected and the parallel trend assumption does not hold.

In this case, shall I try to combine psmatch and csdid? Do you have any advice on how to perform this on Stata? I have tried including in the csdid formula the weights (not pscores) generated through psmatch2 in the pre-treatment years, and then run psmatch in the treatment years... but I am not sure this is correct.

Many many thanks for your support

Tiziana

-.-.-.-.-.-.-.-.-.-.-.-

asdoc csdid valesp fatturato esperienza, cluster(codimp) time(anno) gvar(anno_primo_trattamento) method(dripw)

............................
Difference-in-difference with Multiple Time Periods
Number of obs = 161,118
Outcome model : least squares
Treatment model: inverse probability
(Std. err. adjusted for 63,659 clusters in codimp)

	Coefficient	Std. err.	z	P>|z|	[95% conf.	interval]
g2017
t_2014_2015	2.46e+05	3.06e+05	0.800	0.421	-353902	846449
t_2015_2016	83092.510	1.42e+05	0.580	0.559	-1.95e+05	3.61e+05
t_2016_2017	1.61e+05	2.40e+05	0.670	0.503	-3.10e+05	6.32e+05
t_2016_2018	2.95e+05	3.63e+05	0.810	0.416	-4.16e+05	1005834
t_2016_2019	408185	3.38e+05	1.210	0.227	-2.55e+05	1071053
t_2016_2020	2.09e+05	2.58e+05	0.810	0.418	-2.97e+05	7.14e+05
t_2016_2021	4.41e+05	2.12e+05	2.080	0.037	25982.640	8.56e+05
g2018
t_2014_2015	6.01e+05	4.98e+05	1.210	0.227	-3.75e+05	1577087
t_2015_2016	-5.74e+04	2.99e+05	-0.190	0.848	-6.43e+05	5.28e+05
t_2016_2017	-8.52e+04	250164	-0.340	0.733	-5.76e+05	4.05e+05
t_2017_2018	3.57e+05	227532	1.570	0.117	-8.94e+04	8.03e+05
t_2017_2019	1.82e+05	2.71e+05	0.670	0.501	-3.48e+05	7.13e+05
t_2017_2020	516982	1.80e+05	2.860	0.004	1.63e+05	8.71e+05
t_2017_2021	701745	1.91e+05	3.680	0.000	3.28e+05	1075210
g2019
t_2014_2015	1179084	7.37e+05	1.600	0.110	-2.66e+05	2624075
t_2015_2016	-5.50e+04	2.48e+05	-0.220	0.825	-5.42e+05	431796
t_2016_2017	-2.28e+05	2.38e+05	-0.960	0.338	-6.93e+05	2.38e+05
t_2017_2018	-1.51e+04	2.05e+05	-0.070	0.941	-4.17e+05	3.87e+05
t_2018_2019	-1.03e+05	2.60e+05	-0.400	0.692	-6.12e+05	4.07e+05
t_2018_2020	3.62e+05	3.23e+05	1.120	0.263	-2.71e+05	9.94e+05
t_2018_2021	360051	4.96e+05	0.730	0.468	-6.12e+05	1332140
g2020
t_2014_2015	1.95e+05	4.25e+05	0.460	0.646	-6.38e+05	1029328
t_2015_2016	-1.89e+05	187601	-1.010	0.313	-5.57e+05	1.78e+05
t_2016_2017	4.64e+05	3.37e+05	1.380	0.169	-1.97e+05	1123754
t_2017_2018	-2.27e+05	2.17e+05	-1.040	0.297	-652552	1.99e+05
t_2018_2019	2.51e+05	216159	1.160	0.246	-1.73e+05	6.74e+05
t_2019_2020	-4.07e+05	2.69e+05	-1.510	0.130	-9.33e+05	1.20e+05
t_2019_2021	17255.130	1.87e+05	0.090	0.927	-3.50e+05	3.84e+05

Control: Never Treated
See Callaway and Sant'Anna (2021) for details

Pretrend Test. H0 All Pre-treatment are equal to 0
chi2(14) = 19.2172
p-value = 0.1568

Comment

Mohammed Altantawy

Join Date: Jul 2024

Posts: 6
#528

22 Jul 2024, 13:26

Hi all,

I am attempting to estimate the ATT using CSDID. My data and variables are described below:

Data: unbalanced panel data

Variables:
outcome: roa (continuous)

treatment: sp (equal to 0 for never-treated firms and always 1 for firms treated in any year)

ivar: cik (firm ID)

time: year

gvar: first_treat (0 if never treated, otherwise the first year of treatment)

exact matching variable: industry_year (group of two variables industry and year)

nearest neighbor matching variable: market_cap (continuous)

My current code is:

Code:

csdid roa, ivar(cik) time(year) gvar(first_treat) notyet agg(simple)

I’d like to match each treated firm with its nearest control firms based on market_cap (the nearest-neighbor matching variable), on the condition that the treated and control firms share the same industry_year (the exact matching variable).

Any suggestions to amend my csdid code to incorporate both matching variables?

Thanks in advance
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2362
#529

22 Jul 2024, 19:11

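Hi Tiziana,
A p-value of 0.1568 means the evidence does not reject the null hypothesis that all pre-treatment effects are equal to 0, so the test does not reject the parallel trends assumption (PTA).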
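For reference, the p-value can be checked by hand from the test statistic in the output posted in #527, chi2(14) = 19.2172. This is only a sketch using Stata's chi2tail() function with those posted values:

```stata
* Upper-tail probability of a chi-squared(14) at the reported statistic;
* compare the result with the p-value = 0.1568 shown in the csdid output.
display chi2tail(14, 19.2172)

* With the csdid results still in memory, the pretrend test itself
* can be displayed again with:
estat pretrend
```

Because that p-value is above conventional levels such as 0.05, H0 is not rejected.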
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2362
#530

22 Jul 2024, 19:13

With matching, perhaps identify the matched pairs first, and then just run csdid on that matched sample without controls.
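A sketch of that second step, using the variable names from post #528. It assumes the matching step has already been done separately and has produced a hypothetical 0/1 variable named matched that is constant within firm and flags the treated firms and their matched controls:

```stata
* matched is a hypothetical indicator created by your own matching step;
* csdid is then run on the matched sample only, with no covariates
* after the dependent variable.
csdid roa if matched == 1, ivar(cik) time(year) gvar(first_treat) notyet agg(simple)
```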
Comment
Tiziana Giuliani

Join Date: Jan 2024

Posts: 13
#531

23 Jul 2024, 02:14

Originally posted by FernandoRios

With matching, perhaps identify the matched pairs first, and then just run csdid on that matched sample without controls.

Thank you, Fernando. I am not sure I understood correctly, so please allow me some last questions on the matter:
1. Is there an alternative to matching for verifying that the treated and control units used in csdid are on common support, and that the two groups are balanced after matching?
2. What does "run csdid without controls" mean? Is it correct to incorporate in the csdid command the _weight or _pscore generated by psmatch2?
Again, a thousand thanks.

Last edited by Tiziana Giuliani; 23 Jul 2024, 02:46.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2362
#532

23 Jul 2024, 04:31

No, there is no built-in test to verify balance. You have to present that evidence on your own using other methods.
"Run without controls" means to run it like
csdid y , ivar()…. Etc
with no x variables after the dependent variable.
Finally, you can add those weights, but there is no evidence that I know of regarding whether that is correct or not.
Comment
Mohammed Altantawy

Join Date: Jul 2024

Posts: 6
#533

23 Jul 2024, 11:58

Dear @FernandoRios

Thanks so much for your help and support.

Would you please have a look at the error that I get with the jwdid command?

https://www.statalist.org/forums/for...31#post1759231
Comment
Pedro Americo

Join Date: Apr 2018

Posts: 4
#534

03 Aug 2024, 17:04

Hi @FernandoRios,

I have a panel data set where the periods of my time variable are irregularly spaced (time = 1870, 1920, 1940, 1950). As you have already explained in some posts, csdid cannot estimate the treatment effect in this scenario. Therefore, I have a question: is there any alternative for estimating heterogeneity-robust DID estimators when the data have time gaps?

Thanks in advance.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2362
#535

03 Aug 2024, 20:40

Perhaps jwdid would work. I imposed fewer constraints on timing gaps there. The results are very robust and similar to those from csdid.
The interpretation may be a bit tricky, though.
Comment
Pedro Americo

Join Date: Apr 2018

Posts: 4
#536

05 Aug 2024, 17:20

Thanks, FernandoRios. I appreciate your suggestion.
Comment
Malte Be

Join Date: Jan 2022

Posts: 4
#537

19 Aug 2024, 07:37

Dear FernandoRios,

I am working with csdid in Stata 18. I would like to transform my outcome variables to account for potential non-linearities.

But the outcomes contain many 0s, making a log transformation infeasible.

Is there any way to use a ppml-transformation with csdid?

Thanks a lot!
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2362
#538

19 Aug 2024, 08:03

That is not possible with csdid.
You may have to use other approaches.
With jwdid, you can specify method(ppmlhdfe).
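A rough sketch of what that could look like. Here y, id, year, and first_treat are placeholder variable names, and it assumes that jwdid and ppmlhdfe (plus whatever dependencies ppmlhdfe needs) are installed from SSC:

```stata
* One-time installs of the user-written commands
ssc install jwdid
ssc install ppmlhdfe

* Extended TWFE estimator fitted by Poisson pseudo-maximum likelihood,
* which accommodates outcomes with many zeros
jwdid y, ivar(id) tvar(year) gvar(first_treat) method(ppmlhdfe)

* Aggregate the estimates into an overall average effect
estat simple
```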
F
Comment
Malte Be

Join Date: Jan 2022

Posts: 4
#539

20 Aug 2024, 02:55

Thanks a lot, FernandoRios!
Comment
Mathieu Simoens

Join Date: Jul 2024

Posts: 2
#540

23 Aug 2024, 05:56

Dear FernandoRios,

Would it be possible to give some guidance on the size limits of csdid2?
I have a dataset with approx. 10 million observations and around 50 events.
Sometimes csdid2 seems to run without problems (although it takes very long), sometimes it stops during the estimation (and Stata freezes or closes), and sometimes it produces the following error before even starting the estimation:
csdid::csdid(): 3900 out of memory
<istmt>: - function returned error
Any tips to avoid this error would of course also be very appreciated!

Thanks in advance,

Mathieu
Comment
