
  • Not much. You need more memory to run it: with 10 million observations, 50 events, and (I assume) 50 periods, you have 25,000 million cells.
    I'm working on a more efficient way to store the data that may help, but it will be a good while before I can release it.
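    The arithmetic behind that estimate, as a quick back-of-envelope sketch:

    ```stata
    * Cells needed: observations x events x periods (pure arithmetic)
    display %20.0fc 10e6 * 50 * 50
    * 25,000,000,000 cells; stored as 8-byte doubles, that is roughly 200 GB
    ```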
    F

    Comment


    • Hello FernandoRios,
      I hope my message finds you well.

      I am writing to clarify one point that is not entirely clear to me.

      I have a large unbalanced panel dataset at the firm level, and based on what I read here I applied csdid using cluster instead of ivar:
      csdid valesp pc1 pc2 qual_pers labprod labint esperienza, cluster(codimp) time(anno) gvar(anno_primo_trattamento) method(dripw)

      However, it is not clear to me what kind of clustering the command uses, and therefore whether I used it properly.

      Thanks a lot for all your support!

      Regards,

      Tiziana

      Comment


      • Good question.
        When using panel data, CSDID applies an implicit cluster at the panel level. You can, of course, use other cluster levels, but by construction those should be nested within the panel ID.
        When you use RC data, CSDID does not cluster SEs at all. It is important, then, to use the cluster() option to request clustered SEs. One option is to use the panel ID as the cluster.
        Now, because of how the code works, results using panel vs RC estimates could be very different. So, unless you have a strong reason to treat your data as RC, I would suggest treating it as a panel.
        HTH
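        Concretely, the three cases might look like this (a hedged sketch; the names y, x1, id, region, year, and gyear are placeholders, not from the thread):

        ```stata
        * Panel data: ivar() implies clustering at the panel level
        csdid y x1, ivar(id) time(year) gvar(gyear) method(dripw)

        * A coarser cluster must nest the panel id (e.g. firms within regions)
        csdid y x1, ivar(id) cluster(region) time(year) gvar(gyear) method(dripw)

        * Repeated cross-sections: no ivar(), so request clustered SEs explicitly
        csdid y x1, cluster(id) time(year) gvar(gyear) method(dripw)
        ```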

        Comment


        • Originally posted by FernandoRios:
          Good question.
          When using panel data, CSDID applies an implicit cluster at the panel level. You can, of course, use other cluster levels, but by construction those should be nested within the panel ID.
          When you use RC data, CSDID does not cluster SEs at all. It is important, then, to use the cluster() option to request clustered SEs. One option is to use the panel ID as the cluster.
          Now, because of how the code works, results using panel vs RC estimates could be very different. So, unless you have a strong reason to treat your data as RC, I would suggest treating it as a panel.
          HTH

          Thank you! Sorry for the trivial question, but what does RC data mean?

          Comment


          • RC = repeated cross-section

            Comment


            • Dear FernandoRios

              First of all: incredible work!
              Your csdid command may help a lot in my research. I was able to solve the issues with the code, but some theoretical questions remain, which is why I am kindly reaching out to you.

              How do I define the gvar variable when I have units with multiple treatments in various years? My timeframe can run from 2006 to 2022. For instance, when there are treatments in 2009 and 2014, how do I define the gvar variable for that unit in 2008, 2010, and 2015?
              And what about the fact that data for some units start in 2014 and for others in 2006? How can I proceed in that regard? Must all units have the same starting year?
              I have compared the not-yet-treated with the already-treated (and always-treated) groups (heterogeneous).

              A reply from you would be very helpful for this research project. Please do not hesitate to reach out if something is unclear.
              Thank you!

              Comment


              • Hi FernandoRios,

                Why is T-1 not 0 in this event study, taken from one of your examples (please see below)?

                estat event
                ATT by Periods Before and After treatment
                Event Study: Dynamic effects
                ------------------------------------------------------------------------------
                             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                -------------+----------------------------------------------------------------
                         Tm3 |   .0267278   .0140657     1.90   0.057    -.0008404     .054296
                         Tm2 |  -.0036165   .0129283    -0.28   0.780    -.0289555    .0217226
                         Tm1 |   -.023244   .0144851    -1.60   0.109    -.0516343    .0051463
                         Tp0 |  -.0210604   .0114942    -1.83   0.067    -.0435886    .0014679
                         Tp1 |  -.0530032   .0163465    -3.24   0.001    -.0850417   -.0209647
                         Tp2 |  -.1404483   .0353782    -3.97   0.000    -.2097882   -.0711084
                         Tp3 |  -.1069039   .0328865    -3.25   0.001    -.1713602   -.0424476

                Thank you in advance.
                Iryna

                Last edited by Iryna Hayduk; 16 Sep 2024, 12:25.

                Comment


                • Because the default behavior of CSDID is to produce short gaps for the pre-treatment effects.
                  The long2 option will produce what you have in mind.
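                  A minimal sketch of the two calls (placeholder names y, id, year, gyear; not from the thread):

                  ```stata
                  * Default: short-gap pre-treatment comparisons, so Tm1 is an
                  * estimated coefficient rather than the zeroed base period
                  csdid y, ivar(id) time(year) gvar(gyear) method(dripw)
                  estat event

                  * long2: long-gap pre-treatment comparisons, with the period
                  * just before treatment serving as the (normalized) base
                  csdid y, ivar(id) time(year) gvar(gyear) method(dripw) long2
                  estat event
                  ```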
                  F

                  Comment


                  • Originally posted by Julian Spieleder:
                    Dear FernandoRios

                    First of all: incredible work!
                    Your csdid command may help a lot in my research. I was able to solve the issues with the code, but some theoretical questions remain, which is why I am kindly reaching out to you.

                    How do I define the gvar variable when I have units with multiple treatments in various years? My timeframe can run from 2006 to 2022. For instance, when there are treatments in 2009 and 2014, how do I define the gvar variable for that unit in 2008, 2010, and 2015?
                    And what about the fact that data for some units start in 2014 and for others in 2006? How can I proceed in that regard? Must all units have the same starting year?
                    I have compared the not-yet-treated with the already-treated (and always-treated) groups (heterogeneous).

                    A reply from you would be very helpful for this research project. Please do not hesitate to reach out if something is unclear.
                    Thank you!
                    CSDID assumes a unit is treated only once; after that, it remains treated.
                    You can probably analyze the event-study results to say something about subsequent treatments.
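                    One way to build a gvar consistent with that absorbing-treatment assumption is to keep only each unit's first treatment year. This is a sketch with hypothetical names: treat_year is assumed to hold the year of a treatment event and be missing otherwise.

                    ```stata
                    * First treatment year per unit; egen min() ignores missings
                    bysort id: egen first_treat = min(treat_year)
                    * Never-treated units get 0, the convention gvar() expects
                    replace first_treat = 0 if missing(first_treat)
                    csdid y, ivar(id) time(year) gvar(first_treat) method(dripw)
                    ```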

                    Comment


                    • Thank you so much, Fernando!

                      Comment


                      • Dear FernandoRios , thank you so much for coding this wonderful package. I have 2 questions that I would really appreciate your insights on:
                        1. Is there a way to add a group indicator interacted with the treatment status for heterogeneity analysis? Essentially, estimate the same model for 2 groups of individuals and perform statistical inference on the difference in ATTs between the two groups?
                        2. I have a dataset that looks like this:
                        | year-month of layoff
                        yrmth | 2022m10 2022m11 2022m12 2023m1 2023m2 2023m3 2023m4 2023m5 | Total
                        -----------+----------------------------------------------------------------------------------------+----------
                        2019m10 | 210 0 0 0 0 0 0 0 | 210
                        2019m11 | 210 2,534 0 0 0 0 0 0 | 2,744
                        2019m12 | 210 2,534 204 0 0 0 0 0 | 2,948
                        2020m1 | 213 2,538 204 1,088 0 0 0 0 | 4,043
                        2020m2 | 213 2,543 206 1,090 1,210 0 0 0 | 5,262
                        2020m3 | 213 2,545 206 1,090 1,211 202 0 0 | 5,467
                        2020m4 | 213 2,545 206 1,091 1,211 202 258 0 | 5,726
                        2020m5 | 213 2,548 206 1,091 1,214 202 258 412 | 6,144
                        2020m6 | 213 2,551 206 1,093 1,217 202 258 412 | 6,152
                        2020m7 | 214 2,555 206 1,093 1,217 202 258 412 | 6,157
                        2020m8 | 214 2,557 206 1,095 1,218 202 258 412 | 6,162
                        2020m9 | 214 2,557 206 1,097 1,220 202 259 412 | 6,167
                        2020m10 | 214 2,557 206 1,098 1,220 202 259 412 | 6,168
                        2020m11 | 214 2,557 206 1,098 1,220 202 259 412 | 6,168
                        2020m12 | 214 2,558 206 1,099 1,221 205 259 412 | 6,174
                        2021m1 | 214 2,560 206 1,100 1,222 205 259 412 | 6,178
                        2021m2 | 214 2,561 206 1,100 1,222 205 259 413 | 6,180
                        2021m3 | 214 2,561 206 1,100 1,223 206 259 413 | 6,182
                        2021m4 | 214 2,562 206 1,101 1,223 207 259 413 | 6,185
                        2021m5 | 214 2,562 206 1,103 1,231 207 259 414 | 6,196
                        2021m6 | 215 2,564 206 1,104 1,232 207 259 414 | 6,201
                        2021m7 | 215 2,564 206 1,104 1,232 207 259 414 | 6,201
                        2021m8 | 215 2,565 206 1,105 1,232 207 259 414 | 6,203
                        2021m9 | 216 2,567 206 1,105 1,234 207 259 414 | 6,208
                        2021m10 | 216 2,567 206 1,106 1,235 207 259 414 | 6,210
                        2021m11 | 216 2,567 206 1,106 1,236 207 259 414 | 6,211
                        2021m12 | 216 2,567 207 1,106 1,237 207 259 414 | 6,213
                        2022m1 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m2 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m3 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m4 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m5 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m6 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m7 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m8 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m9 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m10 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m11 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2022m12 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m1 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m2 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m3 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m4 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m5 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m6 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m7 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m8 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m9 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m10 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m11 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2023m12 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2024m1 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2024m2 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2024m3 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2024m4 | 216 2,567 207 1,106 1,239 207 259 414 | 6,215
                        2024m5 | 0 2,567 207 1,106 1,239 207 259 414 | 5,999
                        2024m6 | 0 0 207 1,106 1,239 207 259 414 | 3,432
                        2024m7 | 0 0 0 1,106 1,239 207 259 414 | 3,225
                        -----------+----------------------------------------------------------------------------------------+----------
                        Total | 11,819 140,889 11,357 60,649 66,547 10,919 13,463 21,093 | 336,736

                        Running the following model:
                        csdid unemployed exp_prelayoff1, ivar(id) time(yrmth) gvar(yrmth_layoff) method(dripw) agg(event)

                        gives:
                        ------------------------------------------------------------------------------
                        | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        Pre_avg | -.0006914 .0002389 -2.89 0.004 -.0011597 -.0002231
                        Post_avg | .2027581 .0076338 26.56 0.000 .1877961 .2177201
                        Tm34 | -.0088991 .0062479 -1.42 0.154 -.0211448 .0033466
                        Tm33 | -.0045457 .0031133 -1.46 0.144 -.0106476 .0015563
                        Tm32 | -.0024586 .0028415 -0.87 0.387 -.0080278 .0031105
                        Tm31 | -.0006661 .0026365 -0.25 0.801 -.0058336 .0045013
                        Tm30 | .0060275 .0030708 1.96 0.050 8.77e-06 .0120463
                        Tm29 | -.0027378 .0030822 -0.89 0.374 -.0087787 .0033032
                        Tm28 | -.0033564 .0027135 -1.24 0.216 -.0086748 .001962
                        Tm27 | -.0040143 .0026394 -1.52 0.128 -.0091875 .0011588
                        Tm26 | .0022505 .0029911 0.75 0.452 -.0036119 .0081128
                        Tm25 | -.0025741 .0026625 -0.97 0.334 -.0077924 .0026442
                        Tm24 | -.0017441 .0023512 -0.74 0.458 -.0063525 .0028642
                        Tm23 | .0002633 .0022844 0.12 0.908 -.0042142 .0047407
                        Tm22 | .0024963 .0028022 0.89 0.373 -.0029958 .0079885
                        Tm21 | -.0034214 .0022559 -1.52 0.129 -.0078429 .0010002
                        Tm20 | .0018182 .002562 0.71 0.478 -.0032032 .0068395
                        Tm19 | -.0025347 .0025319 -1.00 0.317 -.007497 .0024276
                        Tm18 | .0033112 .0023151 1.43 0.153 -.0012263 .0078487
                        Tm17 | .0015265 .0029202 0.52 0.601 -.0041971 .0072501
                        Tm16 | -.000524 .0025744 -0.20 0.839 -.0055697 .0045217
                        Tm15 | -.0030428 .0021036 -1.45 0.148 -.0071657 .0010801
                        Tm14 | -.0059468 .0026643 -2.23 0.026 -.0111687 -.0007249
                        Tm13 | -.00442 .0022923 -1.93 0.054 -.0089127 .0000728
                        Tm12 | .0007008 .0021672 0.32 0.746 -.0035469 .0049485
                        Tm11 | -.0014102 .0020209 -0.70 0.485 -.005371 .0025507
                        Tm10 | -.0001684 .0022892 -0.07 0.941 -.0046551 .0043182
                        Tm9 | -.0015793 .0019785 -0.80 0.425 -.0054571 .0022985
                        Tm8 | -.0006364 .0020603 -0.31 0.757 -.0046746 .0034018
                        Tm7 | -.0045712 .0021215 -2.15 0.031 -.0087293 -.0004132
                        Tm6 | .0038454 .0021383 1.80 0.072 -.0003455 .0080364
                        Tm5 | .0027777 .0020066 1.38 0.166 -.0011552 .0067106
                        Tm4 | .0045296 .0021799 2.08 0.038 .0002571 .0088021
                        Tm3 | .0049289 .0020844 2.36 0.018 .0008435 .0090143
                        Tm2 | .0043273 .0024805 1.74 0.081 -.0005345 .009189
                        Tm1 | -.0030591 .0017085 -1.79 0.073 -.0064076 .0002894
                        Tp0 | -.0042194 .0021934 -1.92 0.054 -.0085183 .0000796
                        Tp1 | .2384498 .0068016 35.06 0.000 .225119 .2517806
                        Tp2 | .300376 .0073067 41.11 0.000 .2860551 .314697
                        Tp3 | .2175117 .0090234 24.11 0.000 .1998261 .2351973
                        Tp4 | .268502 .0105102 25.55 0.000 .2479023 .2891017
                        Tp5 | .2420725 .010709 22.60 0.000 .2210833 .2630617
                        Tp6 | .1566136 .0330584 4.74 0.000 .0918203 .221407
                        ------------------------------------------------------------------------------
                        Control: Not yet Treated

                        I'm not sure why only 6 post-treatment periods were estimated. I've looked through the discussion here but still couldn't figure out a good explanation. It would be greatly appreciated if you could help with this.

                        Thank you so much!

                        Best,
                        Sai

                        Comment


                        • 1) It's not possible; at best you can run the analysis separately for each group.
                          2) You can only see up to 6 periods because, after 6 periods, there are no more not-yet-treated observations (from the 2022m11 cohort's perspective).
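                          The cutoff can be checked directly in the data. This is a hypothetical sketch assuming yrmth_layoff is a monthly Stata date with 0 for never-treated units:

                          ```stata
                          * The latest-treated cohort bounds the not-yet-treated
                          * comparisons: once calendar time reaches that cohort's
                          * treatment month, no untreated controls remain.
                          summarize yrmth_layoff if yrmth_layoff > 0, meanonly
                          display "latest cohort treated in " %tm r(max)
                          ```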

                          Comment


                          • Originally posted by FernandoRios:
                            1) It's not possible; at best you can run the analysis separately for each group.
                            2) You can only see up to 6 periods because, after 6 periods, there are no more not-yet-treated observations (from the 2022m11 cohort's perspective).
                            Got it, thank you so much!

                            Comment


                            • Dear FernandoRios, I'm using your new csdid2 package (it's much faster than the old one, thank you very much!!), but I don't understand how to save the results of my regression (like the number of observations and the R2). The old csdid package had a "saverif()" option that allowed me to do this and then use cs_estat. But csdid2 doesn't seem to have this option, so how do I view/save these statistics?

                              Comment


                              • Hello FernandoRios, I hope my message finds you well. I am writing to ask your kind advice on how to improve this plot (I think it depends on the bandwidth of pre-treatment time -2). In this case I have used principal component analysis and then included the components with eigenvalue > 1 as covariates. I am attaching the results of the event study and the graph to this message.

                                Thank you so much in advance for your support!

                                Tiziana
                                Attached Files
                                Last edited by Tiziana Giuliani; 14 Oct 2024, 15:14.

                                Comment
