Diff in Diff: DRDID and CSDID

FernandoRios

Join Date: Apr 2014

Posts: 2430
#421

15 Dec 2023, 12:51

It's exactly for that reason that I have not been able to identify the number of observations.
An observation that is used in one ATTGT may not be used in another. CSDID (the slower one) can get that number, though, because of how the data is saved. You can give that a try.
F
Tommaso Crescioli

Join Date: Aug 2023

Posts: 4
#422

17 Dec 2023, 05:14

Thank you, Fernando. I will! One last question: each time I run the same code, the standard errors change slightly, even though I set the seed at the start of the script. Is there a way to avoid that?

T
FernandoRios

Join Date: Apr 2014

Posts: 2430
#423

17 Dec 2023, 09:38

Can you make an example that I can review?
That shouldn't be happening.
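A reproducible example of the kind requested here might look like the sketch below. It assumes the standard errors come from the wild bootstrap (wboot) and uses csdid's rseed() option to fix the bootstrap seed; the simulated panel, its variable names, cohort years, and effect sizes are all hypothetical, and csdid and drdid need to be installed. If the two output tables still differ, that output is what is worth posting.

```stata
* Simulated staggered panel; names, cohorts, and effect sizes are hypothetical.
clear
set seed 2024
set obs 300
gen id = _n
* 0 = never treated; otherwise the first treatment year
gen first_treat = cond(id <= 100, 0, cond(id <= 200, 2004, 2006))
gen double u = rnormal()
expand 8
bysort id: gen year = 2000 + _n          // 2001 ... 2008
gen byte d = first_treat > 0 & year >= first_treat
gen double y = u + 0.1*(year - 2000) + 1.5*d + rnormal()

* Two identical calls with the same bootstrap seed; compare the two tables
csdid y, ivar(id) time(year) gvar(first_treat) wboot rseed(1)
csdid y, ivar(id) time(year) gvar(first_treat) wboot rseed(1)
```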
Nate Ives

Join Date: Dec 2023

Posts: 2
#424

19 Dec 2023, 08:47

Hi Fernando, apologies in advance for what might be a very simple question, but I suspect the answer might help a few others, so I figured I would ask.

Is there a plot command for drdid similar to what is available for csdid? I have been unable to capture the pre-post values of the outcome across treatment and comparison for plotting.

Thanks in advance for your help.
Nate
FernandoRios

Join Date: Apr 2014

Posts: 2430
#425

19 Dec 2023, 10:03

Unfortunately, no.
drdid only produces a single number (the ATT), so there is nothing else underneath to plot.
F
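Since there is no built-in plot, the raw pre/post means by group can be plotted by hand. The following is a minimal sketch using only built-in Stata commands on simulated data; the variable names (y, treated, post) and all parameter values are hypothetical, and the plot shows unadjusted means, not anything drdid estimates internally.

```stata
* Simulated two-period panel; all names and values are hypothetical.
clear
set seed 101
set obs 500
gen id = _n
gen byte treated = id > 250
expand 2
bysort id: gen byte post = _n - 1        // 0 = pre, 1 = post
gen double y = 2 + 0.4*post + 0.5*treated + 1*post*treated + rnormal()

* Collapse to the four group-by-period means
collapse (mean) y, by(treated post)

* Pre/post means for comparison and treated units
twoway (connected y post if treated == 0)                 ///
       (connected y post if treated == 1),                ///
       legend(order(1 "Comparison" 2 "Treated"))          ///
       xlabel(0 "Pre" 1 "Post") xtitle("") ytitle("Mean outcome")
```

With real data, replace the simulation with the actual outcome, treatment, and period variables before the collapse step.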
Nate Ives

Join Date: Dec 2023

Posts: 2
#426

19 Dec 2023, 11:14

Thank you! That makes sense, and I appreciate the quick response.

Nate
anna halewska

Join Date: Jan 2024

Posts: 5
#427

09 Jan 2024, 08:25

Hi Fernando, thank you so much for creating the command. I wanted to ask for your advice on how to apply fixed effects in my setting.

I have cross-section data on individuals who live in counties within states. My data is collected every decade between 1940 and 1980. Treatment is introduced at the county level in 1940, 1950, 1960 and 1970 (I drop the always treated, that is, those treated in 1940). The outcome is at the individual level.
In the initial OLS setting, I had county fixed effects and state-year fixed effects. How do I replicate that for csdid?
My understanding from reading this forum is that by adding i.state_cnty I also add its interaction with time, which makes csdid omit the coefficient. Is that correct? Should I then limit myself to including i.statefips? Like so:

csdid outcome i.statefips [iweight=weight], cluster(state_cnty) time(year) gvar(group)

Furthermore, after adapting the command to this specification, I would like to try instrumenting the treatment. Can you advise me on how you would go about doing that?

Thanks in advance for your help.
Anna
FernandoRios

Join Date: Apr 2014

Posts: 2430
#428

09 Jan 2024, 10:21

Hi Anna
1) Using fixed effects as you indicate only makes sense if you have a balanced 3D dataset.
In other words, for every state and every period, you observe all treated groups (Group). That is, if you could create a three-way table of year, state, and Group, you don't want any blank or zero cells in that table.

2) IV with CSDID is not possible as of right now.

I am not completely sure how one would implement that in the GxT case, but you could do it for an individual 2x2 using something like a Wald estimator.
F
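The three-way table condition in point 1 can be checked mechanically. Below is a minimal sketch using built-in commands on simulated data; the variable names (statefips, year, group) follow the thread, but the data, the number of states, and the deliberately emptied cell are hypothetical.

```stata
* Hypothetical data: individuals observed by state, year, and treatment cohort.
clear
set seed 7
set obs 3000
gen statefips = ceil(5*runiform())
gen year  = 1950 + 10*floor(3*runiform())        // 1950, 1960, 1970
gen group = cond(runiform() < .4, 0, 1960 + 10*floor(2*runiform()))
* Create empty cells on purpose: state 5 never observes the 1970 cohort
drop if statefips == 5 & group == 1970

* Count observations in every state-year-group cell, keeping empty cells
preserve
contract statefips year group, zero
list statefips year group if _freq == 0, sepby(statefips)
* Flag states that have at least one empty cell
egen byte has_empty = max(_freq == 0), by(statefips)
levelsof statefips if has_empty, local(bad_states)
restore
display "States with empty cells: `bad_states'"
```

States listed in bad_states are the ones where state fixed effects would not be supported for every cohort and period.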
anna halewska

Join Date: Jan 2024

Posts: 5
#429

10 Jan 2024, 02:59

Hi Fernando, thank you so much for the quick answer. If I may follow up: going back to the fixed effects, so this here:

csdid outcome i.statefips [iweight=weight], cluster(state_cnty) time(year) gvar(group)

could be a valid command if I were to limit myself to taking the 2x2 cases one by one and using only the states that have both treated and untreated units in the two given periods, correct? Sorry if this is a naive question; I just want to be clear on how csdid and fixed effects interact.

Best,
Anna
anna halewska

Join Date: Jan 2024

Posts: 5
#430

10 Jan 2024, 04:34

Hi Fernando, let me add some details to my question. Following Remark 3 in Callaway and Sant'Anna and limiting my data to a 2x2 case (only years 1950 and 1960, untreated and treated in 1960), I managed to replicate what csdid does with reg/reghdfe. For example:

reghdfe outcome treat, cluster(cnty) absorb(i.year i.group)
replicates
csdid outcome, time(year) gvar(group) cluster(cnty)

reghdfe outcome treat [aweight=weight], cluster(cnty) absorb(i.year i.group)
replicates
csdid outcome [iweight=weight], time(year) gvar(group) cluster(cnty)

Coefficients are the same, standard errors differ a little bit but that is not currently my main concern (although if you have comments on that please let me know).
However, I am not able to get the result of

csdid outcome i.state [iweight=weight], time(year) gvar(group) cluster(cnty)

via reghdfe. Based on this forum, my guess would be to run reghdfe with absorb(i.state##i.year i.group), but it yields completely different results. I have deleted (three) states with blanks in the three-way table, so the overlap assumption holds: for all states in both years I have treated and untreated counties. It is not a matter of weights or method(reg) either. Can you advise on how to replicate this result?

Thank you.
Anna
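For the 2x2 case without covariates, the match in point estimates can be verified by hand: the regression coefficient equals the difference of mean differences. The sketch below uses simulated data and built-in commands only (no csdid or reghdfe); all variable names and parameter values are hypothetical. It illustrates why the unconditional estimates coincide, and says nothing about the specification with i.state.

```stata
* Simulated repeated cross-section: two periods, treated vs. untreated group.
clear
set seed 12345
set obs 4000
gen byte post    = mod(_n, 2)            // 0 = 1950, 1 = 1960
gen byte treated = _n > 2000             // 1 = group first treated in 1960
gen byte treat   = post * treated        // post-treatment indicator
gen double y     = 1 + 0.5*post + 0.3*treated + 2*treat + rnormal()

* Difference of mean differences, computed by hand
quietly summarize y if treated == 1 & post == 1
scalar m11 = r(mean)
quietly summarize y if treated == 1 & post == 0
scalar m10 = r(mean)
quietly summarize y if treated == 0 & post == 1
scalar m01 = r(mean)
quietly summarize y if treated == 0 & post == 0
scalar m00 = r(mean)
scalar did = (m11 - m10) - (m01 - m00)

* Same number from a regression with period and group effects
quietly regress y treat i.post i.treated
display "by hand: " did "   regression: " _b[treat]
assert reldif(did, _b[treat]) < 1e-8
```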
Katarina Sandberg

Join Date: Feb 2023

Posts: 18
#431

11 Jan 2024, 09:49

Hi Fernando!
I have been asked to replicate the results from my csdid estimates through TWFE as a robustness test.

The csdid command I use is:
1. csdid depvar, ivar(municipality) time(year) gvar(first_treat) long2
From my understanding that would be similar to:
1. reghdfe depvar indepvar, absorb(municipality year) vce(cluster municipality)

However, I am not so sure about what would be similar to my code using dripw:
2. csdid depvar control variables, ivar(municipality) time(year) gvar(first_treat) method(dripw) long2

How would you go about providing a robustness test for number 2 (i.e. incorporating something similar to dripw in the TWFE)?

Furthermore, I wonder if one could expect each year (leads and lags) to be very similar between TWFE and csdid, or if they are expected to differ? How would you explain the difference to a reader?

Lastly, what does long2 do?

Best,
Katarina
FernandoRios

Join Date: Apr 2014

Posts: 2430
#432

11 Jan 2024, 10:25

anna halewska
I don't think there is a direct replication with reghdfe that would give you a single coefficient.
The best approach is using what Wooldridge does with etwfe.
Also, check out the Sant'Anna and Zhao (2020) paper for other details.

Katarina Sandberg
What you point out is the equivalent, but for the wrong case.
For dripw, there is no equivalent.
The reason is that with DRIPW, control units receive different weights to match the treated units. Perhaps if those weights are compiled and aggregated, it would give you something similar to CSDID. It would be easy if you have a balanced panel and time-constant variables. Otherwise, it would be a very time-consuming effort.

For differences between TWFE (leads and lags) and CSDID, see Sun and Abraham (2021).

long2 requests pre-treatment ATTs that are similar to those in event studies.

F
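As a rough illustration of long2, the sketch below simulates a small staggered panel, estimates with long2, and aggregates the results into an event study. The variable names, cohort years, and effect sizes are made up; estat event and csdid_plot are the post-estimation commands that come with csdid, which must be installed together with drdid.

```stata
* Simulated staggered panel; names, cohorts, and effect sizes are hypothetical.
clear
set seed 2024
set obs 300
gen id = _n
* 0 = never treated; otherwise the first treatment year
gen first_treat = cond(id <= 100, 0, cond(id <= 200, 2004, 2006))
gen double u = rnormal()
expand 8
bysort id: gen year = 2000 + _n          // 2001 ... 2008
gen byte d = first_treat > 0 & year >= first_treat
gen double y = u + 0.1*(year - 2000) + 1.5*d + rnormal()

* Group-time ATTs with long-difference pre-treatment estimates
csdid y, ivar(id) time(year) gvar(first_treat) long2

* Aggregate by event time and plot leads and lags
estat event
csdid_plot
```

Running the same lines without long2 and comparing the pre-treatment rows of estat event shows what the option changes.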
Katarina Sandberg

Join Date: Feb 2023

Posts: 18
#433

14 Jan 2024, 22:26

Thanks for the help!

So, if I understand you correctly, the fact that I am using DRIPW makes it difficult to compare csdid and TWFE?
What if one used a TWFE specification that also employs some kind of matching and weighting?

Additionally, would it still make sense to include the long2 option in csdid even if I do not make the comparison?
Katarina Sandberg

Join Date: Feb 2023

Posts: 18
#434

15 Jan 2024, 00:53

Hi again Fernando!

I realized I also have a question regarding clustered standard errors.

In my study, I use a panel data set of municipalities and want to cluster the standard errors at the municipal level (which is defined by the id of the municipality, the ‘municipalcode’).

However, if I use the command:
csdid depvar independent variables, ivar(municipalcode) time(year) gvar(first_treat) method(dripw) cluster(municipalcode)
it gives me an error message stating that “municipalcode may not be both target and by()”.

Is this because standard errors are already clustered on the municipal level?

Many thanks in advance!
Katarina
Sabina Nowak

Join Date: Jan 2024

Posts: 2
#435

25 Jan 2024, 05:02

Dear FernandoRios,
I am using csdid and csdid2, but I am not sure about the results I receive. My panel data is firm-level, period 2012-2022, about 108,000 firm-year observations. Firms are from 19 countries. I am investigating the effects of policies implemented in the selected countries in five years (2013: one country, 2015: 2 countries, 2016: 2 countries, 2017: 1 country, 2018: 2 countries, the rest of the countries did not implement the policies). I need to use many controls, but I am unsure about the ivar.
Should the ivar refer to the firm ID if the panel dataset is firm-level?
In all the examples I have found, the ivar referred to the country (or state) where the policies were implemented.

My simple code is as follows:
csdid2 `var' $controls , ivar(id) time(year) gvar(first_treat)

Is this correct?
