Differences between results from 'csdid' command and 'did' package in R

Rory Todd

Join Date: Jun 2023

Posts: 5
#1

Differences between results from 'csdid' command and 'did' package in R

24 Jul 2024, 05:14

Dear @FernandoRios,

Sorry to bother you! I'm hoping you might be able to help me with an issue I'm having with the 'csdid' command in Stata.

I'm trying to implement the Callaway & Sant'Anna estimator for a staggered difference-in-differences design. I have used the R package 'did' with the function 'att_gt()', as well as the Stata command 'csdid'. I have about 12 different outcome variables. I can't get the two sets of results to agree: for some outcome variables the results are very similar, but for others they are quite different.

I've attached a dataset for one variable. I'm using the Stata command:

Code:

csdid tempO ln_GNI_pc ln_wdi_pop, ivar(ccode) time(year) gvar(firstZyear) method(dripw)
estat group, post

and the R function:

Code:

attgt <- att_gt(yname = "tempO", tname = "year", idname = "ccode", gname = "firstZyear", data = raw, xformla = ~ln_GNI_pc+ln_wdi_pop+1)
aggte(attgt, type = "group", na.rm = TRUE)

The results are a bit different and I just can't work out why. I've also experimented with the 'notyet' and 'asinr' options, which do change Stata results a bit but still aren't the same as in R. I've also experimented with all of the different 'method()' options, but again results don't converge.

Do you have any suggestions?

Thanks a lot!
Rory
Attached Files

FOR_R_ids_tot_debt_cdGNI_hipc_decision.dta (130.2 KB, 1 view)
Tags: None
FernandoRios

Join Date: Apr 2014

Posts: 2312
#2

24 Jul 2024, 08:44

Hard to say. Is the data balanced?
What happens if you use reg as the method (outcome regression)?
Are the problems present for all pre- and post-treatment ATTs?
Could you replicate this using the example dataset?
Comment
Rory Todd

Join Date: Jun 2023

Posts: 5
#3

26 Jul 2024, 03:35

Hi Fernando,

Thanks for replying and I'm sorry for the delay in getting back to you!

Yes, the data is balanced.

So when I use reg for both R and Stata, the point estimates are actually the same! (The standard errors are a little different, though.)

When both are set to 'ipw' or doubly robust, the point estimates differ, including the group averages and dynamic averages (post-treatment ATTs, I mean; the output from att_gt() in R doesn't seem to show pre-treatment ATTs?)

I tried to replicate the problem with the example dataset, but in that case the R/Stata output does align.

Maybe the screenshots below will help, though. When the method is doubly robust ('dripw' in Stata) and I use 'estat group' (my aggregation of interest), the group estimates for 3 cohorts are identical between R and Stata. Only for the 2001 cohort does Stata produce an estimate, while R shows all NAs in the output. R then gives an overall average of 0.37, while Stata reports 'omitted'.

Thanks again for your help!
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2312
#4

26 Jul 2024, 07:16

OK, that gives a clue.
You see how Stata produces a 2001 result, but R does not? I think the two implementations make different in-code decisions about which data to use or drop, and that may explain the differences.
So, unfortunately, there is nothing to be done other than an in-depth exploration of each 2x2 case to see where the differences arise.
Comment
Rory Todd

Join Date: Jun 2023

Posts: 5
#5

26 Jul 2024, 07:49

Thanks for this. Could you explain how I'd do that? Would I have to go into the code for each command?
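
One way to start on the 2x2 exploration without reading either package's source is to rebuild a single group-time cell by hand and compare it with what each program reports. The sketch below is only an illustration under stated assumptions: `did_2x2()` is a hypothetical helper (not part of 'did' or 'csdid'), the panel is a made-up toy dataset whose column names mirror the ones in this thread, never-treated units are assumed to be coded `firstZyear == 0`, and the comparison is the unconditional (no covariates) never-treated version with period g - 1 as the reference, which applies to post-treatment periods (t >= g).

```r
# Toy balanced panel (made-up numbers, NOT the attached data).
# Assumption: never-treated units are coded firstZyear == 0.
toy <- expand.grid(ccode = 1:6, year = 1999:2002)
toy$firstZyear <- ifelse(toy$ccode <= 2, 2001, 0)
toy$tempO <- 0.1 * (toy$year - 1999) +
  1.5 * (toy$firstZyear > 0 & toy$year >= toy$firstZyear)

# Hypothetical helper: unconditional 2x2 DiD for cohort g at time t,
# comparing cohort g with never-treated units, reference period g - 1.
did_2x2 <- function(df, g, t, yname = "tempO", idname = "ccode",
                    tname = "year", gname = "firstZyear") {
  base <- g - 1
  keep <- df[[gname]] %in% c(g, 0) & df[[tname]] %in% c(base, t)
  sub  <- df[keep, ]
  pre  <- sub[sub[[tname]] == base, c(idname, gname, yname)]
  post <- sub[sub[[tname]] == t, c(idname, yname)]
  # One row per unit observed in both periods
  wide <- merge(pre, post, by = idname, suffixes = c("_pre", "_post"))
  wide$dy <- wide[[paste0(yname, "_post")]] - wide[[paste0(yname, "_pre")]]
  treated <- wide[[gname]] == g
  data.frame(group = g, time = t,
             n_treated = sum(treated),   # cell sizes show thin or empty cells
             n_control = sum(!treated),
             att = mean(wide$dy[treated]) - mean(wide$dy[!treated]))
}

# Walk over every post-treatment cell of one cohort
do.call(rbind, lapply(2001:2002, function(t) did_2x2(toy, g = 2001, t = t)))
```

Running the same loop on the real data for the 2001 cohort, and looking at `n_treated`, `n_control`, and any missing covariate values within each cell, may show which cells one program keeps and the other drops. Since the 'reg' estimates already agree, the propensity-score step within each cell is a plausible next place to look, though that is a guess rather than a diagnosis.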
Comment
