I want to select the most robust model for a paper on economic growth after an exogenous event that affects some units, all starting in the same year. I have read both papers cited below, but I am looking for a clear explanation of why I get differing results and which estimator is more robust to the issues in my analysis. (Pardon the vagueness; I do not want to be recognized by reviewers.)
A toy version (an arbitrarily altered subset) of my data and models is below, and it behaves like my real data: when I include RHS controls (log_pop), classic TWFE and DiD are fairly similar to AIPW (positive, significant) but distinct from Wooldridge's TWFE, which accounts for heterogeneity (positive, insignificant). The estimates are more similar without controls, but the controls are important. There is, indeed, heterogeneity in the levels of the control variable (log_pop). Potentially important factors: treatment_event should have a positive effect on the control log_pop (I want to isolate growth beyond this), and the control cases are matched on pre-treatment trends.
My reading of the papers below left me with the impression that Wooldridge is more specifically concerned with unit heterogeneity. I understood more about staggered-timing heterogeneity in Callaway and Sant'Anna than I did about unit heterogeneity; however, the quote below from Stata makes it seem like the latter is more likely to be a robust method, generally.
Under what circumstances should we expect differing results between the two methods? Based on my data and analysis, which should I expect to perform better?
Code in next comment:
xthdidregress provides four estimators: TWFE, outlined in Wooldridge (2021); RA, IPW, and
AIPW, outlined in Callaway and Sant’Anna (2021). ... For example, RA and
TWFE model the outcome; IPW models the treatment; and AIPW models both. If the model for the
outcome is correctly specified, RA and TWFE are best, with TWFE being more efficient. If the treatment
model is correctly specified, IPW should be best. AIPW models both treatment and outcome. If at
least one of the models is correctly specified, it provides consistent estimates.
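To make the double-robustness claim in the quote concrete for myself, here is a minimal simulation sketch in Python (not Stata). It is a simplified two-period, single-covariate analogue of a doubly robust DiD estimator, not what xthdidregress does internally. Everything in it is an assumption made for illustration: the data-generating process, the true effect `tau = 2`, and the hypothetical helper names `fit_logit`, `dr_att`, and `run_demo`. The covariate `x` stands in for a pre-treatment control and drives both selection into treatment and the outcome trend.

```python
import numpy as np

def fit_logit(X, d, iters=25):
    """Logistic regression by Newton-Raphson; returns fitted probabilities."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))

def dr_att(dy, d, X_out, X_ps):
    """Doubly robust (AIPW-style) ATT for a two-period panel.

    dy    : outcome change (post minus pre) for each unit
    d     : 1.0 if treated, 0.0 if control
    X_out : design matrix for the outcome-change model (fit on controls only)
    X_ps  : design matrix for the treatment (propensity) model
    """
    # Outcome model: predicted change in the absence of treatment.
    beta, *_ = np.linalg.lstsq(X_out[d == 0], dy[d == 0], rcond=None)
    m0 = X_out @ beta
    # Treatment model: probability of being treated.
    p = fit_logit(X_ps, d)
    # Normalized weights: treated units vs. odds-reweighted controls.
    w1 = d / d.mean()
    w0 = p * (1.0 - d) / (1.0 - p)
    w0 = w0 / w0.mean()
    return float(np.mean((w1 - w0) * (dy - m0)))

def run_demo(n=100_000, tau=2.0, seed=0):
    """Simulated data (assumed DGP): x drives both selection and the trend."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    d = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
    dy = 1.0 + x + tau * d + rng.normal(size=n)
    const = np.ones((n, 1))                   # misspecified: omits x
    full = np.column_stack([np.ones(n), x])   # correctly specified
    return {
        "outcome_ok_ps_wrong": dr_att(dy, d, full, const),
        "outcome_wrong_ps_ok": dr_att(dy, d, const, full),
        "both_wrong": dr_att(dy, d, const, const),
    }

if __name__ == "__main__":
    for name, est in run_demo().items():
        print(f"{name}: {est:.3f}")
```

In this design the first two estimates should land close to the true effect of 2, while the third, where both models omit `x`, collapses to the unconditional difference in mean changes and is biased upward. It says nothing about which estimator is right for my data; it only illustrates the "at least one model correctly specified" statement.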