Hi!
I have a panel data set of municipalities covering 1998-2021, and I want to estimate the effect of an intervention that was introduced at different times in some of the municipalities (i.e. the treatment is staggered). For this I use the approach of Callaway and Sant’Anna (2021) via the csdid package, which has worked well.
However, I have also been asked to estimate the “regular” two-way fixed effects (TWFE) model, which does not work as well. I have run the TWFE regression with leads and lags, on the notion that if trends are parallel, the coefficients on the leads should not be statistically different from zero. However, one of them is. Furthermore, the overall model (i.e. the TWFE model without leads and lags) shows no statistically significant treatment effect.
I suspect that the lack of a treatment effect could be because the municipalities selected for treatment differ significantly from those that are not. Hence, I would like to implement some kind of matching procedure to make the treated and control groups more similar.
However, I am not sure how to do this in a panel data setting.
My steps so far are:
1. Run a logit regression where the outcome is ever being treated (1999-2021) and the covariates are pre-treatment characteristics (measured in 1998) that are believed to affect assignment into treatment.
2. Predict propensity scores from this model.
3. Assign the same pscore to all observations within the same municipality.
4. Use psmatch2 to obtain weights, which I then use in the TWFE regression.
5. Add leads and lags of treatment to see whether pre-trends differ significantly between treated and untreated observations.
Code looks like this:
logit ever_treated matching variables if year==1998
predict pscore if year==1998
by municipalcode (year), sort: replace pscore = pscore[_n-1] if pscore >= .
psmatch2 ever_treat, pscore(pscore) outcome(depvar) neighbor(20)
xtreg depvar indepvar i.year [aweight=_weight], fe vce(cluster municipality)
xtreg depvar lead_5 lead_4 lead_3 lead_2 lead_1 lag0 lag1 lag2 lag3 lag4 lag5 i.year [aweight=_weight], fe vce(cluster municipality)
However, the results turn out almost identical to the unmatched setting.
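One variation that might be worth comparing, as a rough sketch only: in the code above psmatch2 is run on the full panel, so the matching is done across municipality-year rows rather than across municipalities. The sketch below instead matches once on the 1998 cross-section and then copies the resulting weight to every year of the same municipality. The names x1 and x2 (standing in for the matching variables) and match_w are hypothetical, and it assumes that municipalcode is a numeric identifier that can serve as both the panel and the cluster variable.

```stata
* Sketch only: x1 x2 are hypothetical placeholders for the matching variables.
* Assumes municipalcode is a numeric municipality identifier.
preserve
keep if year == 1998                       // one row per municipality
logit ever_treated x1 x2
predict double pscore_cs, pr               // estimated propensity score
psmatch2 ever_treated, pscore(pscore_cs) neighbor(20) common
keep municipalcode _weight
rename _weight match_w                     // hypothetical name for the matching weight
tempfile w
save `w'
restore

* Copy the municipality-level weight to all years of that municipality
merge m:1 municipalcode using `w', nogenerate

xtset municipalcode year
* Rows with a missing weight (unmatched controls) drop out of the estimation
xtreg depvar indepvar i.year [aweight = match_w], fe vce(cluster municipalcode)
```

Because the weight is constant within each municipality, it satisfies the requirement of xtreg, fe for weights that do not vary within panel. Whether this is an appropriate design for staggered treatment is exactly what the questions below are about.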
My questions thus are:
1. How should matching be done appropriately in a panel data setting with staggered treatment?
2. Is it valid to use leads and lags with weights, or should I show parallel trends in some other way with the matched sample? If so, how?