Hello Stata Community!
I'm still learning how to use the CSDID package based on Callaway and Sant’Anna's work and could use some help understanding results that I am finding puzzling.
Some background information:
I am using Stata 16. My data are a repeated cross-section with three periods: 1999, 2010, and 2015. Treatment is assigned at the city level, but my dependent variable is a binary individual-level indicator: whether individual i works for their family. I include the individual covariates sex_i and age_i as well as city-level controls.
Please find some summary stats below.
Code:
Cross tabs of data years to treatment cohorts

           |          Treatment Cohort Year
           |       0     1999     2010     2015 |    Total
-----------+------------------------------------+---------
      1999 |   1,408      340      728      122 |    2,598
      2010 |   1,639      302      672      102 |    2,715
      2015 |   1,958      371      899      188 |    3,416
-----------+------------------------------------+---------
     Total |   5,005    1,013    2,299      412 |    8,729
Code:
Number of treatment cities and individuals

     Treated |       0     1999     2010     2015
-------------+------------------------------------
      Cities |      47        4        6        4

     Treated |       0     1999     2010     2015
-------------+------------------------------------
 Individuals |   5,005    1,013    2,299      412
Code:
. Summary statistics

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         sex |      8,729    .1492725    .3563773          0          1  <- Individual is male
         age |      8,729    30.91064    9.427678         16         59  <- Individual age
CITY_lit_r~d |      8,729    .7215272    .1686842   .0714286   .9666667  <- City population literacy rate
CITY_ed_co~m |      8,729    .6297804    .1282199   .0974359   .8658537  <- City population primary school completion rate
CITY_ed_co~c |      8,729    .1556425    .1155121          0   .4203011  <- City population secondary school completion rate
CITY_ed_co~h |      8,729     .007751    .0115286          0   .0738095  <- City population higher education completion rate
    workfam3 |      8,729    .5632948     .496006          0          1  <- Outcome variable: binary for individual working for family firm
I run the following code:
Code:
* CSDID Base controls + city education
csdid workfam3 sex age CITY_lit_read CITY_ed_comp_prim CITY_ed_comp_sec CITY_ed_comp_high, ///
    time(YearC) gvar(treat_cohort) method(dripw) cluster(citycode) notyet
estat event
estat simple
I get the following results.
Code:
Difference-in-difference with Multiple Time Periods

                                                Number of obs     =      7,716
Outcome model  : least squares
Treatment model: inverse probability
                             (Std. Err. adjusted for 57 clusters in citycode)
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2010        |
 t_1999_2010 |   1.582202   .1620329     9.76   0.000     1.264624    1.899781
 t_1999_2015 |   6.557619   .8709219     7.53   0.000     4.850644    8.264595
-------------+----------------------------------------------------------------
g2015        |
 t_1999_2010 |   .5149717   .2743876     1.88   0.061    -.0228181    1.052762
 t_2010_2015 |   .0320314   .1881829     0.17   0.865    -.3368004    .4008632
------------------------------------------------------------------------------
Control: Not yet Treated

See Callaway and Sant'Anna (2021) for details

. estat event
ATT by Periods Before and After treatment
Event Study:Dynamic effects
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Pre_avg |   .5149717   .2743876     1.88   0.061    -.0228181    1.052762
    Post_avg |   3.936908   .4743211     8.30   0.000     3.007256     4.86656
         Tm5 |   .5149717   .2743876     1.88   0.061    -.0228181    1.052762
         Tp0 |   1.316196   .2952748     4.46   0.000     .7374685    1.894924
         Tp5 |   6.557619   .8709219     7.53   0.000     4.850644    8.264595
------------------------------------------------------------------------------

. estat simple
Average Treatment Effect on Treated
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATT |   3.887133   .5381026     7.22   0.000     2.832471    4.941794
------------------------------------------------------------------------------
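The reported sample of 7,716 observations and 57 clusters is consistent with the 1999 cohort being left out of the estimation: 5,005 + 2,299 + 412 = 7,716 individuals and 47 + 6 + 4 = 57 cities. My assumption, which I have not confirmed, is that csdid excludes this cohort because it is already treated in the first period. A minimal sketch of that check follows; city_tag is a temporary helper variable introduced only for illustration.

```stata
* Individuals outside the 1999 cohort (expected from the cross tab: 7,716)
count if treat_cohort != 1999

* Distinct cities among those individuals (expected: 57)
* city_tag is a hypothetical helper: 1 for one observation per city, 0 otherwise
egen byte city_tag = tag(citycode) if treat_cohort != 1999
count if city_tag == 1
drop city_tag
```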
Given that my outcome variable is binary, I am confused as to why my ATT estimates would be larger than 1. Am I misunderstanding something about how each ATT(g,t) is calculated?
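For comparison, a raw 2x2 difference-in-differences is built from four group means. With a binary outcome each mean lies in [0, 1], so the raw estimate cannot fall outside [-2, 2]. An estimate such as 6.56 therefore cannot come from raw means alone, which makes me suspect the covariate adjustment (the outcome regression and the inverse probability weights), though that is only my guess. Below is a minimal sketch of the raw comparison for cohort 2010 in 2015 with 1999 as the base period. It assumes that the only group not yet treated by 2015 is treat_cohort == 0; treated and post are helper variables introduced only for illustration.

```stata
* Raw (no covariates) 2x2 DiD: cohort g = 2010, t = 2015, base period 1999
* Assumption: controls are the never-treated units (treat_cohort == 0)
preserve
keep if inlist(treat_cohort, 0, 2010) & inlist(YearC, 1999, 2015)
generate byte treated = (treat_cohort == 2010)   // hypothetical helper
generate byte post    = (YearC == 2015)          // hypothetical helper

* The coefficient on 1.treated#1.post is the difference of four group means,
* each in [0, 1], so it must lie in [-2, 2]
regress workfam3 i.treated##i.post, vce(cluster citycode)
restore
```

If this raw estimate is of plausible size while the dripw estimate is not, the gap would point to the covariates rather than to the data structure.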
Thank you in advance for any guidance on this issue.