CSDID - Inflated ATT value for binary dependent variable

Tess Lallemant

Join Date: Jun 2023
Posts: 3


06 Jun 2023, 14:12

Hello Stata Community!

I'm still learning how to use the CSDID package based on Callaway and Sant’Anna's work and could use some help understanding results that I am finding puzzling.

Some background information:

I am using Stata 16. My data are a repeated cross-section with three periods: 1999, 2010, and 2015. Treatment is assigned at the city level, but my dependent variable is binary and measured at the individual level: whether individual i works for their family. I include the individual covariates sex_i and age_i as well as city-level controls.

Please find some summary stats below.

Code:

    Cross tabs of data years to treatment cohorts  
       |              Treatment Cohort
      Year |         0       1999       2010       2015 |     Total
-----------+--------------------------------------------+----------
      1999 |     1,408        340        728        122 |     2,598
      2010 |     1,639        302        672        102 |     2,715
      2015 |     1,958        371        899        188 |     3,416
-----------+--------------------------------------------+----------
     Total |     5,005      1,013      2,299        412 |     8,729

Code:

 Number of cities and individuals by treatment cohort (0 = never treated)

Treatment cohort |      0     1999     2010     2015
-----------------+------------------------------------
Cities           |     47        4        6        4
Individuals      |  5,005    1,013    2,299      412

Code:

. Summary statistics

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         sex |      8,729    .1492725    .3563773          0          1<-  Individual is male
         age |      8,729    30.91064    9.427678         16         59<-  Individual age
CITY_lit_r~d |      8,729    .7215272    .1686842   .0714286   .9666667<-  City population literacy rate
CITY_ed_co~m |      8,729    .6297804    .1282199   .0974359   .8658537<-  City population primary school completion rate
CITY_ed_co~c |      8,729    .1556425    .1155121          0   .4203011<-  City population secondary school completion rate
CITY_ed_co~h |      8,729     .007751    .0115286          0   .0738095<-  City population higher education school completion rate
    workfam3 |      8,729    .5632948     .496006          0          1<- Outcome variable - Binary for individual working for family firm

I run the following code:

Code:

* CSDID Base controls + city education
csdid  workfam3 sex age  CITY_lit_read  CITY_ed_comp_prim CITY_ed_comp_sec CITY_ed_comp_high,  time(YearC) gvar(treat_cohort) method(dripw) cluster(citycode) notyet

estat event
estat simple

I get the following results.

Code:

Difference-in-difference with Multiple Time Periods

                                                Number of obs     =      7,716
Outcome model  : least squares
Treatment model: inverse probability
                              (Std. Err. adjusted for 57 clusters in citycode)
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2010        |
 t_1999_2010 |   1.582202   .1620329     9.76   0.000     1.264624    1.899781
 t_1999_2015 |   6.557619   .8709219     7.53   0.000     4.850644    8.264595
-------------+----------------------------------------------------------------
g2015        |
 t_1999_2010 |   .5149717   .2743876     1.88   0.061    -.0228181    1.052762
 t_2010_2015 |   .0320314   .1881829     0.17   0.865    -.3368004    .4008632
------------------------------------------------------------------------------
Control: Not yet Treated

See Callaway and Sant'Anna (2021) for details

.
. estat event
ATT by Periods Before and After treatment
Event Study:Dynamic effects
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Pre_avg |   .5149717   .2743876     1.88   0.061    -.0228181    1.052762
    Post_avg |   3.936908   .4743211     8.30   0.000     3.007256     4.86656
         Tm5 |   .5149717   .2743876     1.88   0.061    -.0228181    1.052762
         Tp0 |   1.316196   .2952748     4.46   0.000     .7374685    1.894924
         Tp5 |   6.557619   .8709219     7.53   0.000     4.850644    8.264595
------------------------------------------------------------------------------

. estat simple
Average Treatment Effect on Treated
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATT |   3.887133   .5381026     7.22   0.000     2.832471    4.941794
------------------------------------------------------------------------------

.
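A quick arithmetic check (a Python sketch, not Stata) on why the output reports 7,716 observations and 57 clusters rather than 8,729 individuals and 61 cities: the 1999 cohort is already treated in the first sample year, so it has no untreated base period, and csdid drops it with the message "Units always treated found. These will be excluded" (visible in a later log in this thread). The counts are copied from the tables above; the rule `g <= first_period` is my reading of that message, not csdid's source code.

```python
# Cohort sizes (individuals, all years) and city counts by treat_cohort,
# copied from the tables above; cohort 0 = never treated.
individuals = {0: 5005, 1999: 1013, 2010: 2299, 2015: 412}
cities = {0: 47, 1999: 4, 2010: 6, 2015: 4}
first_period = 1999

# Assumption: a cohort treated in (or before) the first sample year has no
# untreated base period, so it is dropped as "always treated".
always_treated = {g for g in individuals if g != 0 and g <= first_period}

n_used = sum(n for g, n in individuals.items() if g not in always_treated)
clusters_used = sum(c for g, c in cities.items() if g not in always_treated)

print(sorted(always_treated), n_used, clusters_used)  # [1999] 7716 57
```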

Given that my outcome variable is binary, I am confused as to why my ATT would be larger than 1. Am I misunderstanding something about how each ATT_g is calculated?
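One way to see that the aggregation step is not the source of the problem: the estat event rows are built from the ATT(g,t) cells grouped by event time e = t - g. Tm5 is the g2015 pre-treatment cell, Tp5 is the g2010 t_1999_2015 cell on its own, and in every log in this thread Post_avg equals the simple mean of Tp0 and Tp5. A minimal Python check using the numbers from the first output (the grouping code is an illustration, not csdid's implementation):

```python
# ATT(g,t) cells from the first csdid run (dripw with covariates),
# keyed by (cohort g, period t).
att_gt = {
    (2010, 2010): 1.582202,   # g2010 t_1999_2010
    (2010, 2015): 6.557619,   # g2010 t_1999_2015
    (2015, 2010): 0.5149717,  # g2015 t_1999_2010 (pre-treatment)
    (2015, 2015): 0.0320314,  # g2015 t_2010_2015
}

# Group the cells by event time e = t - g (Tm5, Tp0, Tp5 in estat event)
by_event = {}
for (g, t), att in att_gt.items():
    by_event.setdefault(t - g, []).append(att)

tp0 = 1.316196                # reported Tp0: a weighted average of the two e = 0 cells (weights not reproduced here)
tp5 = by_event[5][0]          # only the 2010 cohort is observed five years after treatment
post_avg = (tp0 + tp5) / 2    # Post_avg matches the plain mean of Tp0 and Tp5

print(sorted(by_event))                           # [-5, 0, 5]
print(min(by_event[0]) < tp0 < max(by_event[0]))  # True
print(abs(post_avg - 3.936908) < 1e-6)            # True
```

So the values above 1 are already present in the individual g2010 cells; estat event and estat simple only average them.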

Thank you in advance for any guidance on this issue.


FernandoRios

Join Date: Apr 2014

Posts: 2430
#2

06 Jun 2023, 15:14

I think the problem may be related to extrapolation.
Can you drop all controls and run the same model?
And what do you get if you use
method(reg)?
hth
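To illustrate what extrapolation means here, a toy example in Python with made-up data (not the poster's): with one covariate x, the comparison units only cover x in [0, 1], while the treated units sit at x = 6. The unconditional DiD of proportions stays modest, but a regression-adjustment DiD that fits a linear probability model on the comparison group in each period, and predicts the counterfactual change at the treated units' x, produces an effect of 3 for a 0/1 outcome. This is a simplified version of the outcome-regression idea behind method(reg), not csdid's exact estimator.

```python
def ols(xs, ys):
    """Least-squares intercept and slope for a single regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def mean(v):
    return sum(v) / len(v)

# Hypothetical toy data: binary outcome y, one covariate x.
# Comparison units only have x in [0, 1]; treated units sit far outside, at x = 6.
ctrl_pre_x, ctrl_pre_y = [0, 0, 1, 1], [0, 0, 1, 1]
ctrl_post_x, ctrl_post_y = [0, 0, 1, 1], [0, 0, 0, 1]
trt_pre_x, trt_pre_y = [6, 6], [0, 1]
trt_post_x, trt_post_y = [6, 6], [1, 0]

# Unconditional DiD of proportions (no covariates)
did_raw = (mean(trt_post_y) - mean(trt_pre_y)) - (mean(ctrl_post_y) - mean(ctrl_pre_y))

# Regression adjustment: linear probability model on comparison units in each
# period, evaluated at the treated units' x to predict their untreated change
a0, b0 = ols(ctrl_pre_x, ctrl_pre_y)
a1, b1 = ols(ctrl_post_x, ctrl_post_y)
pred_change = mean([a1 + b1 * x for x in trt_post_x]) - mean([a0 + b0 * x for x in trt_pre_x])
did_ra = (mean(trt_post_y) - mean(trt_pre_y)) - pred_change

print(did_raw, did_ra)  # 0.25 3.0
```

The fitted "probabilities" at x = 6 are 6 and 3, far outside [0, 1]: a linear outcome model does nothing to keep predictions for a binary variable inside the unit interval once it extrapolates beyond the comparison group's covariate range.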

Tess Lallemant

Join Date: Jun 2023
Posts: 3

07 Jun 2023, 09:14

Thanks for your quick response.

Below are the results with dripw without any controls.

Code:

Difference-in-difference with Multiple Time Periods

                                                Number of obs     =      7,716
Outcome model  : least squares
Treatment model: inverse probability
                              (Std. Err. adjusted for 57 clusters in citycode)
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2010        |
 t_1999_2010 |  -.0526326   .0660829    -0.80   0.426    -.1821528    .0768876
 t_1999_2015 |  -.2175948   .0755872    -2.88   0.004    -.3657431   -.0694466
-------------+----------------------------------------------------------------
g2015        |
 t_1999_2010 |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
 t_2010_2015 |   .1399004     .13578     1.03   0.303    -.1262235    .4060243
------------------------------------------------------------------------------
Control: Not yet Treated

See Callaway and Sant'Anna (2021) for details

.
. estat event
ATT by Periods Before and After treatment
Event Study:Dynamic effects
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Pre_avg |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
    Post_avg |  -.1185946    .062592    -1.89   0.058    -.2412728    .0040835
         Tm5 |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
         Tp0 |  -.0195944   .0611838    -0.32   0.749    -.1395125    .1003237
         Tp5 |  -.2175948   .0755872    -2.88   0.004    -.3657431   -.0694466
------------------------------------------------------------------------------

. estat simple
Average Treatment Effect on Treated
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATT |  -.1167143   .0632741    -1.84   0.065    -.2407292    .0073006
------------------------------------------------------------------------------

And here are the results with method(reg) and covariates.

Code:

Difference-in-difference with Multiple Time Periods

                                                Number of obs     =      7,716
Outcome model  : regression adjustment
Treatment model: none
                              (Std. Err. adjusted for 57 clusters in citycode)
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2010        |
 t_1999_2010 |   1.691072   .1834538     9.22   0.000     1.331509    2.050635
 t_1999_2015 |   6.801144   .8735401     7.79   0.000     5.089037    8.513251
-------------+----------------------------------------------------------------
g2015        |
 t_1999_2010 |   .5063845   .2866188     1.77   0.077     -.055378    1.068147
 t_2010_2015 |    .187527    .153718     1.22   0.222    -.1137548    .4888088
------------------------------------------------------------------------------
Control: Not yet Treated

See Callaway and Sant'Anna (2021) for details

.
. estat event
ATT by Periods Before and After treatment
Event Study:Dynamic effects
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Pre_avg |   .5063845   .2866188     1.77   0.077     -.055378    1.068147
    Post_avg |   4.117106   .4924739     8.36   0.000     3.151875    5.082337
         Tm5 |   .5063845   .2866188     1.77   0.077     -.055378    1.068147
         Tp0 |   1.433067   .3065413     4.67   0.000     .8322576    2.033877
         Tp5 |   6.801144   .8735401     7.79   0.000     5.089037    8.513251
------------------------------------------------------------------------------

. estat simple
Average Treatment Effect on Treated
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATT |   4.066128   .5589138     7.28   0.000     2.970677    5.161579
------------------------------------------------------------------------------

And here are the results using method(reg) without covariates.

Code:

. csdid  workfam3  ,  time(YearC) gvar(treat_cohort) method(reg) cluster(citycode) notyet
Units always treated found. These will be excluded
....
Difference-in-difference with Multiple Time Periods

                                                Number of obs     =      7,716
Outcome model  : regression adjustment
Treatment model: none
                              (Std. Err. adjusted for 57 clusters in citycode)
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2010        |
 t_1999_2010 |  -.0526326   .0660829    -0.80   0.426    -.1821528    .0768876
 t_1999_2015 |  -.2175948   .0755872    -2.88   0.004    -.3657431   -.0694466
-------------+----------------------------------------------------------------
g2015        |
 t_1999_2010 |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
 t_2010_2015 |   .1399004     .13578     1.03   0.303    -.1262235    .4060243
------------------------------------------------------------------------------
Control: Not yet Treated

See Callaway and Sant'Anna (2021) for details

.
. estat event
ATT by Periods Before and After treatment
Event Study:Dynamic effects
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Pre_avg |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
    Post_avg |  -.1185946    .062592    -1.89   0.058    -.2412728    .0040835
         Tm5 |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
         Tp0 |  -.0195944   .0611838    -0.32   0.749    -.1395125    .1003237
         Tp5 |  -.2175948   .0755872    -2.88   0.004    -.3657431   -.0694466
------------------------------------------------------------------------------

. estat simple
Average Treatment Effect on Treated
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATT |  -.1167143   .0632741    -1.84   0.065    -.2407292    .0073006
------------------------------------------------------------------------------
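Why the two no-covariate runs are identical: without covariates the propensity score is constant and the outcome regression reduces to cell means, so dripw and reg both collapse to the unconditional DiD of proportions. Below is a Python sketch of that calculation with not-yet-treated controls on hypothetical 0/1 data; the control rule `c == 0 or c > max(t, base)` is my simplification, not csdid's code. Because each cell mean lies in [0, 1], every such ATT(g,t) is bounded in [-2, 2], so estimates such as 6.56 can only come from the covariate adjustment.

```python
def mean(values):
    return sum(values) / len(values)

def att_gt(cells, g, base, t):
    """Unconditional DiD of proportions for cohort g, period t vs. base period.
    Controls: never treated (0) plus other cohorts not yet treated by max(t, base).
    This control rule is a simplification for illustration."""
    cohorts = {c for c, _ in cells}
    controls = [c for c in cohorts if c != g and (c == 0 or c > max(t, base))]
    ctrl_t = [y for c in controls for y in cells[(c, t)]]
    ctrl_base = [y for c in controls for y in cells[(c, base)]]
    treated_change = mean(cells[(g, t)]) - mean(cells[(g, base)])
    return treated_change - (mean(ctrl_t) - mean(ctrl_base))

# Hypothetical 0/1 outcomes by (cohort, year); cohort 0 = never treated
cells = {
    (0, 1999): [1, 1, 0, 0], (0, 2010): [1, 0, 0, 0], (0, 2015): [1, 0, 0, 0],
    (2010, 1999): [0, 0, 0, 1], (2010, 2010): [1, 1, 0, 1], (2010, 2015): [1, 1, 1, 0],
    (2015, 1999): [1, 0, 1, 0], (2015, 2010): [1, 0, 0, 0], (2015, 2015): [1, 1, 0, 0],
}

# (cohort, base, period) triples matching the cells csdid reports in this thread
for g, base, t in [(2010, 1999, 2010), (2010, 1999, 2015), (2015, 1999, 2010), (2015, 2010, 2015)]:
    print(f"g{g} t_{base}_{t}: {att_gt(cells, g, base, t):+.2f}")
```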

It looks like the covariates are causing this somehow.
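A first diagnostic along those lines is to compare the covariates of each treated cohort with its comparison group. Because the city controls only vary across cities (6 cities in the 2010 cohort), it makes sense to check them at the city level. A minimal Python sketch of two common checks, the standardized mean difference and the share of treated values outside the comparison group's range; the literacy values are invented for illustration:

```python
from statistics import mean, stdev

def std_diff(treated, control):
    """Standardized mean difference using the average of the two group variances."""
    pooled_sd = ((stdev(treated) ** 2 + stdev(control) ** 2) / 2) ** 0.5
    return (mean(treated) - mean(control)) / pooled_sd

def share_outside_support(treated, control):
    """Fraction of treated values outside the [min, max] range of the controls."""
    lo, hi = min(control), max(control)
    return sum(not (lo <= v <= hi) for v in treated) / len(treated)

# Hypothetical city-level literacy rates (one value per city)
treated_cities = [0.90, 0.93, 0.95, 0.96]
control_cities = [0.40, 0.55, 0.60, 0.70, 0.85]

print(round(std_diff(treated_cities, control_cities), 2))
print(share_outside_support(treated_cities, control_cities))  # 1.0
```

A common rule of thumb treats absolute standardized differences above roughly 0.1 to 0.25 as a sign of imbalance; a share outside the comparison range close to 1 means the covariate adjustment for that cohort is pure extrapolation.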


FernandoRios

Join Date: Apr 2014

Posts: 2430
#4

07 Jun 2023, 09:48

OK, so it is a problem of reweighting and extrapolation.
I would also suggest using drimp. It often performs a bit better.
Other than that, you may want to explore how the 2010 cohort compares with the never treated. I suspect the balance is poor.
F
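On the reweighting side: IPW and doubly robust ATT estimators weight comparison units by the odds p/(1 - p) of their estimated propensity score. When a few comparison units look almost exactly like treated units (p close to 1), they receive nearly all the weight. A Python sketch with hypothetical propensity scores, reporting the normalized weights and Kish's effective sample size:

```python
def control_weights(pscores):
    """Normalized odds weights p / (1 - p) for comparison units."""
    raw = [p / (1 - p) for p in pscores]
    total = sum(raw)
    return [w / total for w in raw]

def effective_sample_size(weights):
    """Kish effective sample size for weights that sum to 1."""
    return 1 / sum(w * w for w in weights)

# Hypothetical propensity scores for four comparison units
good_overlap = [0.30, 0.35, 0.40, 0.45]
poor_overlap = [0.10, 0.20, 0.30, 0.99]

for label, ps in [("good overlap", good_overlap), ("poor overlap", poor_overlap)]:
    w = control_weights(ps)
    print(label, [round(x, 3) for x in w], round(effective_sample_size(w), 2))
```

An effective sample size close to 1 means the estimate leans on essentially one comparison observation, the kind of instability that can push estimates far outside the range a 0/1 outcome allows.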

Tess Lallemant

Join Date: Jun 2023
Posts: 3

07 Jun 2023, 09:58

Using drimp inflates the results even more (see below). I suspect this is for the balance reasons you suggested.

Code:

. csdid workfam3 sex age CITY_lit_read CITY_ed_comp_prim CITY_ed_comp_sec CITY_ed_comp_high, time(YearC) gvar(treat_cohort) method(drimp) cluster(citycode) notyet
Units always treated found. These will be excluded
...x
Difference-in-difference with Multiple Time Periods

                                                Number of obs     =      7,528
Outcome model  : weighted least squares
Treatment model: inverse probability tilting
                              (Std. Err. adjusted for 57 clusters in citycode)
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
g2010        |
 t_1999_2010 |  -24.04347   3.419732    -7.03   0.000    -30.74603   -17.34092
 t_1999_2015 |  -99.55414   13.18768    -7.55   0.000    -125.4015   -73.70676
-------------+----------------------------------------------------------------
g2015        |
 t_1999_2010 |   .4437464   .2649321     1.67   0.094    -.0755109    .9630037
 t_2010_2015 |          0  (omitted)
------------------------------------------------------------------------------
Control: Not yet Treated

See Callaway and Sant'Anna (2021) for details

.
. estat event
ATT by Periods Before and After treatment
Event Study:Dynamic effects
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Pre_avg |   .4437464   .2649321     1.67   0.094    -.0755109    .9630037
    Post_avg |  -61.79881   8.100714    -7.63   0.000    -77.67592    -45.9217
         Tm5 |   .4437464   .2649321     1.67   0.094    -.0755109    .9630037
         Tp0 |  -24.04347   3.419732    -7.03   0.000    -30.74603   -17.34092
         Tp5 |  -99.55414   13.18768    -7.55   0.000    -125.4015   -73.70676
------------------------------------------------------------------------------

. estat simple
Average Treatment Effect on Treated
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATT |  -46.97896   8.030837    -5.85   0.000    -62.71911   -31.23881
------------------------------------------------------------------------------
