
  • CSDID - Inflated ATT value for binary dependent variable

    Hello Stata Community!

    I'm still learning how to use the CSDID package based on Callaway and Sant’Anna's work and could use some help understanding results that I am finding puzzling.

    Some background information:

    I am using Stata 16. My data is a repeated cross-section with three periods: 1999, 2010, and 2015. Treatment is assigned at the city level, but my dependent variable is binary and measured at the individual level: whether individual i works for their family. I include the individual covariates sex_i and age_i as well as city-level controls.

    Please find some summary stats below.


    Code:
        Cross tabs of data years to treatment cohorts  
           |              Treatment Cohort
          Year |         0       1999       2010       2015 |     Total
    -----------+--------------------------------------------+----------
          1999 |     1,408        340        728        122 |     2,598
          2010 |     1,639        302        672        102 |     2,715
          2015 |     1,958        371        899        188 |     3,416
    -----------+--------------------------------------------+----------
         Total |     5,005      1,013      2,299        412 |     8,729
    Code:
     Number of cities and individuals by treatment cohort
    
     Treatment cohort |         0       1999       2010       2015
     -----------------+--------------------------------------------
               Cities |        47          4          6          4
          Individuals |     5,005      1,013      2,299        412

    Code:
    . Summary statistics
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             sex |      8,729    .1492725    .3563773          0          1  <- Individual is male
             age |      8,729    30.91064    9.427678         16         59  <- Individual age
    CITY_lit_r~d |      8,729    .7215272    .1686842   .0714286   .9666667  <- City population literacy rate
    CITY_ed_co~m |      8,729    .6297804    .1282199   .0974359   .8658537  <- City population primary school completion rate
    CITY_ed_co~c |      8,729    .1556425    .1155121          0   .4203011  <- City population secondary school completion rate
    CITY_ed_co~h |      8,729     .007751    .0115286          0   .0738095  <- City population higher education completion rate
        workfam3 |      8,729    .5632948     .496006          0          1  <- Outcome variable: binary for individual working for family firm

    I run the following code:

    Code:
    * CSDID Base controls + city education
    csdid  workfam3 sex age  CITY_lit_read  CITY_ed_comp_prim CITY_ed_comp_sec CITY_ed_comp_high,  time(YearC) gvar(treat_cohort) method(dripw) cluster(citycode) notyet
    
    estat event
    estat simple

    I get the following results.


    Code:
    Difference-in-difference with Multiple Time Periods
    
                                                    Number of obs     =      7,716
    Outcome model  : least squares
    Treatment model: inverse probability
                                  (Std. Err. adjusted for 57 clusters in citycode)
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    g2010        |
     t_1999_2010 |   1.582202   .1620329     9.76   0.000     1.264624    1.899781
     t_1999_2015 |   6.557619   .8709219     7.53   0.000     4.850644    8.264595
    -------------+----------------------------------------------------------------
    g2015        |
     t_1999_2010 |   .5149717   .2743876     1.88   0.061    -.0228181    1.052762
     t_2010_2015 |   .0320314   .1881829     0.17   0.865    -.3368004    .4008632
    ------------------------------------------------------------------------------
    Control: Not yet Treated
    
    See Callaway and Sant'Anna (2021) for details
    
    .
    . estat event
    ATT by Periods Before and After treatment
    Event Study:Dynamic effects
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         Pre_avg |   .5149717   .2743876     1.88   0.061    -.0228181    1.052762
        Post_avg |   3.936908   .4743211     8.30   0.000     3.007256     4.86656
             Tm5 |   .5149717   .2743876     1.88   0.061    -.0228181    1.052762
             Tp0 |   1.316196   .2952748     4.46   0.000     .7374685    1.894924
             Tp5 |   6.557619   .8709219     7.53   0.000     4.850644    8.264595
    ------------------------------------------------------------------------------
    
    . estat simple
    Average Treatment Effect on Treated
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             ATT |   3.887133   .5381026     7.22   0.000     2.832471    4.941794
    ------------------------------------------------------------------------------
    
    .

    Given that my outcome variable is binary, I am confused as to why my ATT would be larger than 1. Am I misunderstanding something about how each ATT_g is calculated?

    Thank you in advance for any guidance on this issue.



  • #2
    I think the problem may be related to extrapolation.
    Can you drop all controls and run the same specification?
    And what happens if you use method(reg)?
    HTH
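
To make the extrapolation point concrete, here is a sketch of how a regression-adjustment ATT(g,t) is formed with repeated cross-sections, following my reading of Sant'Anna and Zhao (2020), whose estimators csdid builds on. The notation is my own and not csdid's: b is the base period, C the comparison group, n_g the number of cohort-g observations, and the outcome model is assumed to be a linear regression fit on comparison observations in each period.

```latex
\widehat{ATT}(g,t)
  = \underbrace{\bar{Y}_{g,t} - \bar{Y}_{g,b}}_{\in\,[-1,\,1]\ \text{for binary } Y}
  \;-\; \frac{1}{n_g} \sum_{i:\,G_i = g}
        \Big[ \hat{\mu}_{C,t}(X_i) - \hat{\mu}_{C,b}(X_i) \Big],
\qquad
\hat{\mu}_{C,s}(x) = x'\hat{\beta}_{C,s}
```

The first term is a difference of two proportions, so it is bounded. The second term is a linear prediction evaluated at the treated units' covariate values, and nothing constrains it to the unit interval. If the treated cohort sits outside, or at the edge of, the covariate range of the comparison group, those predictions may land far from [0, 1] and take the ATT with them. Without covariates the predictions reduce to comparison-group means, and the whole expression is a difference of two differences in proportions, which cannot leave [-2, 2].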



    • #3
      Thanks for your quick response.

      Below are the results with dripw without any controls.

      Code:
      Difference-in-difference with Multiple Time Periods
      
                                                      Number of obs     =      7,716
      Outcome model  : least squares
      Treatment model: inverse probability
                                    (Std. Err. adjusted for 57 clusters in citycode)
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      g2010        |
       t_1999_2010 |  -.0526326   .0660829    -0.80   0.426    -.1821528    .0768876
       t_1999_2015 |  -.2175948   .0755872    -2.88   0.004    -.3657431   -.0694466
      -------------+----------------------------------------------------------------
      g2015        |
       t_1999_2010 |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
       t_2010_2015 |   .1399004     .13578     1.03   0.303    -.1262235    .4060243
      ------------------------------------------------------------------------------
      Control: Not yet Treated
      
      See Callaway and Sant'Anna (2021) for details
      
      .
      . estat event
      ATT by Periods Before and After treatment
      Event Study:Dynamic effects
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           Pre_avg |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
          Post_avg |  -.1185946    .062592    -1.89   0.058    -.2412728    .0040835
               Tm5 |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
               Tp0 |  -.0195944   .0611838    -0.32   0.749    -.1395125    .1003237
               Tp5 |  -.2175948   .0755872    -2.88   0.004    -.3657431   -.0694466
      ------------------------------------------------------------------------------
      
      . estat simple
      Average Treatment Effect on Treated
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               ATT |  -.1167143   .0632741    -1.84   0.065    -.2407292    .0073006
      ------------------------------------------------------------------------------
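
As a cross-check on the no-covariate numbers, a plain 2x2 regression can be run by hand. This is only a sketch. It assumes that, with notyet, the comparison group for g2010 in 2010 is the never-treated (treat_cohort == 0) plus the 2015 cohort, and that YearC holds the calendar years 1999, 2010, and 2015. The variables d_g2010 and d_post are hypothetical helper names.

```stata
* Hand-rolled 2x2 DiD for cohort 2010, comparing 1999 (base) with 2010.
* Assumes treat_cohort == 0 marks never-treated units and YearC holds calendar years.
preserve
keep if inlist(YearC, 1999, 2010) & inlist(treat_cohort, 0, 2010, 2015)
generate byte d_g2010 = (treat_cohort == 2010)   // hypothetical helper: cohort indicator
generate byte d_post  = (YearC == 2010)          // hypothetical helper: post-period indicator
* The interaction coefficient is the difference-in-differences of group means.
regress workfam3 i.d_g2010##i.d_post, vce(cluster citycode)
restore
```

If the assumption about the comparison group holds, the interaction coefficient should be close to the t_1999_2010 row under g2010 in the no-covariate output. The standard errors can differ, since csdid does not use the same variance estimator as regress.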

      And here are the results using method(reg) with covariates.

      Code:
      Difference-in-difference with Multiple Time Periods
      
                                                      Number of obs     =      7,716
      Outcome model  : regression adjustment
      Treatment model: none
                                    (Std. Err. adjusted for 57 clusters in citycode)
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      g2010        |
       t_1999_2010 |   1.691072   .1834538     9.22   0.000     1.331509    2.050635
       t_1999_2015 |   6.801144   .8735401     7.79   0.000     5.089037    8.513251
      -------------+----------------------------------------------------------------
      g2015        |
       t_1999_2010 |   .5063845   .2866188     1.77   0.077     -.055378    1.068147
       t_2010_2015 |    .187527    .153718     1.22   0.222    -.1137548    .4888088
      ------------------------------------------------------------------------------
      Control: Not yet Treated
      
      See Callaway and Sant'Anna (2021) for details
      
      .
      . estat event
      ATT by Periods Before and After treatment
      Event Study:Dynamic effects
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           Pre_avg |   .5063845   .2866188     1.77   0.077     -.055378    1.068147
          Post_avg |   4.117106   .4924739     8.36   0.000     3.151875    5.082337
               Tm5 |   .5063845   .2866188     1.77   0.077     -.055378    1.068147
               Tp0 |   1.433067   .3065413     4.67   0.000     .8322576    2.033877
               Tp5 |   6.801144   .8735401     7.79   0.000     5.089037    8.513251
      ------------------------------------------------------------------------------
      
      . estat simple
      Average Treatment Effect on Treated
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               ATT |   4.066128   .5589138     7.28   0.000     2.970677    5.161579
      ------------------------------------------------------------------------------
      And here are the results using method(reg) without covariates.

      Code:
      . csdid  workfam3  ,  time(YearC) gvar(treat_cohort) method(reg) cluster(citycode) notyet
      Units always treated found. These will be excluded
      ....
      Difference-in-difference with Multiple Time Periods
      
                                                      Number of obs     =      7,716
      Outcome model  : regression adjustment
      Treatment model: none
                                    (Std. Err. adjusted for 57 clusters in citycode)
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      g2010        |
       t_1999_2010 |  -.0526326   .0660829    -0.80   0.426    -.1821528    .0768876
       t_1999_2015 |  -.2175948   .0755872    -2.88   0.004    -.3657431   -.0694466
      -------------+----------------------------------------------------------------
      g2015        |
       t_1999_2010 |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
       t_2010_2015 |   .1399004     .13578     1.03   0.303    -.1262235    .4060243
      ------------------------------------------------------------------------------
      Control: Not yet Treated
      
      See Callaway and Sant'Anna (2021) for details
      
      .
      . estat event
      ATT by Periods Before and After treatment
      Event Study:Dynamic effects
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           Pre_avg |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
          Post_avg |  -.1185946    .062592    -1.89   0.058    -.2412728    .0040835
               Tm5 |  -.1130862   .1378586    -0.82   0.412    -.3832841    .1571118
               Tp0 |  -.0195944   .0611838    -0.32   0.749    -.1395125    .1003237
               Tp5 |  -.2175948   .0755872    -2.88   0.004    -.3657431   -.0694466
      ------------------------------------------------------------------------------
      
      . estat simple
      Average Treatment Effect on Treated
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               ATT |  -.1167143   .0632741    -1.84   0.065    -.2407292    .0073006
      ------------------------------------------------------------------------------

      It looks like the covariates are causing this somehow.
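
One way to narrow down which covariate is responsible is to add them one at a time. The loop below is a rough sketch that reuses the variable names and options from the csdid call earlier in the thread. It is meant to be run from a do-file, because of the /// line continuation.

```stata
* Add one covariate at a time to see which one pushes the estimate out of range.
* Variable names are taken from the csdid call earlier in the thread.
local controls sex age CITY_lit_read CITY_ed_comp_prim CITY_ed_comp_sec CITY_ed_comp_high
foreach x of local controls {
    display as text _newline "=== csdid with single covariate: `x' ==="
    quietly csdid workfam3 `x', time(YearC) gvar(treat_cohort) ///
        method(reg) cluster(citycode) notyet
    estat simple
}
```

If the individual-level covariates (sex, age) give estimates inside [-1, 1] and the city-level ones do not, that would point to the city-level controls, which vary across only a handful of treated cities.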



      • #4
        OK, so it is a problem of reweighting and extrapolation.
        I would also suggest using drimp. It is often a bit better.
        Other than that, you may want to explore how the 2010 cohort and the never treated compare. I suspect the balance is poor.
        F
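
A balance check along these lines could look like the sketch below. It compares cohort 2010 with the never-treated only, which is a simplification, since notyet also brings the 2015 cohort into the comparison group in 2010. It assumes treat_cohort == 0 marks never-treated units and that YearC holds calendar years. The names in_g2010 and ps_g2010 are hypothetical.

```stata
* Sketch of a balance/overlap check for cohort 2010 against the never-treated.
* Assumes treat_cohort == 0 marks never-treated units; in_g2010 and ps_g2010 are hypothetical names.
preserve
keep if inlist(treat_cohort, 0, 2010)
local controls sex age CITY_lit_read CITY_ed_comp_prim CITY_ed_comp_sec CITY_ed_comp_high

* 1) Covariate means and ranges by cohort, separately for each survey year
bysort YearC: tabstat `controls', by(treat_cohort) statistics(mean min max) nototal

* 2) Estimated propensity scores in the base year: values piling up near 0 or 1 signal poor overlap
generate byte in_g2010 = (treat_cohort == 2010)
logit in_g2010 `controls' if YearC == 1999
predict double ps_g2010 if e(sample), pr
tabstat ps_g2010, by(in_g2010) statistics(min p5 p50 p95 max)
restore
```

Covariate ranges that barely overlap across the two groups, or propensity scores concentrated near 0 or 1, would be consistent with the extrapolation explanation. The logit may itself fail or drop observations if a city-level covariate separates the groups perfectly, which would be informative too.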



        • #5
          Using drimp inflates the results even more (see below). I suspect this is for the balance reasons you suggested.

          Code:
           csdid  workfam3 sex age  CITY_lit_read  CITY_ed_comp_prim CITY_ed_comp_sec CITY_ed_comp_high,  time(Yea
          > rC) gvar(treat_cohort) method(drimp) cluster(citycode) notyet
          Units always treated found. These will be excluded
          ...x
          Difference-in-difference with Multiple Time Periods
          
                                                          Number of obs     =      7,528
          Outcome model  : weighted least squares
          Treatment model: inverse probability tilting
                                        (Std. Err. adjusted for 57 clusters in citycode)
          ------------------------------------------------------------------------------
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          g2010        |
           t_1999_2010 |  -24.04347   3.419732    -7.03   0.000    -30.74603   -17.34092
           t_1999_2015 |  -99.55414   13.18768    -7.55   0.000    -125.4015   -73.70676
          -------------+----------------------------------------------------------------
          g2015        |
           t_1999_2010 |   .4437464   .2649321     1.67   0.094    -.0755109    .9630037
           t_2010_2015 |          0  (omitted)
          ------------------------------------------------------------------------------
          Control: Not yet Treated
          
          See Callaway and Sant'Anna (2021) for details
          
          .
          . estat event
          ATT by Periods Before and After treatment
          Event Study:Dynamic effects
          ------------------------------------------------------------------------------
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
               Pre_avg |   .4437464   .2649321     1.67   0.094    -.0755109    .9630037
              Post_avg |  -61.79881   8.100714    -7.63   0.000    -77.67592    -45.9217
                   Tm5 |   .4437464   .2649321     1.67   0.094    -.0755109    .9630037
                   Tp0 |  -24.04347   3.419732    -7.03   0.000    -30.74603   -17.34092
                   Tp5 |  -99.55414   13.18768    -7.55   0.000    -125.4015   -73.70676
          ------------------------------------------------------------------------------
          
          . estat simple
          Average Treatment Effect on Treated
          ------------------------------------------------------------------------------
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   ATT |  -46.97896   8.030837    -5.85   0.000    -62.71911   -31.23881
          ------------------------------------------------------------------------------
