
  • csdid using only not-yet-treated groups as the comparison group

    Hi, I am using the csdid command.
    My data are structured as follows:
    Code:
    . tab Gvar
    
           Gvar |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |     10,077       27.93       27.93
              1 |      2,626        7.28       35.21
              2 |      2,929        8.12       43.32
              3 |      2,759        7.65       50.97
              4 |      3,141        8.71       59.68
              5 |      2,243        6.22       65.89
              6 |      2,379        6.59       72.48
              7 |      2,319        6.43       78.91
              8 |      1,944        5.39       84.30
              9 |      1,357        3.76       88.06
             10 |        989        2.74       90.80
             11 |        745        2.06       92.87
             12 |        669        1.85       94.72
             13 |        507        1.41       96.13
             14 |        513        1.42       97.55
             15 |        885        2.45      100.00
    ------------+-----------------------------------
          Total |     36,082      100.00
    
    . tab time
    
           time |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |      1,699        4.71        4.71
              2 |      2,422        6.71       11.42
              3 |      2,642        7.32       18.74
              4 |      2,745        7.61       26.35
              5 |      2,767        7.67       34.02
              6 |      2,695        7.47       41.49
              7 |      2,652        7.35       48.84
              8 |      2,538        7.03       55.87
              9 |      2,501        6.93       62.80
             10 |      2,346        6.50       69.31
             11 |      2,276        6.31       75.61
             12 |      2,082        5.77       81.38
             13 |      1,999        5.54       86.92
             14 |      1,690        4.68       91.61
             15 |      1,638        4.54       96.15
             16 |      1,390        3.85      100.00
    ------------+-----------------------------------
          Total |     36,082      100.00
    That is, there are 16 time periods, and the group variable includes never-treated (g = 0), always-treated (g = 1), and ever-treated (g = 2, ..., 15) groups.

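    Without the never-treated group, ATT(g, 15), ATT(g, 16), and ATT(15, t) are not estimable: no cohort is first treated after period 15, so no not-yet-treated units remain in periods 15 and 16, and cohort 15 never has a later-treated cohort to serve as its comparison group. The following result is consistent with my understanding: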
    Code:
    . csdid Y if Gvar != 0, ivar(pid) time(time) gvar(Gvar) notyet long2
    
    Units always treated found. These will be ignored
    Panel is not balanced
    Will use observations with Pair balanced (observed at t0 and t1)
    .............xx.............xx.............xx.....
    ........xx.............xx.............xx..........
    ...xx.............xx.............xx.............xx
    .............xx.............xx.............xxxxxxx
    xxxxxxxxxx
    Difference-in-difference with Multiple Time Periods
    The x marks correspond to ATT(g, 15), ATT(g, 16), and ATT(15, t), the cells that cannot be estimated.
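A small Python sketch can reproduce which cells fail in the first log. It assumes the Callaway and Sant'Anna not-yet-treated definition rather than csdid's own source. Under long2, every ATT(g, t) is taken to be a long difference against base period g - 1. The comparison cohorts for ATT(g, t) are taken to be those h != g that are still untreated in both periods, i.e. h > max(t, g - 1), plus the never-treated units (Gvar = 0) when they are kept. The helper names are hypothetical.

```python
# Assumed rule (Callaway-Sant'Anna not-yet-treated comparison, not csdid's own code):
# with long2, ATT(g, t) is a long difference against base period g - 1, and the
# comparison cohorts are h != g with h > max(t, g - 1), plus Gvar = 0 if kept.

def cells(periods, cohorts):
    """Every (g, t) cell except the base period t = g - 1."""
    return [(g, t) for g in cohorts for t in periods if t != g - 1]

def comparison_cohorts(g, t, cohorts, include_never):
    """Cohorts untreated in both t and the base period g - 1."""
    last = max(t, g - 1)
    later = [h for h in cohorts if h != g and h > last]
    return ([0] if include_never else []) + later

periods = range(1, 17)  # time = 1, ..., 16
cohorts = range(2, 16)  # Gvar = 2, ..., 15; Gvar = 1 (always treated) is ignored

all_cells = cells(periods, cohorts)
missing = [(g, t) for g, t in all_cells
           if not comparison_cohorts(g, t, cohorts, include_never=False)]
print(len(all_cells), len(missing))
```

Under these assumptions it prints `210 41`. The 41 empty cells are ATT(g, 15) and ATT(g, 16) for g = 2, ..., 14, plus all 15 cells of cohort 15. That matches the count of x marks in the log. With `include_never=True`, every cell has a comparison group.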

    However, in the same setup, when I add the control variables, csdid fails to estimate most of the ATT(g, t):
    Code:
    . csdid Y $X if Gvar != 0, ivar(pid) time(time) gvar(Gvar) notyet long2
    
    Panel is not balanced
    Will use observations with Pair balanced (observed at t0 and t1)
    .....xxxxxxxxxx.....xxxxxxxxxx.....xxxxxxxxxx.....
    x.xxxxxxxx.....xxxxxxxxxx.......xxxx..xxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxxxx
    Difference-in-difference with Multiple Time Periods
    The global $X contains the names of the (time-invariant) control variables.

    If I include the never-treated group in the comparison group, the result becomes:
    Code:
    . csdid Y $X, ivar(pid) time(time) gvar(Gvar) notyet long2
    
    Units always treated found. These will be ignored
    Panel is not balanced
    Will use observations with Pair balanced (observed at t0 and t1)
    .............................x....................
    ..................................................
    .................................................x
    ..............x..............x..............x.....
    ........xx
    Difference-in-difference with Multiple Time Periods
    What is the problem with the second result?

  • #2
    The problem is sample size.
    If you use only not-yet-treated units as controls, your sample becomes very small, and the regression done in the background is no longer feasible.
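To put rough numbers on this, the sketch below sums the `tab Gvar` frequencies from the first post. It adds up the cohorts that would form the comparison pool for a few ATT(g, t) cells. The rule is an assumption: cohorts h != g with h > max(t, g - 1), plus Gvar = 0 when the never-treated units are kept, with Gvar = 1 excluded as always treated. These are person-period counts over all 16 periods, so they overstate the number of units used in any single 2x2 comparison. Read them only as relative sizes. `comparison_obs` is a hypothetical helper.

```python
# Observation counts per Gvar value, copied from the `tab Gvar` output in the first post.
freq = {0: 10077, 1: 2626, 2: 2929, 3: 2759, 4: 3141, 5: 2243, 6: 2379, 7: 2319,
        8: 1944, 9: 1357, 10: 989, 11: 745, 12: 669, 13: 507, 14: 513, 15: 885}

def comparison_obs(g, t, include_never):
    """Observations in the comparison pool for ATT(g, t) under the assumed rule:
    cohorts h != g with h > max(t, g - 1); Gvar = 1 (always treated) never counts."""
    last = max(t, g - 1)
    later = sum(n for h, n in freq.items() if h >= 2 and h != g and h > last)
    return later + (freq[0] if include_never else 0)

for g, t in [(2, 2), (10, 12), (14, 14)]:
    print((g, t), "not-yet only:", comparison_obs(g, t, False),
          "with never-treated:", comparison_obs(g, t, True))
```

Take ATT(14, 14) as an example. The pool shrinks from 10,962 observations with the never-treated group to the 885 observations of cohort 15 without it. The covariate adjustment has to be fitted on that small pool.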

    • #3
      @FernandoRios Thank you for the quick answer. As you said, my data contain quite a small number of ever-treated (or not-yet-treated) observations. Thank you.

      • #4
        @FernandoRios
        I am confused about one thing. If always-treated units are ignored, as Pedro suggests in his video, then what are the later-treated or never-treated units compared with? I thought that with the "notyet" option, the analysis uses later-treated as well as never-treated cohorts as the control group. Further, when covariates are added, values at t-1 are used for comparison. So what is compared with what if always-treated units are not in the analysis?
        Last edited by Awais Farid Khan; 27 Aug 2025, 18:02.

        • #5
          Always-treated units are different from not-yet-treated units.

          Always-treated units are treated throughout the whole window of time you have access to, so they are never observed untreated.

          • #6
            @FernandoRios Thank you for your reply. My understanding is that with the notyet option, the calculations are based on later-treated cohorts only, and treated cohorts are compared with those that are not yet treated at a particular time. For example, I have 30 units, none treated at the start, observed from 2019 to 2025: 15 were treated in 2020, 10 in 2022, and 5 in 2024. In this case, the 2020 cohort will be compared with the 2022 and 2024 cohorts, and the 2022 cohort will be compared with the 2024 cohort. What is the comparison group for the 2024 cohort? Do these comparisons include never-treated plus not-yet-treated cohorts for that period? Your reply will be much appreciated. I am trying to interpret my results but am stuck on these questions.
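The example in this post can be checked with a few lines of Python. The sketch assumes the not-yet-treated rule from the Callaway and Sant'Anna framework. For ATT(g, t), the comparison cohorts are those first treated after max(t, g - 1), other than g itself, plus any never-treated units; here there are none. The cohort sizes and years come from the post. `controls` is a hypothetical helper, not a csdid function.

```python
# Example from the post: 30 units observed 2019-2025, no never-treated units.
cohort_size = {2020: 15, 2022: 10, 2024: 5}  # first treatment year -> number of units
years = range(2019, 2026)

def controls(g, t):
    """Comparison cohorts for ATT(g, t) under the assumed not-yet-treated rule."""
    last = max(t, g - 1)
    return [h for h in cohort_size if h != g and h > last]

for g in cohort_size:
    for t in years:
        if t >= g:  # post-treatment cells only
            print(g, t, controls(g, t) or "no comparison group")
```

Under that rule, the cohorts are compared as follows:
- The 2020 cohort is compared with the 2022 and 2024 cohorts in 2020 and 2021, and with the 2024 cohort alone in 2022 and 2023.
- The 2022 cohort is compared with the 2024 cohort in 2022 and 2023.
- From 2024 onward, no unit is left untreated. Neither the 2024 cohort nor any other cohort has a comparison group in 2024 or 2025 unless never-treated units are added.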

            • #7
              Never-treated units are, by definition, not yet treated;
              that's why in the notation they are assigned g = infinity.
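The sketch below shows what the g = infinity convention does. If never-treated units are coded as infinity, the not-yet-treated condition G > max(t, g - 1) picks them up automatically. The only difference from a never-treated-only comparison is which rule is applied. csdid itself expects never-treated units coded as gvar = 0, as in the Gvar variable above. The infinity coding, the toy panel, and the function name are assumptions for illustration.

```python
import math

def comparison_units(gvar, g, t, notyet):
    """Ids of comparison units for ATT(g, t) under an assumed Callaway-Sant'Anna style rule.

    gvar maps unit id -> first treatment period; never-treated units are coded math.inf.
    notyet=False: never-treated units only.
    notyet=True:  every unit outside cohort g that is still untreated at max(t, g - 1).
    """
    last = max(t, g - 1)
    if notyet:
        return sorted(i for i, G in gvar.items() if G != g and G > last)
    return sorted(i for i, G in gvar.items() if G == math.inf)

# Hypothetical toy panel: a and b never treated, c first treated in period 5, d in period 8.
gvar = {"a": math.inf, "b": math.inf, "c": 5, "d": 8}

print(comparison_units(gvar, g=5, t=6, notyet=False))
print(comparison_units(gvar, g=5, t=6, notyet=True))
print(comparison_units(gvar, g=5, t=9, notyet=True))
```

The three calls return `['a', 'b']`, `['a', 'b', 'd']`, and `['a', 'b']`. With notyet, unit d serves as a control for cohort 5 only while it is still untreated. The never-treated units are in the comparison group under both options.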

              • #8
                @FernandoRios Thank you for your reply. I am still not clear on the difference between the comparison groups with and without the "notyet" option. I will research some more. Thanks for your guidance.

                • #9
                  @FernandoRios I apologize for contacting you again, but I am stuck on understanding the difference conceptually. Below are the results without and with the notyet option. Since I am working on the impact of a regulation, it is very important to understand the distinction for interpretation. My understanding is as follows. With the notyet option, the software keeps cohorts in the comparison group until they get treated, while also keeping non-switchers in the controls. Without the notyet option, the software chooses as controls only those cohorts that never switch. Is my understanding correct?

                  My second question: my last later-treated cohort is treated in 2018, and in both cases its coefficient is the same. Does this mean that, when there is no later-treated cohort to compare with, the software uses the never-treated group as the comparison group in both cases, with or without the notyet option? Is my understanding correct?
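The second question can be checked with the same assumed not-yet-treated rule. For post-treatment cells of the last-treated cohort, no cohort is first treated later. The not-yet-treated pool therefore reduces to the never-treated units, which is the same pool used without notyet. Earlier cohorts can still differ, which is consistent with the two overall ATTs below not being equal. The cohort years other than 2018 and the range of periods are hypothetical. Never-treated units are coded as `math.inf` here.

```python
import math

def controls(cohorts, g, t, notyet):
    """Comparison cohorts for a post-treatment cell ATT(g, t), t >= g (assumed rule).

    notyet=True:  cohorts other than g first treated after max(t, g - 1), never-treated included;
    notyet=False: never-treated units only.  Never-treated is coded as math.inf.
    """
    if notyet:
        return sorted(h for h in cohorts if h != g and h > max(t, g - 1))
    return [h for h in cohorts if h == math.inf]

# Hypothetical cohorts: only 2018 (the last-treated cohort) and "never treated" come from the post.
cohorts = [2012, 2015, 2018, math.inf]

for t in range(2018, 2022):  # hypothetical post-treatment years for the 2018 cohort
    print(t, controls(cohorts, 2018, t, notyet=False), controls(cohorts, 2018, t, notyet=True))

# An earlier cohort, by contrast, gains the 2018 cohort as a control under notyet.
print(controls(cohorts, 2015, 2016, notyet=False), controls(cohorts, 2015, 2016, notyet=True))
```

This only covers post-treatment cells. Whether the pre-treatment cells also coincide depends on the base period csdid uses, for example whether long2 is specified.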


                  ------------------------------------------------------------------------------
                               | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                           ATT |   6.022243   2.153575     2.80   0.005     1.801313    10.24317
                  ------------------------------------------------------------------------------
                  Control: Never Treated

                  ------------------------------------------------------------------------------
                               | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                           ATT |   5.280511   1.908185     2.77   0.006     1.540538    9.020485
                  ------------------------------------------------------------------------------
                  Control: Not yet Treated
