  • csdid using only not-yet-treated groups as the comparison group

    Hi, I am using the csdid command.
    My data are structured as follows:
    Code:
    . tab Gvar
    
               Gvar |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |     10,077       27.93       27.93
              1 |      2,626        7.28       35.21
              2 |      2,929        8.12       43.32
              3 |      2,759        7.65       50.97
              4 |      3,141        8.71       59.68
              5 |      2,243        6.22       65.89
              6 |      2,379        6.59       72.48
              7 |      2,319        6.43       78.91
              8 |      1,944        5.39       84.30
              9 |      1,357        3.76       88.06
             10 |        989        2.74       90.80
             11 |        745        2.06       92.87
             12 |        669        1.85       94.72
             13 |        507        1.41       96.13
             14 |        513        1.42       97.55
             15 |        885        2.45      100.00
    ------------+-----------------------------------
          Total |     36,082      100.00
    
    . tab time
    
               time |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |      1,699        4.71        4.71
              2 |      2,422        6.71       11.42
              3 |      2,642        7.32       18.74
              4 |      2,745        7.61       26.35
              5 |      2,767        7.67       34.02
              6 |      2,695        7.47       41.49
              7 |      2,652        7.35       48.84
              8 |      2,538        7.03       55.87
              9 |      2,501        6.93       62.80
             10 |      2,346        6.50       69.31
             11 |      2,276        6.31       75.61
             12 |      2,082        5.77       81.38
             13 |      1,999        5.54       86.92
             14 |      1,690        4.68       91.61
             15 |      1,638        4.54       96.15
             16 |      1,390        3.85      100.00
    ------------+-----------------------------------
          Total |     36,082      100.00
    That is, there are 16 time periods, and the group variable includes never-treated (g = 0), always-treated (g = 1), and ever-treated (g = 2, ..., 15) groups.

    Without the never-treated group, ATT(g, 15), ATT(g, 16), and ATT(15, t) are not estimable, because no not-yet-treated units remain once the last cohort (g = 15) is treated. The following result is consistent with my understanding:
    Code:
    . csdid Y if Gvar != 0, ivar(pid) time(time) gvar(Gvar) notyet long2
    
    Units always treated found. These will be ignored
    Panel is not balanced
    Will use observations with Pair balanced (observed at t0 and t1)
    .............xx.............xx.............xx.....
    ........xx.............xx.............xx..........
    ...xx.............xx.............xx.............xx
    .............xx.............xx.............xxxxxxx
    xxxxxxxxxx
    Difference-in-difference with Multiple Time Periods
    The x marks indicate ATT(g, 15), ATT(g, 16), and ATT(15, t).

    However, when I add control variables to this setup, csdid fails to estimate most of the ATT(g, t):
    Code:
    . csdid Y $X if Gvar != 0, ivar(pid) time(time) gvar(Gvar) notyet long2
    
    Panel is not balanced
    Will use observations with Pair balanced (observed at t0 and t1)
    .....xxxxxxxxxx.....xxxxxxxxxx.....xxxxxxxxxx.....
    x.xxxxxxxx.....xxxxxxxxxx.......xxxx..xxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxxxx
    Difference-in-difference with Multiple Time Periods
    The global $X holds the names of the (time-invariant) control variables.

    If I include the never-treated group in the comparison group, the result becomes:
    Code:
    . csdid Y $X, ivar(pid) time(time) gvar(Gvar) notyet long2
    
    Units always treated found. These will be ignored
    Panel is not balanced
    Will use observations with Pair balanced (observed at t0 and t1)
    .............................x....................
    ..................................................
    .................................................x
    ..............x..............x..............x.....
    ........xx
    Difference-in-difference with Multiple Time Periods
    What is the problem with the second result?

  • #2
    The problem is sample size.
    If you use only not-yet-treated units as controls, your sample becomes very small, and the regression run in the background is no longer feasible.
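
    One rough way to see this is to count, period by period, how many units are still untreated once the never-treated group is dropped. The sketch below builds a small hypothetical panel (the data are made up; only the variable names pid, time, and Gvar follow the post) and counts units with Gvar > t. This only approximates the comparison group csdid actually forms for each ATT(g, t), so treat it as a diagnostic, not a reproduction of the command's internals.

```stata
* Toy panel (hypothetical data, not the poster's): 60 units, 16 periods
clear
set seed 12345
set obs 60
gen pid  = _n
gen Gvar = 2 + mod(_n, 14)     // cohorts 2,...,15; no never-treated units
expand 16
bysort pid: gen time = _n      // periods 1,...,16 for each unit

* For each period t, count units that are still untreated (Gvar > t).
* With Gvar == 0 excluded, these are the only candidate controls under notyet.
* !missing() guards against Stata treating missing values as larger than t.
forvalues t = 1/16 {
    quietly count if time == `t' & Gvar > `t' & !missing(Gvar)
    display "t = `t': not-yet-treated units = " r(N)
}
```

    In this toy panel the count drops to zero in periods 15 and 16, since no cohort is treated after period 15. On real data, replace the simulated block with your own dataset restricted to Gvar != 0, and compare the counts in late periods with the number of covariates in $X.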

    • #3
      FernandoRios Thank you for the quick answer. As you said, my data contain quite a small number of ever-treated (or not-yet-treated) observations. Thank you.
