
  • Two categorical values omitted because of collinearity using difference in differences

    Hello, all Statalists!
    I want to measure the effect of a rail station opening in a city on the average wage of that city's residents.

    My dataset is a panel covering the years 2004-2018.
    I study rail stations that opened in three cities (codes 874, 7700, 9200) on the same day in October 2016.

    I therefore define the variables as follows:
    Code:
    * define treatment and control groups
    gen treat = 0
    replace treat = 1 if inlist(city_code, 874, 7700, 9200)
    
    * define post treatment period
    gen post = (year>=2017)
    
    * define interaction term
    gen treatXpost = treat*post

    I specified a simple difference-in-differences model with city fixed effects:
    [Image: model specification (screenshot, not reproduced)]
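The model image does not survive in this text. As a rough guide only, a difference-in-differences specification with city fixed effects that matches the variables defined above and the regression command used later in the post could be written as follows. This is an assumption based on those variable names, not a transcription of the image; the symbols $\alpha_i$ (city effect), $\gamma_t$ (year effect), and $\delta$ (treatment effect) are notation introduced here.

```latex
\[
\text{lwage}_{it} = \alpha_i + \gamma_t + \beta_1\,\text{treat}_i + \beta_2\,\text{post}_t
  + \delta\,(\text{treat}_i \times \text{post}_t) + \varepsilon_{it}
\]
```

Under this form, $\text{treat}_i$ does not vary within a city and $\text{post}_t$ does not vary within a year, which is what makes them candidates for omission once $\alpha_i$ and $\gamma_t$ are included.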



    Here is a subset of my dataset:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(city_code year) float(treat post treatXpost lwage)
      70 2004 0 0 0  8.533657
      70 2005 0 0 0  8.631414
      70 2006 0 0 0  8.483016
      70 2007 0 0 0  8.698013
      70 2008 0 0 0  8.740017
      70 2009 0 0 0  8.751474
      70 2010 0 0 0  8.804475
      70 2011 0 0 0  8.856803
      70 2012 0 0 0  8.868977
      70 2013 0 0 0  8.919186
      70 2014 0 0 0  8.940891
      70 2015 0 0 0  8.986822
      70 2016 0 0 0  8.995661
      70 2017 0 1 0  9.055323
      70 2018 0 1 0  9.083416
     681 2004 0 0 0  9.053803
     681 2005 0 0 0  9.129347
     681 2006 0 0 0  9.109082
     681 2007 0 0 0  9.180603
     681 2008 0 0 0  9.203316
     681 2009 0 0 0  9.230437
     681 2010 0 0 0  9.308193
     681 2011 0 0 0  9.353747
     681 2012 0 0 0  9.350189
     681 2013 0 0 0  9.359019
     681 2014 0 0 0  9.380083
     681 2015 0 0 0  9.431081
     681 2016 0 0 0  9.435082
     681 2017 0 1 0  9.481436
     681 2018 0 1 0  9.494165
     874 2004 1 0 0  8.462525
     874 2005 1 0 0  8.474703
     874 2006 1 0 0  8.467373
     874 2007 1 0 0  8.560444
     874 2008 1 0 0   8.61214
     874 2009 1 0 0  8.570734
     874 2010 1 0 0  8.630343
     874 2011 1 0 0  8.691818
     874 2012 1 0 0  8.723882
     874 2013 1 0 0  8.748781
     874 2014 1 0 0   8.77323
     874 2015 1 0 0   8.84376
     874 2016 1 0 0  8.853379
     874 2017 1 1 1  8.923191
     874 2018 1 1 1  8.947025
    6800 2004 0 0 0  8.624252
    6800 2005 0 0 0   8.65504
    6800 2006 0 0 0  8.589328
    6800 2007 0 0 0  8.743372
    6800 2008 0 0 0   8.77106
    6800 2009 0 0 0  8.776938
    6800 2010 0 0 0 8.8459215
    6800 2011 0 0 0  8.888481
    6800 2012 0 0 0  8.908694
    6800 2013 0 0 0 8.9449415
    6800 2014 0 0 0  8.971956
    6800 2015 0 0 0  9.024974
    6800 2016 0 0 0  9.041449
    6800 2017 0 1 0  9.098739
    6800 2018 0 1 0   9.11603
    7700 2004 1 0 0  8.530702
    7700 2005 1 0 0 8.4593525
    7700 2006 1 0 0  8.497194
    7700 2007 1 0 0  8.620832
    7700 2008 1 0 0  8.650325
    7700 2009 1 0 0  8.647519
    7700 2010 1 0 0  8.683724
    7700 2011 1 0 0  8.750999
    7700 2012 1 0 0  8.791486
    7700 2013 1 0 0  8.829812
    7700 2014 1 0 0  8.853808
    7700 2015 1 0 0  8.915432
    7700 2016 1 0 0  8.907613
    7700 2017 1 1 1  8.967887
    7700 2018 1 1 1  8.994669
    9200 2004 1 0 0  8.431853
    9200 2005 1 0 0  8.405591
    9200 2006 1 0 0  8.416267
    9200 2007 1 0 0  8.531096
    9200 2008 1 0 0  8.570354
    9200 2009 1 0 0  8.569026
    9200 2010 1 0 0  8.614864
    9200 2011 1 0 0   8.68169
    9200 2012 1 0 0 8.7194805
    9200 2013 1 0 0  8.752423
    9200 2014 1 0 0  8.782783
    9200 2015 1 0 0  8.852236
    9200 2016 1 0 0  8.852808
    9200 2017 1 1 1  8.927181
    9200 2018 1 1 1  8.950792
    end
    Following this specification, I ran the fixed-effects regression below:
    Code:
    . xtset city_code
    
    Panel variable: city_code (balanced)
    
    . xtreg lwage treat post treatXpost i.year, fe
    note: treat omitted because of collinearity.
    note: 2018.year omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =         90
    Group variable: city_code                       Number of groups  =          6
    
    R-squared:                                      Obs per group:
         Within  = 0.9819                                         min =         15
         Between = 0.4846                                         avg =       15.0
         Overall = 0.3526                                         max =         15
    
                                                    F(15,69)          =     248.97
    corr(u_i, Xb) = -0.0094                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
           lwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           treat |          0  (omitted)
            post |   .4752808    .016468    28.86   0.000     .4424281    .5081335
      treatXpost |   .0325399    .015584     2.09   0.040     .0014508    .0636291
                 |
            year |
           2005  |   .0197757   .0145079     1.36   0.177    -.0091667    .0487181
           2006  |  -.0124221   .0145079    -0.86   0.395    -.0413645    .0165203
           2007  |   .1162613   .0145079     8.01   0.000     .0873189    .1452037
           2008  |   .1517364   .0145079    10.46   0.000      .122794    .1806788
           2009  |    .151556   .0145079    10.45   0.000     .1226136    .1804984
           2010  |   .2084548   .0145079    14.37   0.000     .1795123    .2373972
           2011  |   .2644577   .0145079    18.23   0.000     .2355153    .2934001
           2012  |   .2876525   .0145079    19.83   0.000     .2587101    .3165949
           2013  |   .3195616   .0145079    22.03   0.000     .2906192    .3485041
           2014  |   .3443263   .0145079    23.73   0.000     .3153839    .3732688
           2015  |   .4029185   .0145079    27.77   0.000     .3739761    .4318609
           2016  |   .4081998   .0145079    28.14   0.000     .3792574    .4371422
           2017  |  -.0220569   .0145079    -1.52   0.133    -.0509993    .0068855
           2018  |          0  (omitted)
                 |
           _cons |   8.606132   .0102586   838.92   0.000     8.585667    8.626598
    -------------+----------------------------------------------------------------
         sigma_u |  .23607787
         sigma_e |  .02512838
             rho |  .98879722   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(5, 69) = 1237.14                    Prob > F = 0.0000
    As you can see, in addition to omitting year==2004 (the base category), Stata also omits year==2018.
    Why does Stata also omit year==2018?

    Many Thanks!
    Last edited by Asaf Yancu; 26 Apr 2022, 06:45.

  • #2
    The variable post is collinear with the year indicators even after the reference year 2004 is omitted: post equals the sum of the 2017 and 2018 indicators. So either post or one of those two year indicators must be omitted to identify the model.
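The collinearity can be checked directly from the design, without any wage data. The sketch below rebuilds only the 6-city by 15-year grid from the example and confirms that post is the sum of the 2017 and 2018 year indicators. It also checks that treat never changes within a city, which is why the city fixed effects absorb it. The variable name post_check is a hypothetical helper introduced purely for illustration.

```stata
* Rebuild the 6-city x 15-year design from the example data (design only)
clear
set obs 6
gen int city_code = real(word("70 681 874 6800 7700 9200", _n))
expand 15
bysort city_code: gen int year = 2003 + _n
gen byte treat = inlist(city_code, 874, 7700, 9200)
gen byte post  = (year >= 2017)

* post_check (hypothetical helper): sum of the 2017 and 2018 year dummies
gen byte post_check = (year == 2017) + (year == 2018)

* Passes silently: post is an exact linear combination of year indicators,
* so post, 2017.year and 2018.year cannot all be estimated together
assert post == post_check

* Passes silently: treat is constant within each city, so the within
* (fixed-effects) transformation removes it entirely
bysort city_code (treat): assert treat[1] == treat[_N]
```

Both assert commands produce no output when the condition holds for every observation.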

    This is normal and expected, and is of no concern.
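Because the omitted terms are redundant, the coefficient on treatXpost should not depend on which of them Stata drops. A minimal sketch of this, using simulated wages on the same 6-city by 15-year design: the seed, the trend and effect sizes, and the scalar names b_full and b_lean are all assumptions made for illustration, not values from the thread.

```stata
* Same design as the example data, with a SIMULATED outcome
clear
set seed 12345
set obs 6
gen int city_code = real(word("70 681 874 6800 7700 9200", _n))
expand 15
bysort city_code: gen int year = 2003 + _n
gen byte treat = inlist(city_code, 874, 7700, 9200)
gen byte post  = (year >= 2017)
gen byte treatXpost = treat*post

* Simulated log wage: common trend plus an assumed 0.03 treatment effect
gen double lwage = 8.5 + 0.04*(year - 2004) + 0.03*treatXpost + rnormal(0, 0.02)

xtset city_code

* Original specification: Stata drops treat and one redundant year term
quietly xtreg lwage treat post treatXpost i.year, fe
scalar b_full = _b[treatXpost]

* Lean specification: city and year effects already absorb treat and post
quietly xtreg lwage treatXpost i.year, fe
scalar b_lean = _b[treatXpost]

* The two estimates of the treatment effect coincide
display b_full, b_lean
assert reldif(b_full, b_lean) < 1e-8
```

In the output posted above, Stata kept post and dropped 2018.year, so the post coefficient (.4752808) is the 2018 effect relative to 2004, and the implied 2017 effect is .4752808 - .0220569 = .4532239.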
