  • reg y i.fe, noconstant collinearity

    Hi Statalist,

    I noticed the following issue with a simple fixed-effects regression and am wondering what the rationale is for this behaviour in Stata.

    I have a set of 30 observations of y across 7 values of categorical x. Within each value of x, the means of y are given here:

    Code:
    tabstat y, by(x) stat(mean)
    
    Summary for variables: y
         by categories of: x 
    
           x |      mean
    ---------+----------
           1 |  1521.594
           2 |  2434.029
           3 |  1824.588
           4 |  2239.116
           5 |  2109.643
           6 |   2234.62
           7 |  1997.953
    ---------+----------
       Total |  2083.711
    --------------------
    I run the simple FE regression. The coefficients give the deviations of the means of y for x = 2 to 7 from the mean of y when x = 1, and the constant gives the mean of y for x = 1:

    Code:
    . reg y i.x
    
          Source |       SS           df       MS      Number of obs   =        30
    -------------+----------------------------------   F(6, 23)        =      0.49
           Model |  2801202.94         6  466867.157   Prob > F        =    0.8083
        Residual |  21868456.1        23   950802.44   R-squared       =    0.1135
    -------------+----------------------------------   Adj R-squared   =   -0.1177
           Total |  24669659.1        29  850677.898   Root MSE        =    975.09
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |
              2  |   912.4347   590.4469     1.55   0.136    -308.9978    2133.867
              3  |   302.9935   712.1058     0.43   0.674     -1170.11    1776.097
              4  |   717.5212   570.9548     1.26   0.221    -463.5888    1898.631
              5  |   588.0482   815.8197     0.72   0.478    -1099.603      2275.7
              6  |   713.0255   654.1109     1.09   0.287    -640.1061    2066.157
              7  |   476.3585   712.1058     0.67   0.510    -996.7445    1949.462
                 |
           _cons |   1521.594   436.0739     3.49   0.002     619.5066    2423.682
    ------------------------------------------------------------------------------
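As a quick check of this reading: a regression on indicators for x alone reproduces the group means, so with x = 1 as the base level

$$\hat\beta_0 = \bar y_1, \qquad \hat\beta_k = \bar y_k - \bar y_1 \quad (k = 2,\dots,7)$$

For example, $\hat\beta_0 + \hat\beta_2 = 1521.594 + 912.4347 = 2434.0287 \approx 2434.029 = \bar y_2$, which matches the tabstat output; the other levels work the same way (e.g. $1521.594 + 302.9935 = 1824.5875 \approx \bar y_3$).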
    For my purposes it would be neater if each coefficient were the mean of its bin, so I suppress the constant:

    Code:
    . reg y i.x, noconstant
    
          Source |       SS           df       MS      Number of obs   =        30
    -------------+----------------------------------   F(6, 24)        =     14.53
           Model |   121480550         6  20246758.3   Prob > F        =    0.0000
        Residual |  33444702.7        24  1393529.28   R-squared       =    0.7841
    -------------+----------------------------------   Adj R-squared   =    0.7302
           Total |   154925252        30  5164175.08   Root MSE        =    1180.5
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |
              2  |   2434.029   481.9283     5.05   0.000     1439.378     3428.68
              3  |   1824.588   681.5495     2.68   0.013     417.9387    3231.237
              4  |   2239.116   446.1789     5.02   0.000     1318.248    3159.984
              5  |   2109.643   834.7243     2.53   0.018     386.8563    3832.429
              6  |    2234.62   590.2392     3.79   0.001     1016.426    3452.814
              7  |   1997.953   681.5495     2.93   0.007     591.3037    3404.602
    ------------------------------------------------------------------------------
    I now have the means of y for the bins x = 2 to 7, but the fixed effect for x = 1 is still suppressed. My question: why does Stata suppress this fixed effect when the constant term has been dropped? Without the constant, the last fixed effect would no longer be collinear with anything, so I don't see why it should be dropped. For comparison, I can run the regression manually like this, including all the fixed effects, and it works fine:

    Code:
    . tab x, gen(x)
    
              x |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          5       14.29       14.29
              2 |          6       17.14       31.43
              3 |          3        8.57       40.00
              4 |          9       25.71       65.71
              5 |          2        5.71       71.43
              6 |          6       17.14       88.57
              7 |          4       11.43      100.00
    ------------+-----------------------------------
          Total |         35      100.00
    
    . reg y x?, noconstant
    
          Source |       SS           df       MS      Number of obs   =        30
    -------------+----------------------------------   F(7, 23)        =     19.99
           Model |   133056796         7  19008113.8   Prob > F        =    0.0000
        Residual |  21868456.1        23   950802.44   R-squared       =    0.8588
    -------------+----------------------------------   Adj R-squared   =    0.8159
           Total |   154925252        30  5164175.08   Root MSE        =    975.09
    
    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              x1 |   1521.594   436.0739     3.49   0.002     619.5066    2423.682
              x2 |   2434.029   398.0792     6.11   0.000     1610.539    3257.519
              x3 |   1824.588    562.969     3.24   0.004     659.9976    2989.178
              x4 |   2239.116   368.5498     6.08   0.000     1476.712    3001.519
              x5 |   2109.643   689.4935     3.06   0.006     683.3166    3535.968
              x6 |    2234.62   487.5455     4.58   0.000     1226.055    3243.185
              x7 |   1997.953    562.969     3.55   0.002     833.3626    3162.543
    ------------------------------------------------------------------------------
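Why the two noconstant runs differ: `i.x` always omits a base level (by default the lowest, here x = 1), whether or not `noconstant` is specified, so Stata is not reacting to detected collinearity. With the constant also removed, the x = 1 group has no parameter at all: its fitted value is forced to 0 and each of its residuals equals y itself. The other groups still get their own means, which is why the coefficients for x = 2 to 7 agree across the two runs, but the residual sum of squares rises by

$$\mathrm{RSS}_{\text{base omitted}} - \mathrm{RSS}_{\text{all levels}} = \sum_{i:\,x_i=1} y_i^2 - \sum_{i:\,x_i=1} (y_i - \bar y_1)^2 = n_1 \bar y_1^2$$

The x1 standard error implies $n_1 = 5$ (since $\sqrt{950802.44/5} \approx 436.0739$), giving $5 \times 1521.594^2 \approx 11{,}576{,}242$, against $33{,}444{,}702.7 - 21{,}868{,}456.1 = 11{,}576{,}246.6$ from the two tables; the small gap is rounding in the reported mean. The larger Root MSE (1180.5 vs 975.09) is also why the standard errors for x = 2 to 7 are larger in the `i.x, noconstant` run.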
    In case it is of interest, I want the coefficients organized this way so that I can take residuals equal to deviations from the within-x means:
    Code:
    predict yres, residuals
    Thanks,
    Julian
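For the residuals alone, reparameterizing should not be necessary: `reg y i.x` (with the constant) and the all-dummies regression fit the same group means, which is why both report Residual SS = 21868456.1; only the `i.x, noconstant` run differs. A minimal reproducible sketch of that equivalence, assuming Stata's shipped auto data as a stand-in (price plays y, rep78 plays x); the new variable names are hypothetical:

```stata
* Stand-in data: price ~ y, rep78 ~ x (levels 1-5, 5 missing values)
sysuse auto, clear

* Deviations from within-group means, computed directly
* (price_mean and price_dev are hypothetical names)
bysort rep78: egen double price_mean = mean(price)
generate double price_dev = price - price_mean if !missing(rep78)

* Residuals from the ordinary model: constant plus default base level
regress price i.rep78
predict double price_res if e(sample), residuals

* Both fit one mean per group, so the two should agree up to rounding
assert reldif(price_res, price_dev) < 1e-6 if e(sample)
```

The same check with `regress price ibn.rep78, noconstant` should pass too, since it spans the same indicator columns.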

  • #2
    Code:
    reg y ibn.x, noconstant
    See

    Code:
    help fvvarlist
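In factor-variable notation, `i.x` adds indicators for every level except a base level (by default the lowest), while the `bn` in `ibn.x` means "no base", so every level gets an indicator and `noconstant` is then needed to avoid perfect collinearity. A short reproducible check, again assuming the auto data as a stand-in (price plays y, rep78 plays x), that each coefficient is the within-level mean:

```stata
* Stand-in data: price ~ y, rep78 ~ x (levels 1-5)
sysuse auto, clear

* All five indicators, no base level, no constant
regress price ibn.rep78, noconstant

* Each coefficient should equal the sample mean of price within its level
forvalues k = 1/5 {
    summarize price if rep78 == `k' & e(sample), meanonly
    assert reldif(_b[`k'.rep78], r(mean)) < 1e-6
}
```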



    • #3
      Originally posted by Andrew Musau
      Code:
      reg y ibn.x, noconstant
      See

      Code:
      help fvvarlist
      Brilliant, thanks. Will remember going forward.
