reg y i.fe, noconstant collinearity

Julian Duggan

Join Date: Jul 2016
Posts: 63

11 Feb 2022, 10:53

Hi Statalist,

I noticed the following behavior in a simple fixed-effects regression and am wondering what Stata's rationale for it is.

I have 30 observations of y across 7 values of a categorical variable x. The means of y within each value of x are:

Code:

tabstat y, by(x) stat(mean)

Summary for variables: y
     by categories of: x 

       x |      mean
---------+----------
       1 |  1521.594
       2 |  2434.029
       3 |  1824.588
       4 |  2239.116
       5 |  2109.643
       6 |   2234.62
       7 |  1997.953
---------+----------
   Total |  2083.711
--------------------

I run the simple fixed-effects regression. The coefficients give the deviations of the mean of y for x = 2 through 7 from the mean of y when x = 1, and the constant gives the mean of y for x = 1:

Code:

. reg y i.x

      Source |       SS           df       MS      Number of obs   =        30
-------------+----------------------------------   F(6, 23)        =      0.49
       Model |  2801202.94         6  466867.157   Prob > F        =    0.8083
    Residual |  21868456.1        23   950802.44   R-squared       =    0.1135
-------------+----------------------------------   Adj R-squared   =   -0.1177
       Total |  24669659.1        29  850677.898   Root MSE        =    975.09

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
          2  |   912.4347   590.4469     1.55   0.136    -308.9978    2133.867
          3  |   302.9935   712.1058     0.43   0.674     -1170.11    1776.097
          4  |   717.5212   570.9548     1.26   0.221    -463.5888    1898.631
          5  |   588.0482   815.8197     0.72   0.478    -1099.603      2275.7
          6  |   713.0255   654.1109     1.09   0.287    -640.1061    2066.157
          7  |   476.3585   712.1058     0.67   0.510    -996.7445    1949.462
             |
       _cons |   1521.594   436.0739     3.49   0.002     619.5066    2423.682
------------------------------------------------------------------------------

It would be neater for my purposes if the coefficients were the bin means themselves, so I want to suppress the constant:

Code:

. reg y i.x, noconstant

      Source |       SS           df       MS      Number of obs   =        30
-------------+----------------------------------   F(6, 24)        =     14.53
       Model |   121480550         6  20246758.3   Prob > F        =    0.0000
    Residual |  33444702.7        24  1393529.28   R-squared       =    0.7841
-------------+----------------------------------   Adj R-squared   =    0.7302
       Total |   154925252        30  5164175.08   Root MSE        =    1180.5

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
          2  |   2434.029   481.9283     5.05   0.000     1439.378     3428.68
          3  |   1824.588   681.5495     2.68   0.013     417.9387    3231.237
          4  |   2239.116   446.1789     5.02   0.000     1318.248    3159.984
          5  |   2109.643   834.7243     2.53   0.018     386.8563    3832.429
          6  |    2234.62   590.2392     3.79   0.001     1016.426    3452.814
          7  |   1997.953   681.5495     2.93   0.007     591.3037    3404.602
------------------------------------------------------------------------------

I now have the means of y for the bins x = 2 through 7, but the fixed effect for x = 1 is still omitted. My question: why does Stata omit this fixed effect when the constant term has been dropped? Without the constant, the full set of fixed effects is no longer collinear, so I don't see why one should be dropped. For comparison, I can run the regression manually with all the fixed effects included, and it works fine:

Code:

. tab x, gen(x)

          x |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          5       14.29       14.29
          2 |          6       17.14       31.43
          3 |          3        8.57       40.00
          4 |          9       25.71       65.71
          5 |          2        5.71       71.43
          6 |          6       17.14       88.57
          7 |          4       11.43      100.00
------------+-----------------------------------
      Total |         35      100.00

. reg y x?, noconstant

      Source |       SS           df       MS      Number of obs   =        30
-------------+----------------------------------   F(7, 23)        =     19.99
       Model |   133056796         7  19008113.8   Prob > F        =    0.0000
    Residual |  21868456.1        23   950802.44   R-squared       =    0.8588
-------------+----------------------------------   Adj R-squared   =    0.8159
       Total |   154925252        30  5164175.08   Root MSE        =    975.09

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1521.594   436.0739     3.49   0.002     619.5066    2423.682
          x2 |   2434.029   398.0792     6.11   0.000     1610.539    3257.519
          x3 |   1824.588    562.969     3.24   0.004     659.9976    2989.178
          x4 |   2239.116   368.5498     6.08   0.000     1476.712    3001.519
          x5 |   2109.643   689.4935     3.06   0.006     683.3166    3535.968
          x6 |    2234.62   487.5455     4.58   0.000     1226.055    3243.185
          x7 |   1997.953    562.969     3.55   0.002     833.3626    3162.543
------------------------------------------------------------------------------

In case it is of interest, the reason I want the coefficients organized this way is to obtain residuals equal to deviations from the within-x means, as follows:

Code:

predict yres, residuals
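
As a hedged check of that claim, the sketch below compares the regression residuals with deviations from the within-x means computed directly. It assumes the variables y and x from this post; yres2, ybar, and ydev are illustrative names. The residuals should not depend on how the dummies are parameterized, which is consistent with the identical residual sum of squares (21868456.1) in the `reg y i.x` and `reg y x?, noconstant` outputs above.

```stata
* Sketch only: y and x are the variables from the post;
* yres2, ybar, and ydev are illustrative names.
regress y i.x
predict double yres2, residuals

* Within-x means of y and the deviations from them
egen double ybar = mean(y), by(x)
generate double ydev = y - ybar

* The two should agree up to rounding on the estimation sample
assert reldif(yres2, ydev) < 1e-8 if e(sample)
```

If the assertion passes silently, the residuals and the deviations from the group means coincide for the observations used in the regression.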

Thanks,
Julian


Andrew Musau

Join Date: Oct 2014

Posts: 9945
#2

11 Feb 2022, 11:06

Code:

reg y ibn.x, noconstant

See

Code:

help fvvarlist
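
For context, `i.x` always designates a base level (by default the lowest value) whether or not `noconstant` is specified, while the `ibn.` operator requests no base level, so every category gets its own coefficient. A minimal sketch of this, using Stata's shipped auto dataset as a stand-in (an assumption, not the poster's data):

```stata
* Sketch only: the auto dataset and the variables price and rep78
* stand in for the poster's y and x.
sysuse auto, clear

* ibn. = no base level, so each category of rep78 is estimated
regress price ibn.rep78, noconstant

* The coefficients should match the within-category means
tabstat price, by(rep78) statistics(mean)
```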
Julian Duggan

Join Date: Jul 2016

Posts: 63
#3

11 Feb 2022, 12:28

Originally posted by Andrew Musau

Code:

reg y ibn.x, noconstant

See

Code:

help fvvarlist

Brilliant, thanks. Will remember going forward.