omitted because of collinearity

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673

#16

15 Dec 2021, 07:01

Shelly:
please note that Carlo is enough for me. Thanks.
That said:
1) have you already checked the collinearity of your categorical variables via -estat vce, corr- after -xtreg, re-?
2) you can check for functional form misspecification of your regression (which, under more general conditions, can be read as a test of model misspecification at large) following an approach similar to the one detailed in the -linktest- entry of the Stata .pdf manual:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
. xtreg ln_wage c.age##c.age, re

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1015                                         avg =        6.1
     overall = 0.0870                                         max =         15

                                                Wald chi2(2)      =    3388.51
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0590339   .0027172    21.73   0.000     .0537083    .0643596
             |
 c.age#c.age |  -.0006758   .0000451   -15.00   0.000    -.0007641   -.0005876
             |
       _cons |   .5479714   .0397476    13.79   0.000     .4700675    .6258752
-------------+----------------------------------------------------------------
     sigma_u |   .3654049
     sigma_e |  .30245467
         rho |  .59342665   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb
(24 missing values generated)

. gen sq_fitted=fitted^2
(24 missing values generated)

*Augmented regression*

. xtreg ln_wage c.age##c.age fitted sq_fitted , re
note: c.age#c.age omitted because of collinearity

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1105                                         min =          1
     between = 0.1039                                         avg =        6.1
     overall = 0.0888                                         max =         15

                                                Wald chi2(3)      =    3459.51
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0166047   .0024441     6.79   0.000     .0118144     .021395
             |
 c.age#c.age |          0  (omitted)
             |
      fitted |   6.745315   .7234634     9.32   0.000     5.327352    8.163277
   sq_fitted |  -2.009945   .2520254    -7.98   0.000    -2.503906   -1.515985
       _cons |  -4.445486   .5624869    -7.90   0.000     -5.54794   -3.343032
-------------+----------------------------------------------------------------
     sigma_u |  .36492262
     sigma_e |  .30215307
         rho |  .59327076   (fraction of variance due to u_i)
------------------------------------------------------------------------------

*Ancillary regression*

. xtreg ln_wage fitted sq_fitted , re

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1088                                         min =          1
     between = 0.1045                                         avg =        6.1
     overall = 0.0887                                         max =         15

                                                Wald chi2(2)      =    3407.81
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fitted |   2.805959   .4327827     6.48   0.000      1.95772    3.654197
   sq_fitted |  -.5516341   .1320951    -4.18   0.000    -.8105358   -.2927324
       _cons |  -1.468083   .3527217    -4.16   0.000    -2.159405   -.7767613
-------------+----------------------------------------------------------------
     sigma_u |  .36481589
     sigma_e |  .30242516
         rho |  .59269507   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

As the coefficient on sq_fitted is statistically significant in both the augmented and the ancillary regression, the model is misspecified (and deliberately so).
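
The steps above can be condensed into a short do-file. This is only a sketch under stated assumptions: the names xb_hat and xb_hat_sq are illustrative (they do not appear in the output above), and the closing -regress- is an added check that is not part of the -linktest- approach. It is meant to show why c.age#c.age was dropped from the augmented regression: the linear prediction is an exact linear combination of age and its square.

```stata
* Sketch of the linktest-style check after -xtreg, re- (do-file form).
* xb_hat and xb_hat_sq are illustrative names, not taken from the post.
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* Original (deliberately misspecified) model
quietly xtreg ln_wage c.age##c.age, re

* Linear prediction and its square
predict double xb_hat, xb
generate double xb_hat_sq = xb_hat^2

* Ancillary regression: a significant squared term hints at misspecification
xtreg ln_wage xb_hat xb_hat_sq, re
test xb_hat_sq

* Added check: xb_hat is an exact linear combination of age and age^2,
* so this R-squared should be essentially 1, which is why Stata omits
* one of the collinear terms in the augmented regression
regress xb_hat c.age##c.age
```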

Kind regards,
Carlo
(Stata 19.0)


Shelly Gupta

Join Date: Sep 2021

Posts: 15
#17

15 Dec 2021, 16:38

Carlo Lazzaro It looks like there is misspecification in the model. I ran the commands suggested above, and sq_fitted came out significant in my model. Sharing the final result here
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#18

16 Dec 2021, 00:08

Shelly:
usually, the result you reported is due to omitted predictors and/or omitted interactions among predictors that are needed to give a fair and true view of the data generating process you're investigating.
In addition, have you already checked the collinearity of your categorical variables via -estat vce, corr- after -xtreg, re-? Collinearity could be a concomitant cause of the same issue.
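
A minimal sketch of that check follows. It uses the nlswork data from the earlier example because your dataset is not available here, and i.race is only an assumed stand-in for your categorical predictors. Note that -estat vce, corr- reports the correlations among the estimated coefficients, not among the predictors themselves, so it is a rough screen rather than a formal collinearity test.

```stata
* i.race stands in for the poster's categorical variables (assumption)
use "https://www.stata-press.com/data/r16/nlswork.dta", clear
xtreg ln_wage i.race c.age##c.age, re

* Correlation matrix of the coefficient estimates
estat vce, corr
```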

Kind regards,
Carlo
(Stata 19.0)
Shelly Gupta

Join Date: Sep 2021

Posts: 15
#19

16 Dec 2021, 23:03

Carlo Lazzaro Yes, I checked that already.
Shelly Gupta

Join Date: Sep 2021

Posts: 15
#20

17 Dec 2021, 01:58

Should I try using the PPML method? I ask because there will be endogeneity issues in this model.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#21

17 Dec 2021, 03:28

Shelly:
some correlations among -intra- and -extra- prefixed predictors look high.
I would investigate them a bit further to decide whether all of them should be plugged into the right-hand side of your regression equation.
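
One way to look at those correlations is sketched below. The nlswork variables age, ttl_exp and tenure are stand-ins chosen only so that the code runs; the commented wildcard line assumes that your predictors' names literally begin with intra and extra, which is a guess on my part.

```stata
use "https://www.stata-press.com/data/r16/nlswork.dta", clear

* Pairwise correlations with significance levels and observation counts
pwcorr age ttl_exp tenure, sig obs

* With the poster's data, something like (hypothetical variable names):
* pwcorr intra* extra*, sig obs

* Variance inflation factors from pooled OLS; this ignores the panel
* structure and is meant only as a collinearity screen
quietly regress ln_wage age ttl_exp tenure
estat vif
```

Highly correlated pairs are candidates for dropping or combining, but that decision should rest on the substance of your research question as well as on the numbers.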

Kind regards,
Carlo
(Stata 19.0)