ivregress omits variable because of collinearity, but there is no problem with manual 2SLS

Alistair Young

Join Date: Aug 2017
Posts: 2

04 Aug 2017, 16:17

I am using ivregress in Stata/SE 13.1. The following is an example of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(year z w y x x2 x3) float id
2000 1   .17732775483590524              .625 14.237260019993542  202.6995728769065 2885.8865249901482 1
2001 1   .11516598636189011              .625 14.037846192677339 197.06112572926565 2766.3137735432824 1
2006 1    .0997778162536522               .75 14.402869127783022 207.44263911204527 2987.7691826527116 1
1998 1   .09476815052776504 .1111111111111111 13.166248675216465 173.35010417763934   2282.37057947748 2
1996 1  .023636476968148103 .6666666666666666 13.744497522319085 188.91121214103546  2596.489687210757 3
1996 1                    0                .7 13.652930450124554 186.40250987593825 2544.9405030648404 4
1997 0                    0 .7272727272727273 13.808807391860622 190.68316158550456  2633.107051205269 4
1998 0                    0 .7272727272727273 14.192381771074801 201.42370033593633  2858.682052910176 4
1999 0                    0 .4444444444444444 14.288519627284563 204.16179313929618  2917.169788412444 4
2000 0 .0006008371743015709 .4444444444444444 14.450500149204077 208.81695456214703 3017.5094330566467 4
2001 0 .0011607436270597625 .4444444444444444 14.268634480356768 203.59392993402605  2905.007368647984 4
1999 1  .001261335087929178 .5769230769230769 14.088224683134758 198.47807472248746  2796.203711366413 5
2003 1  .017300749747983873 .8888888888888888  13.91097647734573 193.51526655326623 2691.9863210297754 6
2004 0   .03491016654472792 .8888888888888888  14.40845514787129  207.6035797482187 2991.2468673397298 6
1997 0 .0032880515165995402 .6666666666666666 14.067155848171003 197.88487365673166 2783.6773577248728 7
1998 0 .0071281133892940685 .5555555555555556 14.495947618579011  210.1324973605865 3046.0696747002544 7
1999 0  .004993944753080487 .5555555555555556 14.448307218786814 208.75358148844717 3016.1358783671326 7
2000 0   .03340747095354432                .7 14.503336166543715 210.34675995977494  3050.729771239893 7
2001 0   .08941331514727026 .6666666666666666 14.295431475834716 204.35936108028594 2921.4052427685915 7
2003 0   .17443912614864626               .75  14.15916631366352 200.48199069798383  2838.657849187096 7
end

When I run ivregress 2sls y (w=z) x x2 x3, r, x3 is omitted because of collinearity:

Code:

note: x3 omitted because of collinearity

Instrumental variables (2SLS) regression               Number of obs =      20
                                                       Wald chi2(3)  =   29.47
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.4243
                                                       Root MSE      =   .1294

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   .6922701   1.273419     0.54   0.587    -1.803584    3.188125
           x |   19.93254   4.850439     4.11   0.000     10.42586    29.43923
          x2 |  -.7096673   .1747043    -4.06   0.000    -1.052081   -.3672532
          x3 |          0  (omitted)
       _cons |  -139.2837   33.66796    -4.14   0.000    -205.2717   -73.29572
------------------------------------------------------------------------------
Instrumented:  w
Instruments:   x x2 z

However, if I manually run the first stage, x3 is not omitted:

Code:

reg w z x x2 x3, r

Linear regression                                      Number of obs =      20
                                                       F(  3,    15) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.3350
                                                       Root MSE      =   .0544

------------------------------------------------------------------------------
             |               Robust
           w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .0537033   .0341344     1.57   0.137    -.0190525    .1264591
           x |  -318.4292   128.1503    -2.48   0.025     -591.575   -45.28335
          x2 |   22.93841   9.293324     2.47   0.026     3.130155    42.74666
          x3 |  -.5504091   .2244903    -2.45   0.027    -1.028899   -.0719193
       _cons |   1472.422   588.6305     2.50   0.024     217.7853    2727.058
------------------------------------------------------------------------------

Similarly, if I estimate the reduced form or do a "manual 2SLS", x3 is not omitted in either case:

Code:

reg y z x x2 x3, r // Reduced form

Linear regression                                      Number of obs =      20
                                                       F(  2,    15) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.5440
                                                       Root MSE      =  .13298

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           z |   .0270256   .0588615     0.46   0.653    -.0984348    .1524859
           x |   646.7643   232.9068     2.78   0.014     150.3351    1143.193
          x2 |  -45.96497   16.86735    -2.73   0.016    -81.91686   -10.01307
          x3 |   1.088447   .4069485     2.67   0.017      .221057    1.955838
       _cons |  -3031.587   1071.365    -2.83   0.013    -5315.148   -748.0267
------------------------------------------------------------------------------

Code:

quietly reg w z x x2 x3, r

predict what, xb // Fitted values from first stage

reg y what x x2 x3, r // Standard errors are incorrect, but the point is that x3 is not omitted

Linear regression                                      Number of obs =      20
                                                       F(  3,    15) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.5440
                                                       Root MSE      =  .13298

------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        what |   .5032384   1.096073     0.46   0.653    -1.832987    2.839463
           x |     807.01   467.8038     1.73   0.105    -190.0902     1804.11
          x2 |  -57.50845    33.7402    -1.70   0.109     -129.424    14.40708
          x3 |   1.365434   .8106689     1.68   0.113    -.3624655    3.093334
       _cons |  -3772.566   2160.733    -1.75   0.101     -8378.06    832.9273
------------------------------------------------------------------------------

So I can't see where the collinearity problem is since x3 is not omitted when I manually estimate the first and second stages.
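Because the model is just identified (one instrument, z, for one endogenous regressor, w), the 2SLS coefficient on w should equal the reduced-form coefficient on z divided by the first-stage coefficient on z. A minimal check using only the coefficients printed above gives a ratio of about .5032, which matches the coefficient on what in the manual second stage and differs from the .6922701 that ivregress reports after omitting x3. The second half of the sketch assumes the dataex example is in memory; the scalar names rf and fs are made up for this illustration.

Code:

```stata
* Ratio of the printed coefficients on z: reduced form / first stage
display .0270256/.0537033

* Same check computed from the data (assumes the example data are in memory)
quietly regress y z x x2 x3
scalar rf = _b[z]
quietly regress w z x x2 x3
scalar fs = _b[z]
display rf/fs
```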

Tags: collinearity, instrumental variables, ivregress

Joao Santos Silva

Join Date: Apr 2014

Posts: 2961
#2

04 Aug 2017, 18:53

Dear Alistair,

I believe you have a case of almost perfect collinearity that -ivregress- is treating as perfect collinearity.

Best wishes,

Joao
Joshua D Merfeld

Join Date: Jun 2015

Posts: 86
#3

04 Aug 2017, 23:30

The only thing that makes me think this might not be true is that the standard error on x3 in the ivregress command seems reasonable. If it were something like almost perfect collinearity, I very much doubt you would have such precision. I would suggest double-checking that every observation is included in each method.
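One way to run that check, sketched under the assumption that the example data are in memory, is to flag each command's estimation sample with e(sample) and cross-tabulate the flags. The variable names in_iv and in_fs are invented for this illustration.

Code:

```stata
* Flag the observations used by ivregress
quietly ivregress 2sls y (w = z) x x2 x3, vce(robust)
generate byte in_iv = e(sample)

* Flag the observations used by the manual first stage
quietly regress w z x x2 x3, vce(robust)
generate byte in_fs = e(sample)

* Any off-diagonal count means the two samples differ
tabulate in_iv in_fs, missing
```

In the output posted above, every command reports Number of obs = 20, so the samples appear to coincide.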

Tim Umbach

Join Date: Jun 2017

Posts: 47
#4

05 Aug 2017, 02:13

Hi Alistair,

I'm also not sure what is going on, but the easiest way to check is to look at a correlation table and find out for yourself. Note that even if the correlations are high, but not high enough for the matrix to have less than full rank, the estimates will become imprecise, with inflated standard errors.
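A minimal sketch of that check, assuming the example data are in memory. Pairwise correlations can understate collinearity that involves several regressors at once, so the sketch also requests variance inflation factors with estat vif after an ordinary regression; the regress call here is only a vehicle for that diagnostic, not the IV model.

Code:

```stata
* Pairwise correlations among the exogenous variables
correlate x x2 x3 z

* Variance inflation factors (estat vif is available after regress)
quietly regress y w x x2 x3
estat vif
```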

Joao Santos Silva

Join Date: Apr 2014
Posts: 2961

05 Aug 2017, 04:19

Dear Joshua D Merfeld et al:

Maybe this will convince you:

Code:

. reg x3 x x2 z

      Source |       SS           df       MS      Number of obs   =        20
-------------+----------------------------------   F(3, 16)        >  99999.00
       Model |  742313.132         3  247437.711   Prob > F        =    0.0000
    Residual |  .045752377        16  .002859524   R-squared       =    1.0000
-------------+----------------------------------   Adj R-squared   =    1.0000
       Total |  742313.178        19  39069.1146   Root MSE        =    .05347

------------------------------------------------------------------------------
          x3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   -576.579   2.312783  -249.30   0.000    -581.4818   -571.6761
          x2 |   41.60309   .0831375   500.41   0.000     41.42685    41.77933
           z |   .0069083   .0297036     0.23   0.819    -.0560604    .0698771
       _cons |   2661.907   16.07616   165.58   0.000     2627.827    2695.987
------------------------------------------------------------------------------

Best wishes,

Joao
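The R-squared prints as 1.0000 only because of rounding. From the sums of squares in the table, the share of the variation in x3 left unexplained by x, x2, and z is .045752377/742313.178, roughly 6e-08, so x3 is not an exact linear combination of the others but is extremely close to one. A short sketch reproduces that figure. The _rmcoll line is included on the assumption that it is a reasonable proxy for a collinearity check; the thread does not establish which tolerance ivregress itself applies.

Code:

```stata
* Unexplained share of x3 from the printed sums of squares
display .045752377/742313.178

* Same quantity from stored results (assumes the example data are in memory)
quietly regress x3 x x2 z
display e(rss)/(e(rss) + e(mss))

* Ask Stata's collinearity utility which variables it would drop, if any
_rmcoll x x2 x3 z
display "`r(varlist)'"
```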

Joao Santos Silva

Join Date: Apr 2014
Posts: 2961

05 Aug 2017, 06:22

More generally, the problem is that the data are numerically very unbalanced (the variables have very different magnitudes) and that creates numerical problems for Stata. My impression is that Stata is not very good at this; see the following results using good old TSP:

Code:

--------------- TSP at 12:18:37 on 05-Aug-2017 ---------------

                     -------------------------------------
                     |        this copy licensed         |
                     |            for use by:            |
                     | TSP 5.1/OxMetrics 11/09#51AGT1109 |
                     -------------------------------------
                                TSP Version 5.1
                          11/16/09 TSP/OxMetrics  4MB
                     Copyright (c) 2009 TSP International
                              ALL RIGHTS RESERVED
                               08/05/17 12:18PM
               In case of questions or problems, see your local TSP
               consultant or send a description of the problem and the
               associated TSP output to:
                               TSP International
                                P.O. Box 61015
                              Palo Alto, CA 94306
                                      USA
         PROGRAM
COMMAND  ***************************************************************
1  read (file='data.dta');
2  2sls(inst=(c z x x2 x3)) y c w x x2 x3;
3
         EXECUTION
*******************************************************************************

Current sample:  1 to 20


                                     Equation   1
                                     ============

                      Method of estimation = Instrumental Variable

Dependent variable: Y
Endogenous variables: W
Included exogenous variables: C X X2 X3
Excluded exogenous variables: Z
Current sample:  1 to 20
Number of observations:  20

       Mean of dep. var. = .625740           R-squared = .588003
  Std. dev. of dep. var. = .174978  Adjusted R-squared = .478137
Sum of squared residuals = .239825       Durbin-Watson = 1.28336 [<.216]
   Variance of residuals = .015988     F (zero slopes) = 4.94878 [.010]
Std. error of regression = .126445              E'PZ*E = 0.

           Estimated    Standard
Variable  Coefficient     Error       t-statistic   P-value
C         -3772.57      2414.42       -1.56251      [.118]
W         .503238       1.31007       .384131       [.701]
X         807.010       522.637       1.54411       [.123]
X2        -57.5085      37.6899       -1.52583      [.127]
X3        1.36543       .905491       1.50795       [.132]

*******************************************************************************

END OF OUTPUT.

  MEMORY USAGE:    ITEM:    DATA ARRAY  TOTAL MEMORY
                  UNITS:  (4-BYTE WORDS) (MEGABYTES)
  MEMORY ALLOCATED         :    500000       4.0
  MEMORY ACTUALLY REQUIRED :      1225       2.1
  CURRENT VARIABLE STORAGE :       694

All the best,

Joao
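To put a number on how badly scaled the design is, one option is the condition number of the instrument matrix, sketched here in Mata under the assumption that the example data are in memory. The matrix name Z and its column order are choices made for this illustration. Very large values indicate that inverting Z'Z in double precision loses many significant digits.

Code:

```stata
* Compare the magnitudes of the variables
summarize z x x2 x3

mata:
// Instruments plus a constant column
Z = st_data(., ("z", "x", "x2", "x3")), J(st_nobs(), 1, 1)
cond(Z)            // condition number of Z (2-norm)
cond(cross(Z, Z))  // condition number of Z'Z, the square of the above in exact arithmetic
end
```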

Joshua D Merfeld

Join Date: Jun 2015
Posts: 86

07 Aug 2017, 20:38

Yes, you've absolutely convinced me, Joao! I didn't think that through completely, but I guess there is very little variation in z after controlling for the x's but plenty of variation in the x's after controlling for the z and other x's?


Alistair Young

Join Date: Aug 2017

Posts: 2
#8

11 Aug 2017, 19:45

Thanks to Joao Santos Silva, Joshua D Merfeld, and Tim Umbach for your helpful replies. I may simply have to use another software package.
Joao Santos Silva

Join Date: Apr 2014

Posts: 2961
#9

12 Aug 2017, 03:26

Alistair Young:

Maybe you should contact Stata's Technical Support? Alternatively, you can try to rescale or recenter some of your variables.

Best wishes,

Joao
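A sketch of the recentering suggestion, assuming x2 and x3 are the square and cube of x, which the example data suggest (14.2373^2 is about 202.70). Powers of the centered variable span the same column space as the constant, x, x2, and x3, so the coefficient on w is unchanged in exact arithmetic while the regressors become far less collinear. The names xc, xc2, and xc3 are invented here. Whether this stops ivregress from omitting the cubic term on the full dataset would need to be checked.

Code:

```stata
* Center x at its sample mean, then build the polynomial terms
summarize x, meanonly
generate double xc  = x - r(mean)
generate double xc2 = xc^2
generate double xc3 = xc^3

* Re-estimate with the centered polynomial
ivregress 2sls y (w = z) xc xc2 xc3, vce(robust)
```

The coefficients on xc and xc2 are not directly comparable with those on x and x2, because centering changes what the lower-order terms measure. Orthogonal polynomials from orthpoly are another option along the same lines.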
Monicucha Paganini

Join Date: Nov 2019

Posts: 24
#10

04 Feb 2022, 09:30

What is the number of observations in ivregress? I have a dataset collapsed by region and decade (10-year lags).