  • ivregress omits variable because of collinearity, but there is no problem with manual 2SLS

    I am using ivregress in Stata/SE 13.1. The following is an example of my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(year z w y x x2 x3) float id
    2000 1   .17732775483590524              .625 14.237260019993542  202.6995728769065 2885.8865249901482 1
    2001 1   .11516598636189011              .625 14.037846192677339 197.06112572926565 2766.3137735432824 1
    2006 1    .0997778162536522               .75 14.402869127783022 207.44263911204527 2987.7691826527116 1
    1998 1   .09476815052776504 .1111111111111111 13.166248675216465 173.35010417763934   2282.37057947748 2
    1996 1  .023636476968148103 .6666666666666666 13.744497522319085 188.91121214103546  2596.489687210757 3
    1996 1                    0                .7 13.652930450124554 186.40250987593825 2544.9405030648404 4
    1997 0                    0 .7272727272727273 13.808807391860622 190.68316158550456  2633.107051205269 4
    1998 0                    0 .7272727272727273 14.192381771074801 201.42370033593633  2858.682052910176 4
    1999 0                    0 .4444444444444444 14.288519627284563 204.16179313929618  2917.169788412444 4
    2000 0 .0006008371743015709 .4444444444444444 14.450500149204077 208.81695456214703 3017.5094330566467 4
    2001 0 .0011607436270597625 .4444444444444444 14.268634480356768 203.59392993402605  2905.007368647984 4
    1999 1  .001261335087929178 .5769230769230769 14.088224683134758 198.47807472248746  2796.203711366413 5
    2003 1  .017300749747983873 .8888888888888888  13.91097647734573 193.51526655326623 2691.9863210297754 6
    2004 0   .03491016654472792 .8888888888888888  14.40845514787129  207.6035797482187 2991.2468673397298 6
    1997 0 .0032880515165995402 .6666666666666666 14.067155848171003 197.88487365673166 2783.6773577248728 7
    1998 0 .0071281133892940685 .5555555555555556 14.495947618579011  210.1324973605865 3046.0696747002544 7
    1999 0  .004993944753080487 .5555555555555556 14.448307218786814 208.75358148844717 3016.1358783671326 7
    2000 0   .03340747095354432                .7 14.503336166543715 210.34675995977494  3050.729771239893 7
    2001 0   .08941331514727026 .6666666666666666 14.295431475834716 204.35936108028594 2921.4052427685915 7
    2003 0   .17443912614864626               .75  14.15916631366352 200.48199069798383  2838.657849187096 7
    end
    When I run ivregress 2sls y (w=z) x x2 x3, r, x3 is omitted because of collinearity:

    Code:
    note: x3 omitted because of collinearity
    
    Instrumental variables (2SLS) regression               Number of obs =      20
                                                           Wald chi2(3)  =   29.47
                                                           Prob > chi2   =  0.0000
                                                           R-squared     =  0.4243
                                                           Root MSE      =   .1294
    
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               w |   .6922701   1.273419     0.54   0.587    -1.803584    3.188125
               x |   19.93254   4.850439     4.11   0.000     10.42586    29.43923
              x2 |  -.7096673   .1747043    -4.06   0.000    -1.052081   -.3672532
              x3 |          0  (omitted)
           _cons |  -139.2837   33.66796    -4.14   0.000    -205.2717   -73.29572
    ------------------------------------------------------------------------------
    Instrumented:  w
    Instruments:   x x2 z
    However, if I manually run the first stage, x3 is not omitted:

    Code:
    reg w z x x2 x3, r
    
    Linear regression                                      Number of obs =      20
                                                           F(  3,    15) =       .
                                                           Prob > F      =       .
                                                           R-squared     =  0.3350
                                                           Root MSE      =   .0544
    
    ------------------------------------------------------------------------------
                 |               Robust
               w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               z |   .0537033   .0341344     1.57   0.137    -.0190525    .1264591
               x |  -318.4292   128.1503    -2.48   0.025     -591.575   -45.28335
              x2 |   22.93841   9.293324     2.47   0.026     3.130155    42.74666
              x3 |  -.5504091   .2244903    -2.45   0.027    -1.028899   -.0719193
           _cons |   1472.422   588.6305     2.50   0.024     217.7853    2727.058
    ------------------------------------------------------------------------------
    Similarly, if I estimate the reduced form or do a "manual 2SLS", x3 is not omitted in either case:

    Code:
    reg y z x x2 x3, r // Reduced form
    
    Linear regression                                      Number of obs =      20
                                                           F(  2,    15) =       .
                                                           Prob > F      =       .
                                                           R-squared     =  0.5440
                                                           Root MSE      =  .13298
    
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               z |   .0270256   .0588615     0.46   0.653    -.0984348    .1524859
               x |   646.7643   232.9068     2.78   0.014     150.3351    1143.193
              x2 |  -45.96497   16.86735    -2.73   0.016    -81.91686   -10.01307
              x3 |   1.088447   .4069485     2.67   0.017      .221057    1.955838
           _cons |  -3031.587   1071.365    -2.83   0.013    -5315.148   -748.0267
    ------------------------------------------------------------------------------
    Code:
    quietly reg w z x x2 x3, r
    
    predict what, xb // Fitted values from first stage
    
    reg y what x x2 x3, r // Standard errors are incorrect, but the point is that x3 is not omitted
    
    Linear regression                                      Number of obs =      20
                                                           F(  3,    15) =       .
                                                           Prob > F      =       .
                                                           R-squared     =  0.5440
                                                           Root MSE      =  .13298
    
    ------------------------------------------------------------------------------
                 |               Robust
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            what |   .5032384   1.096073     0.46   0.653    -1.832987    2.839463
               x |     807.01   467.8038     1.73   0.105    -190.0902     1804.11
              x2 |  -57.50845    33.7402    -1.70   0.109     -129.424    14.40708
              x3 |   1.365434   .8106689     1.68   0.113    -.3624655    3.093334
           _cons |  -3772.566   2160.733    -1.75   0.101     -8378.06    832.9273
    ------------------------------------------------------------------------------
    So I can't see where the collinearity problem is since x3 is not omitted when I manually estimate the first and second stages.

  • #2
    Dear Alistair,

    I believe you have a case of almost perfect collinearity that -ivregress- is treating as perfect collinearity.

    Best wishes,

    Joao



    • #3
      The only thing that makes me think this might not be true is that the standard error on x3 seems reasonable. If it were something like almost perfect collinearity, I very much doubt you would have such precision. I would suggest double-checking that every method uses exactly the same set of observations.
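      One way to run that check, as a minimal sketch: it assumes the example data from post #1 are in memory, and the variable name insample_iv is made up for illustration.

      ```stata
      * Count observations with a missing value in any model variable
      count if missing(y, w, z, x, x2, x3)

      * Estimation sample actually used by ivregress
      quietly ivregress 2sls y (w = z) x x2 x3, vce(robust)
      display "ivregress N = " e(N)
      generate byte insample_iv = e(sample)

      * Estimation sample used by the manual first stage
      quietly regress w z x x2 x3, vce(robust)
      display "first-stage N = " e(N)

      * Observations used by one command but not the other (ideally none)
      count if insample_iv != e(sample)
      ```

      If the last count is zero, the two commands are working from the same observations and the discrepancy has to come from somewhere else.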


      • #4
        Hi Alistair,

        I'm also not sure what is going on, but the easiest way to check is to look at a correlation table and find out for yourself. Note that even if the correlations are high, but not high enough for the matrix to have less than full rank, the estimates will become imprecise.
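        A sketch of that check, assuming the example data from post #1 are in memory. The variance inflation factors are an addition to the suggestion above: a correlation table only shows pairwise relationships, whereas the problem here may involve several variables at once.

        ```stata
        * Pairwise correlations among the exogenous variables and the instrument
        correlate x x2 x3 z

        * Variance inflation factors from the (non-robust) first stage;
        * unlike pairwise correlations, these reflect each regressor's
        * dependence on all the others jointly
        quietly regress w z x x2 x3
        estat vif
        ```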



        • #5
          Dear Joshua D Merfeld et al:

          Maybe this will convince you:

          Code:
          . reg x3 x x2 z
          
                Source |       SS           df       MS      Number of obs   =        20
          -------------+----------------------------------   F(3, 16)        >  99999.00
                 Model |  742313.132         3  247437.711   Prob > F        =    0.0000
              Residual |  .045752377        16  .002859524   R-squared       =    1.0000
          -------------+----------------------------------   Adj R-squared   =    1.0000
                 Total |  742313.178        19  39069.1146   Root MSE        =    .05347
          
          ------------------------------------------------------------------------------
                    x3 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                     x |   -576.579   2.312783  -249.30   0.000    -581.4818   -571.6761
                    x2 |   41.60309   .0831375   500.41   0.000     41.42685    41.77933
                     z |   .0069083   .0297036     0.23   0.819    -.0560604    .0698771
                 _cons |   2661.907   16.07616   165.58   0.000     2627.827    2695.987
          ------------------------------------------------------------------------------
          Best wishes,

          Joao



          • #6
            More generally, the problem is that the data are numerically very unbalanced (the variables have very different magnitudes) and that creates numerical problems for Stata. My impression is that Stata is not very good at this; see the following results using good old TSP:

            Code:
            --------------- TSP at 12:18:37 on 05-Aug-2017 ---------------
            
                                 -------------------------------------
                                 |        this copy licensed         |
                                 |            for use by:            |
                                 | TSP 5.1/OxMetrics 11/09#51AGT1109 |
                                 -------------------------------------
                                            TSP Version 5.1
                                      11/16/09 TSP/OxMetrics  4MB
                                 Copyright (c) 2009 TSP International
                                          ALL RIGHTS RESERVED
                                           08/05/17 12:18PM
                           In case of questions or problems, see your local TSP
                           consultant or send a description of the problem and the
                           associated TSP output to:
                                           TSP International
                                            P.O. Box 61015
                                          Palo Alto, CA 94306
                                                  USA
                     PROGRAM
            COMMAND  ***************************************************************
            1  read (file='data.dta');
            2  2sls(inst=(c z x x2 x3)) y c w x x2 x3;
            3
                     EXECUTION
            *******************************************************************************
            
            Current sample:  1 to 20
            
            
                                                 Equation   1
                                                 ============
            
                                  Method of estimation = Instrumental Variable
            
            Dependent variable: Y
            Endogenous variables: W
            Included exogenous variables: C X X2 X3
            Excluded exogenous variables: Z
            Current sample:  1 to 20
            Number of observations:  20
            
                   Mean of dep. var. = .625740           R-squared = .588003
              Std. dev. of dep. var. = .174978  Adjusted R-squared = .478137
            Sum of squared residuals = .239825       Durbin-Watson = 1.28336 [<.216]
               Variance of residuals = .015988     F (zero slopes) = 4.94878 [.010]
            Std. error of regression = .126445              E'PZ*E = 0.
            
                       Estimated    Standard
            Variable  Coefficient     Error       t-statistic   P-value
            C         -3772.57      2414.42       -1.56251      [.118]
            W         .503238       1.31007       .384131       [.701]
            X         807.010       522.637       1.54411       [.123]
            X2        -57.5085      37.6899       -1.52583      [.127]
            X3        1.36543       .905491       1.50795       [.132]
            
            *******************************************************************************
            
            END OF OUTPUT.
            
              MEMORY USAGE:    ITEM:    DATA ARRAY  TOTAL MEMORY
                              UNITS:  (4-BYTE WORDS) (MEGABYTES)
              MEMORY ALLOCATED         :    500000       4.0
              MEMORY ACTUALLY REQUIRED :      1225       2.1
              CURRENT VARIABLE STORAGE :       694
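            The scaling problem can also be gauged inside Stata through the condition number of the instrument matrix. This is only a rough sketch: it assumes the example data from post #1 are in memory, X is a scratch Mata matrix introduced here, and nothing is implied about the tolerance -ivregress- actually uses.

            ```stata
            * Instrument matrix: exogenous regressors, the instrument, and a constant
            mata: X = st_data(., ("x", "x2", "x3", "z"))
            mata: X = X, J(rows(X), 1, 1)

            * Condition number (ratio of largest to smallest singular value);
            * very large values signal a numerically near-singular matrix
            mata: cond(X)
            ```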
            All the best,

            Joao



            • #7
              Yes, you've absolutely convinced me, Joao! I didn't think that through completely, but I guess there is very little variation in z after controlling for the x's but plenty of variation in the x's after controlling for the z and other x's?



              • #8
                Thanks to Joao Santos Silva, Joshua D Merfeld, and Tim Umbach for your helpful replies. I may simply have to use another software package.



                • #9
                  Alistair Young:

                  Maybe you should contact Stata's Technical Support? Alternatively, you can try to rescale or recenter some of your variables.
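                  A minimal sketch of the recentering idea, assuming the example data from post #1 are in memory; the names xc, xc2, and xc3 are made up for illustration. Whether this is enough to stop -ivregress- from omitting the cubic term has not been verified here.

                  ```stata
                  * Center x at its sample mean, then rebuild the polynomial terms
                  summarize x, meanonly
                  generate double xc  = x - r(mean)
                  generate double xc2 = xc^2
                  generate double xc3 = xc^3

                  * Same model as before, with the centered polynomial
                  ivregress 2sls y (w = z) xc xc2 xc3, vce(robust)
                  ```

                  A cubic in xc plus a constant spans the same space as a cubic in x plus a constant, so in exact arithmetic the coefficients on w and on the cubic term should match those of the uncentered model; only the lower-order terms and the constant change.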

                  Best wishes,

                  Joao



                  • #10
                    What is the number of observations in ivregress? I have a dataset collapsed by region and decade (10-year lags).
