  • Perfect multicollinearity within indicator variables

    Hello everyone,
    I am running a simple cross-sectional regression of wage determinants among individuals. My data include dummies for region of work (1-9) and a part_time indicator (1 for part-time, 0 for full-time). The problem arises when I try to use them all at once, even if I leave one region out or use the nocon option, because regions 1-6 contain only part-time workers and regions 7-9 only full-time workers. How could one construct a regression (if that is possible) that still estimates the difference between every region and, in addition, the effect of part_time on wages?

  • #2
    Paul:
    first of all, take a look at -fvvarlist- notation.
    Second, please post what you typed and what Stata gave you back within CODE delimiters (as per FAQ). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)
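
For readers unfamiliar with the factor-variable notation mentioned above, here is a minimal sketch. It uses a small synthetic dataset, not Paul's data: the values of lw are simulated, and only the variable names lw and region are borrowed from the thread.

```stata
* Synthetic illustration only: 9 regions, 50 observations each.
clear
set seed 2024
set obs 450
generate byte region = ceil(_n/50)
generate double lw = 6 + 0.02*region + rnormal(0, 0.3)

* i.region expands to one indicator per level and omits a base level
* (the lowest value, region 1, by default).
regress lw i.region

* ib5.region makes region 5 the base category instead.
regress lw ib5.region

* ibn.region with noconstant keeps all nine levels (no base category).
regress lw ibn.region, noconstant
```

See -help fvvarlist- for the full set of operators.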


    • #3
      Hope this works

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(id nearc4 educ age reg661 reg662 reg663 reg664 reg665 reg666 reg667 reg668 reg669 part_time hisp union wage IQ married female services region pwe pwe2 lw female_educ female_hisp)
      4171 0  9 32 0 0 0 0 0 0 0 0 1 1 0 0  635  74 1 0 0 9 -29 841 6.453625  0 0
      2828 1 18 32 0 0 0 0 1 0 0 0 0 0 0 0  535 129 1 0 1 5 -20 400 6.282267  0 0
      1430 1 16 26 0 0 1 0 0 0 0 0 0 1 0 1  300 106 0 0 0 3 -16 256 5.703783  0 0
      1711 1 13 24 0 0 1 0 0 0 0 0 0 1 0 1  769 102 1 1 0 3 -17 289 6.645091 13 0
      3993 1 15 26 0 0 0 0 0 0 0 0 1 1 0 1  263 101 0 0 0 9 -17 289 5.572154  0 0
      1912 1 12 28 0 0 1 0 0 0 0 0 0 1 0 1  500  79 1 0 0 3 -22 484 6.214608  0 0
      1559 1 12 34 0 0 1 0 0 0 0 0 0 1 0 1  962   . 1 0 0 3 -28 784 6.869014  0 0
       917 0 17 31 0 1 0 0 0 0 0 0 0 1 0 1  625  96 1 1 0 2 -20 400 6.437752 17 0
      3671 1 13 25 0 0 0 0 0 0 0 0 1 1 0 1  845 115 1 1 0 9 -18 324 6.739336 13 0
      2116 1 12 28 0 0 1 0 0 0 0 0 0 1 0 1 1251  94 1 0 0 3 -22 484 7.131699  0 0
       505 0 17 24 0 1 0 0 0 0 0 0 0 1 0 1  525   . 1 0 0 2 -13 169 6.263398  0 0
      3729 1 18 27 0 0 0 0 0 0 0 0 1 1 0 1  643 118 0 0 0 9 -15 225 6.466145  0 0
      2610 1 12 30 0 0 0 0 0 0 1 0 0 0 0 1  581   . 1 0 1 7 -24 576 6.364751  0 0
      3079 0  8 26 0 0 0 0 1 0 0 0 0 0 1 1  285   . 1 0 1 5 -24 576 5.652489  0 0
      4257 1 14 25 0 0 0 0 1 0 0 0 0 0 0 1  529 118 1 0 1 5 -17 289 6.270988  0 0
      2091 1 11 32 0 0 1 0 0 0 0 0 0 1 0 1  726  93 1 0 0 3 -27 729  6.58755  0 0
      1076 1 16 32 0 0 1 0 0 0 0 0 0 1 1 1  797   . 0 0 0 3 -22 484 6.680855  0 0
        30 1 15 32 1 0 0 0 0 0 0 0 0 1 0 0  726 111 1 0 0 1 -23 529  6.58755  0 0
      1395 1 16 34 0 0 1 0 0 0 0 0 0 1 0 1 1442   . 1 0 0 3 -24 576 7.273787  0 0
       487 1 16 24 0 1 0 0 0 0 0 0 0 1 0 1  750 112 0 1 0 2 -14 196 6.620073 16 0
      2315 0 12 24 0 0 1 0 0 0 0 0 0 1 0 1  160  83 0 0 0 3 -18 324 5.075174  0 0
      2424 1 12 33 0 0 0 0 0 0 1 0 0 0 0 1 1100  89 1 0 1 7 -27 729 7.003066  0 0
      4377 1 12 32 0 0 0 0 1 0 0 0 0 0 0 1  596  98 0 0 1 5 -26 676 6.390241  0 0
       283 1 16 24 0 1 0 0 0 0 0 0 0 1 0 0  385 118 0 1 0 2 -14 196 5.953243 16 0
      2925 1 13 32 0 0 0 0 1 0 0 0 0 0 0 1  962  94 1 0 1 5 -25 625 6.869014  0 0
      2131 1 17 28 0 0 0 1 0 0 0 0 0 1 0 0  423   . 1 0 0 4 -17 289 6.047372  0 0
       561 1 13 32 0 1 0 0 0 0 0 0 0 1 0 1  389   . 1 0 0 2 -25 625 5.963579  0 0
      4879 0 16 32 0 0 0 0 1 0 0 0 0 0 0 0  738 107 1 0 1 5 -22 484 6.603944  0 0
       516 1 12 24 1 0 0 0 0 0 0 0 0 1 0 1  488   . 1 0 0 1 -18 324 6.190315  0 0
      3945 1 12 29 0 0 0 0 0 0 0 0 1 1 0 1 1200 100 1 0 0 9 -23 529 7.090077  0 0
      3267 1 10 27 0 0 0 0 1 0 0 0 0 0 1 0  225   . 1 0 1 5 -23 529 5.416101  0 0
      2764 1 12 24 0 0 0 0 0 0 1 0 0 0 0 1  397   . 1 0 1 7 -18 324 5.983936  0 0
      5123 0 10 25 0 0 0 0 0 0 1 0 0 0 1 0  200  75 1 0 1 7 -21 441 5.298317  0 0
      1155 1 16 25 0 0 1 0 0 0 0 0 0 1 0 1  243 121 0 0 0 3 -15 225 5.493062  0 0
      3305 1 12 28 0 0 0 0 0 0 1 0 0 0 1 1  500  65 1 0 1 7 -22 484 6.214608  0 0
      2143 1 16 27 0 0 0 1 0 0 0 0 0 1 0 0  396 118 1 0 0 4 -17 289 5.981414  0 0
      2621 1 18 29 0 0 0 0 0 0 1 0 0 0 1 1  750  96 1 0 1 7 -17 289 6.620073  0 0
        11 1 12 29 0 1 0 0 0 0 0 0 0 1 0 1  515  97 1 0 0 2 -23 529 6.244167  0 0
      3684 1 16 27 0 0 0 0 0 0 0 0 1 1 0 1  577 117 0 0 0 9 -17 289 6.357842  0 0
      3329 1 12 25 0 0 0 0 1 0 0 0 0 0 0 1  346   . 1 1 1 5 -19 361 5.846439 12 0
      4701 0 12 26 0 0 0 0 1 0 0 0 0 0 1 1  475  79 0 0 0 5 -20 400 6.163315  0 0
      2803 1 16 24 0 0 0 0 0 1 0 0 0 0 0 1  400   . 1 0 1 6 -14 196 5.991465  0 0
       904 0 12 26 0 1 0 0 0 0 0 0 0 1 0 1  514   . 0 0 0 2 -20 400 6.242223  0 0
      4164 0 10 34 0 0 0 0 0 0 0 0 1 1 0 0  575   . 1 0 0 9 -30 900  6.35437  0 0
         5 1 11 27 0 1 0 0 0 0 0 0 0 1 0 1  250  88 1 0 0 2 -22 484 5.521461  0 0
      3206 1 16 27 0 0 0 0 0 0 1 0 0 0 0 1  361 112 0 0 1 7 -17 289 5.888878  0 0
      4974 0 12 25 0 0 0 0 0 1 0 0 0 0 1 0  250   . 1 0 1 6 -19 361 5.521461  0 0
      2146 1 13 31 0 0 0 1 0 0 0 0 0 1 0 0  450   . 1 0 0 4 -24 576 6.109248  0 0
      2602 1 13 24 0 0 0 0 0 0 1 0 0 0 1 1  435   . 0 0 1 7 -17 289 6.075346  0 0
      4659 0  9 30 0 0 0 0 1 0 0 0 0 0 1 1  350   . 0 0 1 5 -27 729 5.857933  0 0
      2098 1 15 28 0 0 1 0 0 0 0 0 0 1 0 1  423 118 0 0 0 3 -19 361 6.047372  0 0
      2176 1 12 28 0 0 1 0 0 0 0 0 0 1 0 1  230  74 0 0 0 3 -22 484 5.438079  0 0
      2470 1 10 30 0 0 0 0 0 1 0 0 0 0 1 1  520   . 1 0 1 6 -26 676 6.253829  0 0
      4342 1 15 24 0 0 0 0 0 0 1 0 0 0 1 1  400   . 0 0 1 7 -15 225 5.991465  0 0
      4470 1 12 28 0 0 0 0 1 0 0 0 0 0 0 0  465 118 1 0 1 5 -22 484 6.142037  0 0
      3040 1  8 29 0 0 0 0 1 0 0 0 0 0 0 0  528   . 1 0 1 5 -27 729 6.269096  0 0
      2737 0 16 25 0 0 0 0 0 1 0 0 0 0 1 1  250  79 1 0 1 6 -15 225 5.521461  0 0
      3646 1 16 29 0 0 0 0 0 0 0 0 1 1 0 1  824 106 0 0 0 9 -19 361  6.71417  0 0
      1655 1 12 28 0 0 1 0 0 0 0 0 0 1 0 0  823  91 1 0 0 3 -22 484 6.712956  0 0
       838 1 14 33 0 1 0 0 0 0 0 0 0 1 0 1 1402   . 1 0 0 2 -25 625 7.245655  0 0
      3346 1 13 34 0 0 0 0 1 0 0 0 0 0 0 1  519  98 1 0 1 5 -27 729 6.251904  0 0
         4 0 12 34 1 0 0 0 0 0 0 0 0 1 0 1  721 103 1 0 0 1 -28 784 6.580639  0 0
      4639 0 12 24 0 0 0 0 1 0 0 0 0 0 1 0  278   . 0 0 1 5 -18 324 5.627621  0 0
      2237 0 11 32 0 0 1 0 0 0 0 0 0 1 0 0  433   . 1 0 0 3 -27 729 6.070738  0 0
      2021 1 12 24 0 0 1 0 0 0 0 0 0 1 0 1  721 102 1 0 0 3 -18 324 6.580639  0 0
      4906 0 12 26 0 0 0 0 1 0 0 0 0 0 0 0  600 102 1 0 1 5 -20 400  6.39693  0 0
       954 0 12 28 0 1 0 0 0 0 0 0 0 1 0 0  863 110 1 0 0 2 -22 484 6.760415  0 0
      2358 1 12 34 0 0 0 1 0 0 0 0 0 1 0 1  400   . 1 0 0 4 -28 784 5.991465  0 0
         7 1 12 26 0 1 0 0 0 0 0 0 0 1 0 1  500  85 1 0 0 2 -20 400 6.214608  0 0
      4698 0 15 24 0 0 0 0 1 0 0 0 0 0 0 0  300  96 0 0 1 5 -15 225 5.703783  0 0
       750 1 15 27 0 1 0 0 0 0 0 0 0 1 1 1  500  97 0 0 0 2 -18 324 6.214608  0 0
      3943 1 18 33 0 0 0 0 0 0 0 0 1 1 0 1  423   . 1 0 0 9 -21 441 6.047372  0 0
      2074 0 18 29 0 0 1 0 0 0 0 0 0 1 0 1  759 111 1 0 0 3 -17 289 6.632002  0 0
      5076 0 17 34 0 0 0 0 1 0 0 0 0 0 0 1  635   . 1 0 1 5 -23 529 6.453625  0 0
       618 1 16 28 0 1 0 0 0 0 0 0 0 1 0 1 1410 102 1 0 0 2 -18 324 7.251345  0 0
      1264 1 17 25 0 0 1 0 0 0 0 0 0 1 0 0  329 117 1 0 0 3 -14 196 5.796058  0 0
      3837 1 15 28 0 0 0 0 0 0 0 0 1 1 0 1  250  87 1 0 0 9 -19 361 5.521461  0 0
       889 0 12 28 0 1 0 0 0 0 0 0 0 1 0 1  275  75 0 0 0 2 -22 484 5.616771  0 0
      3931 1 13 29 0 0 0 0 0 0 0 0 1 1 0 1  675 121 1 0 0 9 -22 484 6.514713  0 0
       366 1 12 27 1 0 0 0 0 0 0 0 0 1 0 1  615 114 1 0 0 1 -21 441 6.421622  0 0
      3658 1 17 31 0 0 0 0 0 0 0 0 1 1 0 0  865 119 0 0 0 9 -20 400  6.76273  0 0
      1310 1 14 33 0 0 1 0 0 0 0 0 0 1 0 1 1055  98 1 0 0 3 -25 625 6.961296  0 0
      1287 1 15 25 0 0 1 0 0 0 0 0 0 1 0 1  750   . 1 0 0 3 -16 256 6.620073  0 0
       324 1 16 25 1 0 0 0 0 0 0 0 0 1 0 1  250 146 0 0 0 1 -15 225 5.521461  0 0
      3127 0 12 24 0 0 0 0 0 0 1 0 0 0 1 0  305   . 1 0 1 7 -18 324 5.720312  0 0
      2821 1 14 24 0 0 0 0 1 0 0 0 0 0 0 1  452   . 1 0 1 5 -16 256 6.113682  0 0
      2406 1 13 26 0 0 0 0 1 0 0 0 0 0 0 0  905 106 1 0 1 5 -19 361 6.807935  0 0
      1081 1 12 28 0 0 1 0 0 0 0 0 0 1 1 0  503 109 1 0 1 3 -22 484  6.22059  0 0
      2216 0 12 30 0 0 1 0 0 0 0 0 0 1 0 1  600  86 1 0 0 3 -24 576  6.39693  0 0
      1033 0 16 33 1 0 0 0 0 0 0 0 0 1 0 0  692 105 1 0 0 1 -23 529 6.539586  0 0
      2399 1 16 26 0 0 0 0 1 0 0 0 0 0 0 0  673 107 0 0 1 5 -16 256 6.511745  0 0
      4591 0 12 32 0 0 0 0 0 1 0 0 0 0 0 1  848 102 1 0 1 6 -26 676 6.742881  0 0
      2783 1 13 29 0 0 0 0 1 0 0 0 0 0 1 1  540   . 0 0 1 5 -22 484 6.291569  0 0
      3012 0 16 25 0 0 0 0 0 1 0 0 0 0 0 0  909 124 0 0 1 6 -15 225 6.812345  0 0
      4637 0 12 24 0 0 0 0 1 0 0 0 0 0 1 0  396   . 0 0 1 5 -18 324 5.981414  0 0
      2169 0 15 30 0 0 0 1 0 0 0 0 0 1 0 1  346 104 0 0 0 4 -21 441 5.846439  0 0
      1791 1 16 28 0 0 0 1 0 0 0 0 0 1 0 0  491  94 1 0 0 4 -18 324 6.196444  0 0
       368 1 10 32 1 0 0 0 0 0 0 0 0 1 0 1  670  73 0 0 0 1 -28 784 6.507277  0 0
      3455 1 13 25 0 0 0 0 0 0 1 0 0 0 1 1  617  95 1 0 1 7 -18 324 6.424869  0 0
       147 1 16 28 0 1 0 0 0 0 0 0 0 1 0 1  788 103 1 0 0 2 -18 324 6.669498  0 0
      end
      Code:
      .
      . reg lw educ pwe pwe2 female female_educ ib(none).region hisp union married services female_hisp part_time, nocon
      note: part_time omitted because of collinearity
      
            Source |       SS           df       MS      Number of obs   =     3,000
      -------------+----------------------------------   F(19, 2981)     =  48897.71
             Model |  117928.977        19  6206.78825   Prob > F        =    0.0000
          Residual |  378.390633     2,981  .126934127   R-squared       =    0.9968
      -------------+----------------------------------   Adj R-squared   =    0.9968
             Total |  118307.367     3,000  39.4357891   Root MSE        =    .35628
      
      ------------------------------------------------------------------------------
                lw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              educ |   .0762595   .0034969    21.81   0.000     .0694028    .0831162
               pwe |  -.1124042   .0143021    -7.86   0.000    -.1404472   -.0843611
              pwe2 |  -.0017198   .0003231    -5.32   0.000    -.0023533   -.0010864
            female |   .2465791   .1817885     1.36   0.175    -.1098645    .6030228
       female_educ |  -.0260866   .0121901    -2.14   0.032    -.0499885   -.0021847
                   |
            region |
                1  |   3.400629   .1730487    19.65   0.000     3.061322    3.739936
                2  |   3.529841   .1705954    20.69   0.000     3.195344    3.864338
                3  |   3.583971   .1709269    20.97   0.000     3.248825    3.919118
                4  |   3.501226   .1713533    20.43   0.000     3.165243    3.837209
                5  |   3.486608   .1713174    20.35   0.000     3.150695     3.82252
                6  |   3.523101   .1713432    20.56   0.000     3.187138    3.859064
                7  |    3.46762   .1725126    20.10   0.000     3.129364    3.805875
                8  |   3.342965   .1749211    19.11   0.000     2.999987    3.685944
                9  |   3.585189   .1716193    20.89   0.000     3.248685    3.921694
                   |
              hisp |  -.1602946   .0185927    -8.62   0.000    -.1967505   -.1238387
             union |   .1888312   .0152923    12.35   0.000     .1588467    .2188157
           married |   .1602008   .0148949    10.76   0.000     .1309954    .1894061
          services |  -.0795046   .0252685    -3.15   0.002      -.12905   -.0299592
       female_hisp |   .1371725   .0580395     2.36   0.018     .0233709    .2509741
         part_time |          0  (omitted)
      ------------------------------------------------------------------------------
      As the output shows, part_time is omitted.


      • #4
        Paul:
        your model has a serious quasi-extreme multicollinearity issue (as is also implicitly highlighted by your sky-rocketing R-sq):
        Code:
        . estat vif, uncentered
        
            Variable |       VIF       1/VIF 
        -------------+----------------------
                educ |     69.61    0.014366
                 pwe |   2969.67    0.000337
                pwe2 |    819.65    0.001220
              female |     77.61    0.012885
         female_educ |     78.61    0.012720
              region |
                  1  |     60.39    0.016558
                  2  |    130.30    0.007675
                  3  |    180.23    0.005549
                  4  |     53.31    0.018759
                  5  |    199.58    0.005011
                  6  |     51.67    0.019355
                  7  |     98.67    0.010135
                  9  |    104.31    0.009587
                hisp |      2.04    0.489131
               union |      4.26    0.234881
             married |      4.31    0.232039
            services |     21.99    0.045482
        -------------+----------------------
            Mean VIF |    289.78
        You should probably think about a different specification.
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Here is another specification, using the regions as separate binary variables rather than a single categorical variable, this time with a constant. High multicollinearity is not a concern for me. What I need to figure out is whether it is possible to run a regression that still includes part_time as an explanatory variable. Thank you anyway for taking a look at it.

          Code:
          . reg lw educ pwe pwe2 female female_educ reg661-reg669 hisp union married services female_hisp part_time, robust
          note: reg661 omitted because of collinearity
          note: reg667 omitted because of collinearity
          
          Linear regression                               Number of obs     =      3,000
                                                          F(18, 2981)       =      90.22
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.3438
                                                          Root MSE          =     .35628
          
          ------------------------------------------------------------------------------
                       |               Robust
                    lw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                  educ |   .0762595   .0035812    21.29   0.000     .0692375    .0832815
                   pwe |  -.1124042   .0143322    -7.84   0.000    -.1405061   -.0843022
                  pwe2 |  -.0017198   .0003233    -5.32   0.000    -.0023537    -.001086
                female |   .2465791   .1922923     1.28   0.200    -.1304599    .6236182
           female_educ |  -.0260866   .0136013    -1.92   0.055    -.0527555    .0005823
                reg661 |          0  (omitted)
                reg662 |   .1292123   .0338639     3.82   0.000     .0628134    .1956113
                reg663 |   .1833427   .0331685     5.53   0.000     .1183073    .2483781
                reg664 |   .1005972   .0386748     2.60   0.009     .0247653    .1764291
                reg665 |   .0189879   .0242325     0.78   0.433    -.0285263    .0665021
                reg666 |   .0554813   .0299734     1.85   0.064    -.0032894     .114252
                reg667 |          0  (omitted)
                reg668 |  -.0576635   .0481434    -1.20   0.231     -.152061    .0367341
                reg669 |   .1845606   .0376765     4.90   0.000     .1106859    .2584352
                  hisp |  -.1602946   .0187619    -8.54   0.000    -.1970822    -.123507
                 union |   .1888312   .0145794    12.95   0.000     .1602445    .2174179
               married |   .1602008   .0154207    10.39   0.000     .1299645    .1904371
              services |  -.0795046   .0269647    -2.95   0.003    -.1323759   -.0266333
           female_hisp |   .1371725   .0475706     2.88   0.004     .0438979    .2304471
             part_time |   -.066991   .0442425    -1.51   0.130      -.15374     .019758
                 _cons |    3.46762    .174092    19.92   0.000     3.126267    3.808972
          ------------------------------------------------------------------------------


          • #6
            If you cross-tabulate part_time and region you will see:

            Code:
                       |                                         region
             part_time |         1          2          3          4          5          6          7          9 |     Total
            -----------+----------------------------------------------------------------------------------------+----------
                     0 |         0          0          0          0         22          6         11          0 |        39 
                     1 |         7         15         21          6          0          0          0         12 |        61 
            -----------+----------------------------------------------------------------------------------------+----------
                 Total |         7         15         21          6         22          6         11         12 |       100
            In other words, all of your part_time observations come from regions 1 through 4 and 9, and all of your non-part-time observations come from regions 5 through 7. So, mathematically, your part_time variable is actually an indicator ("dummy") for a subset of your regions. In that case, it is inevitably collinear with the region indicators themselves.
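
This can be checked directly. The sketch below builds a synthetic dataset (not the poster's data) in which part_time is 1 exactly in regions 1-4 and 9, mirroring the cross-tabulation above, and confirms that part_time equals the sum of the corresponding region indicators. The names r1-r9 and pt_from_regions are hypothetical, introduced only for this illustration.

```stata
* Synthetic illustration only: 9 regions, 10 observations each.
clear
set obs 90
generate byte region = ceil(_n/10)
generate byte part_time = inlist(region, 1, 2, 3, 4, 9)

* One indicator per region: r1, r2, ..., r9
tabulate region, generate(r)

* part_time is an exact linear combination of region indicators
generate byte pt_from_regions = r1 + r2 + r3 + r4 + r9
assert part_time == pt_from_regions

* Every region falls entirely in one row of the table
tabulate part_time region
```

If the assert passes silently, part_time adds no column that the region indicators do not already span.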

            Consequently, it is mathematically impossible to include all of the regions (or even all of the regions leaving out one as a reference category) and include a subset indicator as well. So you can keep i.part_time in your model if you like, but you will have to sacrifice a region indicator to do that. If you re-arrange the order of the variables in your regression so that i.part_time comes before i.region, then part_time will be retained, and Stata will drop an additional region.
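
A minimal sketch of what this means in practice, again on synthetic data rather than the poster's (the simulated lw values are assumptions). Exactly which term Stata flags as omitted may vary with the specification and version, so the code checks only what must hold: both orderings have the same rank and the same fitted values.

```stata
* Synthetic illustration only: 9 regions, 100 observations each.
clear
set seed 12345
set obs 900
generate byte region = ceil(_n/100)
generate byte part_time = inlist(region, 1, 2, 3, 4, 9)
generate double lw = 6 + 0.05*region + rnormal(0, 0.3)

* Region indicators first, part_time last
regress lw i.region part_time
scalar rank_a = e(rank)
predict double fit_a, xb

* part_time first, region indicators after
regress lw part_time i.region
scalar rank_b = e(rank)
predict double fit_b, xb

* Same column space either way: 9 independent terms (constant + 8 others)
assert rank_a == 9 & rank_b == 9
assert reldif(fit_a, fit_b) < 1e-8
```

Either way the model is the same; any reported part_time coefficient cannot be separated from the region effects and only reflects which region indicator was dropped.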


            • #7
              Thank you all for your replies and your time; this was very helpful.
