  • Why are the variables omitted?

    Hi guys, I have the following problem. I am trying to learn as much as possible about Data Science topics. Right now I'm learning panel data and the FE and RE estimators. For this I downloaded a panel dataset from the net and started estimating, using the following "guide": https://www.princeton.edu/~otorres/Panel101.pdf.

    Now I estimate the following regression: reg lnwage union educ exp i.year. One dummy variable is automatically removed (1980), and in addition the 1987 dummy and the variable "educ" are dropped. But if I estimate the model without "exp", then only "educ" is omitted. Why are two variables removed in one case, but only one in the other? I know the output says 'omitted because of collinearity', but I don't understand where the collinearity comes from. Thank you!
[Attachment: Screenshot 2022-06-16 at 11.47.53.png]

[Attachment: Screenshot 2022-06-16 at 11.48.09.png]


  • #2
    Carl:
    1) as per FAQ, please do not post screenshots but share what you typed and what Stata gave you back via CODE delimiters. Thanks;
    2) the -fe- estimator wipes out all time-invariant variables. -education-, if stable within panels, is a case in point;
    3) one year is omitted to avoid the so-called dummy variable trap (https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).
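    A minimal numerical sketch of point 2 (plain Python rather than Stata, with hypothetical values): the within transformation subtracts each panel's mean, so any column that is constant within a panel becomes all zeros and carries no identifying variation.

    ```python
    # Within (fixed-effects) transformation for one hypothetical individual:
    # subtract the panel mean from each observation.
    educ = [12, 12, 12, 12]            # education, constant over 1980-1983
    mean = sum(educ) / len(educ)
    demeaned = [x - mean for x in educ]
    print(demeaned)                    # [0.0, 0.0, 0.0, 0.0] -> nothing left to estimate
    ```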
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Originally posted by Carlo Lazzaro View Post
      Carl:
      1) as per FAQ, please do not post screenshots but share what you typed and what Stata gave you back via CODE delimiters. Thanks;
      2) the -fe- estimator wipes out all time-invariant variables. -education-, if stable within panels, is a case in point;
      3) one year is omitted to avoid the so-called dummy variable trap (https://en.wikipedia.org/wiki/Dummy_variable_(statistics)).
      Thanks for the answer. I will look out for it next time. I understand the first point, but I'm still a bit confused about the second. The panel dataset contains data from 1980-1987, so yes, one of the dummies has already been removed (1980; see picture). Why was 1987 also removed? Normally only one of the dummy variables is dropped.

      Comment


      • #4
        Carl:
        perfect collinearity with -exp- might be an answer.
        You may want to delve into the issue by typing:
        Code:
        estat vce, corr
        after -xtreg,fe-.
        In addition:
        a) if -exp- is some form of experience, you may want to search for a possible turning point:
        Code:
        c.exp##c.exp
        b) your within R-sq seems a tad low. Are you sure that you included all the necessary predictors on the right-hand side of your regression equation?
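        On point a), the turning point implied by -c.exp##c.exp- can be read off the two coefficients; a quick sketch in plain Python with hypothetical coefficient values (not estimates from this thread):

        ```python
        # lnwage = ... + b1*exp + b2*exp^2: with b2 < 0 the wage profile
        # peaks at exp* = -b1 / (2*b2). b1 and b2 below are hypothetical.
        b1, b2 = 0.08, -0.002
        turning_point = -b1 / (2 * b2)
        print(turning_point)           # about 20 years of experience
        ```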
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Originally posted by Carlo Lazzaro View Post
          Carl:
          perfect collinearity with -exp- might be an answer.
          You may want to delve into the issue by typing:
          Code:
          estat vce, corr
          after -xtreg,fe-.
          In addition:
          a) if -exp- is some form of experience, you may want to search for a possible turning point:
          Code:
          c.exp##c.exp
          b) your within R-sq seems a tad low. Are you sure that you included all the necessary predictors on the right-hand side of your regression equation?
          Thanks, I will think about dropping a few. Just one last question, for a better understanding:
          why are "educ" and 1987 no longer omitted when I estimate with random effects?
          Code:
          xtreg lnwage union educ exp i.year, re r

          Comment


          • #6
            Carl:
            because -re- can also give back the coefficients of time-invariant variables.
            Your question, if I may, suggests getting a better understanding of the theoretical building blocks of panel data regression (which is not trivial stuff).
            A relevant number of references is reported in the -xtreg- entry of the Stata .pdf manual.
            Statalisters are usually fond of https://www.stata.com/bookstore/micr...metrics-stata/.
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Originally posted by Carlo Lazzaro View Post
              Carl:
              perfect collinearity with -exp- might be an answer.
              You may want to delve into the issue by typing:
              Code:
              estat vce, corr
              I used your command, but unfortunately it does not show the correlations for the omitted variables. Is there any way to show the correlation even when a variable has been removed from the model?

              Comment


              • #8
                Carl:
                admittedly, in my previous post I was not that clear.
                The coefficients of the omitted variables cannot appear in the VCE matrix.
                The idea was to investigate whether quasi-extreme multicollinearity issues exist among the remaining predictors, so as to figure out an alternative strategy of analysis.
                That said, reading your posts once more, I think there is a more compelling issue to take into account with your code: you are likely to have latent-variable-led endogeneity, due to the fact that individual ability (which is embedded in the residuals) has a bearing on both -education- (on average, other things being equal, smarter people achieve higher education levels) and the regressand (on average, other things being equal, smarter people achieve higher wage levels).
                If you stick with the -fe- specification and assume that individual ability is time-invariant (a quite strong assumption, as individual ability is a mix of innate talents and on-the-job training), the -fe- estimator will accommodate this issue, whereas the -re- estimator will not.
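                The endogeneity argument above can be illustrated with a small simulation (plain Python, hypothetical data-generating process): when time-invariant ability drives both the regressor and wages, the pooled slope is biased while the within (FE) slope stays close to the true coefficient.

                ```python
                import random
                random.seed(0)

                beta, N, T = 1.0, 500, 5                 # true slope, panels, periods
                xs, ys = [], []
                for i in range(N):
                    a = random.gauss(0, 1)               # time-invariant ability, unobserved
                    for t in range(T):
                        x = 0.8 * a + random.gauss(0, 1) # regressor correlated with ability
                        xs.append(x)
                        ys.append(beta * x + a + random.gauss(0, 1))  # ability also raises y

                # pooled OLS slope: ability sits in the error, so it is biased upward
                mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
                b_ols = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
                        sum((x - mx) ** 2 for x in xs)

                # within (FE) slope: demean by individual first, wiping out ability
                xd, yd = [], []
                for i in range(N):
                    xi, yi = xs[i*T:(i+1)*T], ys[i*T:(i+1)*T]
                    mxi, myi = sum(xi) / T, sum(yi) / T
                    xd += [v - mxi for v in xi]
                    yd += [v - myi for v in yi]
                b_fe = sum(u * v for u, v in zip(xd, yd)) / sum(u * u for u in xd)

                print(round(b_ols, 2), round(b_fe, 2))   # pooled well above 1; FE close to 1
                ```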
                Kind regards,
                Carlo
                (Stata 19.0)

                Comment


                • #9
                  Originally posted by Carlo Lazzaro View Post
                  Carl:
                  admittedly, in my previous post I was not that clear.
                  The coefficients of the omitted variables cannot appear in the VCE matrix.
                  The idea was to investigate whether quasi-extreme multicollinearity issues exist among the remaining predictors, so as to figure out an alternative strategy of analysis.
                  That said, reading your posts once more, I think there is a more compelling issue to take into account with your code: you are likely to have latent-variable-led endogeneity, due to the fact that individual ability (which is embedded in the residuals) has a bearing on both -education- (on average, other things being equal, smarter people achieve higher education levels) and the regressand (on average, other things being equal, smarter people achieve higher wage levels).
                  If you stick with the -fe- specification and assume that individual ability is time-invariant (a quite strong assumption, as individual ability is a mix of innate talents and on-the-job training), the -fe- estimator will accommodate this issue, whereas the -re- estimator will not.
                  Thank you Carlo. One final question: is there a certain threshold above which Stata automatically removes variables from the model?

                  Comment


                  • #10
                    Carl:
                    Stata removes variables when they are perfectly collinear (that is, there is no way to disentangle their specific contribution to variation in the regressand when adjusted for the other predictors).
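                    A tiny sketch of what "perfectly collinear" means mechanically (plain Python, made-up numbers): if one column is an exact multiple of another, X'X is singular, so no unique coefficients exist and one column has to be dropped.

                    ```python
                    # Two regressors where x2 = 2 * x1 exactly (made-up numbers).
                    x1 = [1, 2, 3, 4]
                    x2 = [2 * v for v in x1]

                    # 2x2 Gram matrix X'X and its determinant.
                    a = sum(v * v for v in x1)
                    b = sum(u * v for u, v in zip(x1, x2))
                    d = sum(v * v for v in x2)
                    det = a * d - b * b
                    print(det)         # 0 -> X'X singular, one column must go
                    ```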
                    Kind regards,
                    Carlo
                    (Stata 19.0)

                    Comment


                    • #11
                      Originally posted by Carlo Lazzaro View Post
                      Carl:
                      Stata removes variables when they are perfectly collinear (that is, there is no way to disentangle their specific contribution to variation in the regressand when adjusted for the other predictors).
                      Hi Carlo, sorry to "bug" you again, but one thing keeps bothering me. Why is there a problem of multicollinearity with the fixed-effects estimator, but not with the ordinary regression? Both methods use the same independent variables. Could it be due to the transformation of the variables on which the fixed-effects estimator is based? Could this transformation be the reason for the multicollinearity?

                      Comment


                      • #12
                        Carl:
                        it depends on how you coded up your OLS.
                        As you can see in the following toy example, the shared coefficients between -regress- and -xtreg,fe- are identical (and so are the omitted variables/levels):
                        Code:
                        use "https://www.stata-press.com/data/r17/nlswork.dta"
                        . regress ln_wage i.race i.year i.idcode if idcode<=3, vce(cluster idcode)
                        note: 2.race omitted because of collinearity.
                        
                        Linear regression                               Number of obs     =         39
                                                                        F(1, 2)           =          .
                                                                        Prob > F          =          .
                                                                        R-squared         =     0.6736
                                                                        Root MSE          =     .27711
                        
                                                         (Std. err. adjusted for 3 clusters in idcode)
                        ------------------------------------------------------------------------------
                                     |               Robust
                             ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                                race |
                              Black  |          0  (omitted)
                                     |
                                year |
                                 69  |    .208967          .        .       .            .           .
                                 70  |  -.2747772   .2665627    -1.03   0.411    -1.421704    .8721495
                                 71  |  -.3613911   .3802231    -0.95   0.442    -1.997359    1.274577
                                 72  |  -.2056973   .2055158    -1.00   0.422     -1.08996    .6785657
                                 73  |  -.0310461   .1010676    -0.31   0.788    -.4659047    .4038125
                                 75  |   .0416271   .1645216     0.25   0.824    -.6662522    .7495064
                                 77  |   .0358937   .1361656     0.26   0.817    -.5499794    .6217669
                                 78  |   .2433199   .1991388     1.22   0.346    -.6135051    1.100145
                                 80  |   .2726139    .219896     1.24   0.341    -.6735221     1.21875
                                 82  |   .1747839   .0801197     2.18   0.161    -.1699433    .5195112
                                 83  |   .2924489   .1355079     2.16   0.164    -.2905946    .8754925
                                 85  |   .3712589   .1931145     1.92   0.194     -.459646    1.202164
                                 87  |   .2960361   .2135556     1.39   0.300    -.6228196    1.214892
                                 88  |   .3038639   .1527355     1.99   0.185    -.3533039    .9610317
                                     |
                              idcode |
                                  2  |  -.3898423   .0268011   -14.55   0.005    -.5051583   -.2745263
                                  3  |  -.4648596   .0066766   -69.62   0.000    -.4935868   -.4361323
                                     |
                               _cons |   1.958421   .0066766   293.32   0.000     1.929694    1.987148
                        ------------------------------------------------------------------------------
                        
                        . mat list e(b)
                        
                        e(b)[1,20]
                                    2o.        68b.         69.         70.         71.         72.         73.         75.         77.         78.         80.
                                  race        year        year        year        year        year        year        year        year        year        year
                        y1           0           0   .20896697  -.27477721  -.36139112  -.20569731  -.03104612   .04162712   .03589375   .24331994   .27261391
                        
                                    82.         83.         85.         87.         88.         1b.          2.          3.            
                                  year        year        year        year        year      idcode      idcode      idcode       _cons
                        y1   .17478391   .29244895   .37125888   .29603611   .30386391           0  -.38984227  -.46485956   1.9584209
                        
                        . 
                        . xtreg ln_wage i.race i.year if idcode<=3, fe vce(cluster idcode)
                        note: 2.race omitted because of collinearity.
                        
                        Fixed-effects (within) regression               Number of obs     =         39
                        Group variable: idcode                          Number of groups  =          3
                        
                        R-squared:                                      Obs per group:
                             Within  = 0.5446                                         min =         12
                             Between = 0.2670                                         avg =       13.0
                             Overall = 0.3678                                         max =         15
                        
                                                                        F(3,2)            =          .
                        corr(u_i, Xb) = -0.0356                         Prob > F          =          .
                        
                                                         (Std. err. adjusted for 3 clusters in idcode)
                        ------------------------------------------------------------------------------
                                     |               Robust
                             ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                                race |
                              Black  |          0  (omitted)
                                     |
                                year |
                                 69  |    .208967   3.41e-08  6.1e+06   0.000     .2089668    .2089671
                                 70  |  -.2747772   .2552143    -1.08   0.394    -1.372876    .8233215
                                 71  |  -.3613911   .3640359    -0.99   0.425    -1.927711    1.204929
                                 72  |  -.2056973   .1967664    -1.05   0.406    -1.052315      .64092
                                 73  |  -.0310461   .0967648    -0.32   0.779    -.4473915    .3852993
                                 75  |   .0416271   .1575174     0.26   0.816    -.6361157      .71937
                                 77  |   .0358937   .1303686     0.28   0.809    -.5250371    .5968246
                                 78  |   .2433199   .1906609     1.28   0.330    -.5770276    1.063667
                                 80  |   .2726139   .2105344     1.29   0.325    -.6332423     1.17847
                                 82  |   .1747839   .0767088     2.28   0.150    -.1552673    .5048351
                                 83  |   .2924489    .129739     2.25   0.153    -.2657727    .8506706
                                 85  |   .3712589   .1848931     2.01   0.182    -.4242719     1.16679
                                 87  |   .2960361   .2044639     1.45   0.285    -.5837012    1.175773
                                 88  |   .3038639   .1462331     2.08   0.173    -.3253264    .9330542
                                     |
                               _cons |   1.659677   .0055719   297.86   0.000     1.635703    1.683651
                        -------------+----------------------------------------------------------------
                             sigma_u |  .24956596
                             sigma_e |  .27711004
                                 rho |  .44784468   (fraction of variance due to u_i)
                        ------------------------------------------------------------------------------
                        
                        . mat list e(b)
                        
                        e(b)[1,17]
                                    2o.        68b.         69.         70.         71.         72.         73.         75.         77.         78.         80.
                                  race        year        year        year        year        year        year        year        year        year        year
                        y1           0           0   .20896697  -.27477721  -.36139112  -.20569731  -.03104612   .04162712   .03589375   .24331994   .27261391
                        
                                    82.         83.         85.         87.         88.            
                                  year        year        year        year        year       _cons
                        y1   .17478391   .29244895   .37125888   .29603611   .30386391   1.6596773
                        Kind regards,
                        Carlo
                        (Stata 19.0)

                        Comment


                        • #13
                          Originally posted by Carlo Lazzaro View Post
                          Carl:
                          Thanks, this is my OLS code.

                          Code:
                          reg lnwage union educ exp i.year, robust
                          And that's my FE code:
                          Code:
                          xtreg lnwage union educ exp i.year, fe robust

                          Comment


                          • #14
                            Carl:
                            your codes should have been:
                            Code:
                            reg ln_wage i.union educ exp i.idcode i.year, vce(cluster idcode)
                            xtset idcode year
                            xtreg ln_wage i.union educ exp i.year, fe vce(cluster idcode)
                            Please note that, while -robust- and -vce(cluster idcode)- can be used interchangeably with -xtreg-, this does not hold for -regress-.
                            Kind regards,
                            Carlo
                            (Stata 19.0)

                            Comment


                            • #15
                              Originally posted by Carlo Lazzaro View Post
                              Carl:
                              your codes should have been:
                              Code:
                              reg ln_wage i.union educ exp i.idcode i.year, vce(cluster idcode)
                              xtset idcode year
                              xtreg ln_wage i.union educ exp i.year, fe vce(cluster idcode)
                              Please note that, while -robust- and -vce(cluster idcode)- can be used interchangeably with -xtreg-, this does not hold for -regress-.
                              Hi Carlo, thank you very much, that did the trick.
                              I just remain curious as to why the variable was removed before. Do you have a final explanation for this? Could it be that the FE transformation created perfect multicollinearity between the year dummies and experience? And since Stata by default removes the last variable in the command, 1987 was then removed here.
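                              That hypothesis can be checked numerically (plain Python sketch, assuming a hypothetical continuous work history): if experience rises by exactly one each year, its demeaned values are an exact linear combination of the demeaned year dummies, so after the within transformation one more column must be dropped.

                              ```python
                              years = list(range(1980, 1988))
                              T = len(years)

                              # experience for one person who works every year (hypothetical)
                              exp = [y - 1980 for y in years]
                              m = sum(exp) / T
                              exp_dm = [e - m for e in exp]      # demeaned experience

                              # build the same vector from the demeaned year dummies
                              combo = [0.0] * T
                              for t in years:
                                  d = [1.0 if y == t else 0.0 for y in years]
                                  d_dm = [v - sum(d) / T for v in d]
                                  combo = [c + (t - 1980) * v for c, v in zip(combo, d_dm)]

                              # demeaned exp coincides exactly with the dummy combination
                              print(all(abs(a - b) < 1e-9 for a, b in zip(exp_dm, combo)))  # True
                              ```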

                              Comment
