  • Checking for Multicollinearity in FE-Model

    Hello,

    When I run my pooled regression, I use the variance inflation factor (VIF) to check for multicollinearity.
    1) How can I check for it when I use the fixed-effects model?

    2) Does introducing an interaction term (for two variables that depend on each other) always reduce multicollinearity, or can it increase it?
    3) Can I still use the VIF to check for multicollinearity in the pooled regression if an interaction term is included?

    I would be grateful to get a response.

    Thanks, Lisa

  • #2
    Lisa:
    1) you can take a look at -estat vce- (shown here after the default random-effects -xtreg-; it is also available after -xtreg, fe-);
    Code:
    . xtreg ln_wage c.tenure##i.race, vce(cluster idcode)
    
    Random-effects GLS regression                   Number of obs      =     28101
    Group variable: idcode                          Number of groups   =      4699
    
    R-sq:  within  = 0.0976                         Obs per group: min =         1
           between = 0.2080                                        avg =       6.0
           overall = 0.1573                                        max =        15
    
                                                    Wald chi2(5)       =   1807.62
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
    
                                   (Std. Err. adjusted for 4699 clusters in idcode)
    -------------------------------------------------------------------------------
                  |               Robust
          ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
           tenure |   .0389736    .001159    33.63   0.000     .0367021    .0412451
                  |
             race |
           black  |  -.1220068   .0123803    -9.85   0.000    -.1462717   -.0977419
           other  |   .1059327   .0590475     1.79   0.073    -.0097983    .2216637
                  |
    race#c.tenure |
           black  |  -.0045011    .001981    -2.27   0.023    -.0083837   -.0006185
           other  |  -.0006747   .0077339    -0.09   0.930    -.0158328    .0144833
                  |
            _cons |    1.58904   .0069059   230.10   0.000     1.575505    1.602576
    --------------+----------------------------------------------------------------
          sigma_u |   .3362815
          sigma_e |  .30352533
              rho |  .55106313   (fraction of variance due to u_i)
    -------------------------------------------------------------------------------
    
    . estat vce, corr
    
    Correlation matrix of coefficients of xtreg model
    
                 |                  2.        3.   2.race#   3.race#          
            e(V) |   tenure      race      race  c.tenure  c.tenure     _cons 
    -------------+------------------------------------------------------------
          tenure |   1.0000                                                   
          2.race |   0.2189    1.0000                                         
          3.race |   0.0459    0.0652    1.0000                               
          2.race#|                                                            
        c.tenure |  -0.5850   -0.2796   -0.0268    1.0000                     
          3.race#|                                                            
        c.tenure |  -0.1499   -0.0328   -0.0100    0.0877    1.0000           
           _cons |  -0.3924   -0.5578   -0.1170    0.2295    0.0588    1.0000 
    
    .
    2) to reduce the risk of multicollinearity when an interaction is included among the predictors, you can center the variables around a meaningful value (e.g., their sample mean);
    3) yes, as you can see from the following toy-example:
    Code:
    . reg ln_wage c.tenure##i.race, vce(cluster idcode)
    
    Linear regression                                      Number of obs =   28101
                                                           F(  5,  4698) =  323.74
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.1578
                                                           Root MSE      =  .43851
    
                                   (Std. Err. adjusted for 4699 clusters in idcode)
    -------------------------------------------------------------------------------
                  |               Robust
          ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
           tenure |    .049002   .0014961    32.75   0.000      .046069    .0519351
                  |
             race |
           black  |  -.1302252   .0135407    -9.62   0.000    -.1567713   -.1036792
           other  |   .0678536   .0563184     1.20   0.228    -.0425568    .1782641
                  |
    race#c.tenure |
           black  |  -.0055268   .0026885    -2.06   0.040    -.0107976   -.0002561
           other  |   .0066888   .0150236     0.45   0.656    -.0227645     .036142
                  |
            _cons |   1.564889   .0072358   216.27   0.000     1.550704    1.579075
    -------------------------------------------------------------------------------
    
    . estat vif
    
        Variable |       VIF       1/VIF  
    -------------+----------------------
          tenure |      1.43    0.697843
            race |
              2  |      1.70    0.587044
              3  |      1.79    0.558145
            race#|
        c.tenure |
              2  |      2.13    0.468795
              3  |      1.80    0.556288
    -------------+----------------------
        Mean VIF |      1.77
    Kind regards,
    Carlo
    (StataNow 18.5)
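
    On question 1, -estat vif- is only available after -regress-, not after -xtreg-. A minimal sketch of one possible workaround, assuming the -nlswork- example dataset (downloaded with -webuse-) and an illustrative choice of predictors that is not from the thread: apply the within (fixed-effects) transformation with -xtdata- and compute the VIFs on the transformed data.

    ```stata
    * Sketch only: dataset and predictors are illustrative assumptions
    webuse nlswork, clear
    xtset idcode year

    * Keep complete cases so every variable is demeaned over the same observations
    keep if !missing(ln_wage, tenure, age)

    * Replace the data with deviations from panel means (within transformation)
    xtdata ln_wage tenure age, fe clear

    * OLS on the transformed data reproduces the -xtreg, fe- coefficients;
    * its standard errors are not degrees-of-freedom corrected, so use it for the VIFs only
    regress ln_wage tenure age
    estat vif
    ```

    The VIFs depend only on the (demeaned) regressors, so they describe the collinearity that remains after the panel means have been removed.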



    • #3
      Hello Carlo,

      thank you very much for your answer.
      I was told that the rule of thumb for the VIF is 10.
      Is there a similar rule of thumb for the correlations reported by -estat vce, corr-?

      Thank you very much!
      Lisa



      • #4
        Lisa:
        Paul Allison (https://uk.sagepub.com/en-gb/eur/mul...ssion/book8989), on page 141, suggests a correlation of 0.6 as a "let's start-worrying-about-multicollinearity" threshold.

        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Thank you very much Carlo,
          I just have one more question regarding the interaction variables.

          My regression (age1 is the centered value, age minus mean age):
          Code:
          xtreg   deposit   income   age   c.age1#c.income  interest year1 year2 year3 , fe vce(cluster id)
          If I don't include age, income has a positive sign; if I include age and/or the interaction term, the income coefficient is negative.
          Also, my correlation matrix (estat vce, corr) indicates that there should no longer be any multicollinearity.
          Do you know what might cause that kind of change?

          Thank you very much!
          Lisa



          • #6
            Lisa:
            what if:
            Code:
             
             xtreg   deposit   income  c.age1##c.income  interest i.year, fe vce(cluster id)
            ?
            Kind regards,
            Carlo
            (StataNow 18.5)



            • #7
              It is kind of weird. If I don't include the year dummies, income is positive (but insignificant). If I include the year dummies, GDP, or any other variable that controls for time effects, the income coefficient gets a negative sign. Any ideas?



              • #8
                Lisa:
                you might be experiencing a sort of tipping effect: see (https://uk.sagepub.com/en-gb/eur/mul...ssion/book8989) at page 144, or http://statisticalhorizons.com/multicollinearity, in particular Paul Allison's reply to Arne Mastekaasa's comments.
                Put briefly, multicollinearity is still affecting your estimates, which flip sign and appear less stable (some would say less robust, but I can't figure out with respect to what) than expected.
                Kind regards,
                Carlo
                (StataNow 18.5)



                • #9
                  Thank you very much, Carlo.
                  For another regression I need to include another variable that also depends on income.
                  Is it permissible to use two interaction terms with income?
                  Do you know a good paper that explains how to interpret interaction terms in a log-log model? In that case I am not sure how to interpret the marginal effect.

                  Thank you very much!!!
                  Lisa



                  • #10
                    Lisa:
                    I'm not familiar with two interactions for the same variable. Possibly, you can switch to a three-way interaction (even though I find two-way interactions the most you can ask of your readers).
                    Unfortunately, I'm not aware of any paper of the kind you're after (have you already googled this topic?).
                    Kind regards,
                    Carlo
                    (StataNow 18.5)



                    • #11
                      Just to check that I described my problem correctly, let me ask again:
                      I need to introduce two interaction terms in order to get rid of the multicollinearity problem:
                      c.age1##c.income and c.age1##c.accounts (accounts is the number of bank accounts)
                      So it is not the same interaction variable twice; only the variable age is included in both.
                      Hopefully I have described it better this time.

                      Kind regards,
                      Lisa



                      • #12
                        Originally posted by lisa bäcker
                        Thank you very much Carlo,
                        I just have one more question regarding the interaction variables.

                        My regression (age1 is the centered value, age minus mean age):
                        Code:
                        xtreg deposit income age c.age1#c.income interest year1 year2 year3 , fe vce(cluster id)
                        If I don't include age, income has a positive sign; if I include age and/or the interaction term, the income coefficient is negative.
                        Also, my correlation matrix (estat vce, corr) indicates that there should no longer be any multicollinearity.
                        Do you know what might cause that kind of change?

                        Thank you very much!
                        Lisa
                        Something is clearly wrong in the above equation, and it is causing confusion: you have both 'age' and 'age1'. Stata treats them as two separate variables, so 'age1' enters the model without its lower-order term. You need to correct that first: include either age + c.age#c.income or age1 + c.age1#c.income.
                        Roman



                        • #13
                          Thank you, Roman,
                          but I don't understand how to apply your suggestion:
                          the variable age is of course the age of the customer,
                          but age1 is the centered variable (age - mean_age), so I need to define a new variable (age1) to tell Stata to use the centered value, right?
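
                          A minimal sketch of the centering step, assuming the -nlswork- example dataset as a stand-in (ln_wage, tenure and idcode here play the roles of deposit, income and id; this mapping is purely illustrative). The centered variable is generated once and then used everywhere in the model, so the uncentered age does not appear alongside it; the ## operator adds both main effects and the interaction.

                          ```stata
                          * Sketch only: nlswork variables stand in for the poster's data
                          webuse nlswork, clear
                          xtset idcode year

                          * Center age at its sample mean
                          summarize age, meanonly
                          generate double age1 = age - r(mean)

                          * ## expands to age1, tenure and age1 x tenure, so no lower-order term is missing
                          xtreg ln_wage c.age1##c.tenure, fe vce(cluster idcode)

                          * Correlations among the coefficient estimates
                          estat vce, corr
                          ```

                          With the poster's own variables, the analogous specification would use c.age1##c.income and drop the separate age term.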



                          • #14
                            Originally posted by lisa bäcker
                            Thank you very much, Carlo.
                            For another regression I need to include another variable that also depends on income.
                            Is it permissible to use two interaction terms with income?
                            Do you know a good paper that explains how to interpret interaction terms in a log-log model? In that case I am not sure how to interpret the marginal effect.

                            Thank you very much!!!
                            Lisa
                            The UCLA page http://www.ats.ucla.edu/stat/mult_pk...regression.htm has some ideas on interpreting log models; you can have a look. It perhaps won't tell you how to derive marginal effects at their geometric mean values (log models use the geometric mean). However, you can always let 'margins' do the work for you at the mean values. For example,

                            after your model, the following command will give you the expected geometric mean deposit values at different values of age:


                            Code:
                            margins, at(age=(15(5)45)) expr(exp(predict(xb)))


                            Roman
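
                            A minimal sketch of how -margins- could be used with an interaction in a model with a logged outcome, assuming the -nlswork- example dataset as a stand-in for the poster's data (the variables and the -at()- values are illustrative assumptions). Note that tenure is not logged here, so this is a log-linear rather than a log-log specification; with a logged regressor, the -dydx()- result would be read as an elasticity.

                            ```stata
                            * Sketch only: nlswork variables stand in for the poster's data
                            webuse nlswork, clear
                            xtset idcode year
                            summarize age, meanonly
                            generate double age1 = age - r(mean)

                            xtreg ln_wage c.age1##c.tenure, fe vce(cluster idcode)

                            * Effect of tenure on ln_wage at three values of centered age;
                            * the interaction makes this slope vary with age1
                            margins, dydx(tenure) at(age1=(-10 0 10))

                            * Back-transformed linear prediction (the fixed effect u_i is not included)
                            margins, at(tenure=(0(5)20)) expression(exp(predict(xb)))
                            ```

                            The first -margins- call answers "how does the slope change with age", while the second reports exp(xb), which corresponds to a geometric-mean-type prediction on the original scale.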



                            • #15
                              Lisa:
                              you should get rid of -age- as a predictor and focus on -age1- only, as Roman suggested.
                              I would also investigate whether there is a quadratic relationship between age and deposit (life-cycle theory?).
                              Kind regards,
                              Carlo
                              (StataNow 18.5)
