
  • Multicollinearity in Lagged Independent Variables

    Hi, I'm doing research for my thesis on the impact of environmental innovation on financial performance. Since it can take years for innovation to affect a company's financial performance, I'm using lags of environmental innovation from t-1 up to t-5. I'm using panel data.


    The problem is that there is multicollinearity among my lagged variables. I'm wondering whether it is all right to ignore it. Are there any approaches I should take?


    Here are the results of the multicollinearity test:
    Code:
    . reg ROA_w ENV_w ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w
    
          Source |       SS           df       MS      Number of obs   =       825
    -------------+----------------------------------   F(8, 816)       =     10.18
           Model |   657.26792         8    82.15849   Prob > F        =    0.0000
        Residual |  6583.71871       816  8.06828273   R-squared       =    0.0908
    -------------+----------------------------------   Adj R-squared   =    0.0819
           Total |  7240.98662       824  8.78760513   Root MSE        =    2.8405
    
    ------------------------------------------------------------------------------
           ROA_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           ENV_w |  -.0074454   .0105458    -0.71   0.480    -.0281456    .0132547
        ENV_lag1 |    .008536   .0137712     0.62   0.536    -.0184951    .0355672
        ENV_lag2 |  -.0115661   .0119388    -0.97   0.333    -.0350004    .0118682
        ENV_lag3 |   .0045303    .010918     0.41   0.678    -.0169003    .0259609
        ENV_lag4 |   .0092341   .0112023     0.82   0.410    -.0127546    .0312228
        ENV_lag5 |  -.0232335   .0085287    -2.72   0.007    -.0399743   -.0064927
           RDI_w |   24.61587   3.864417     6.37   0.000      17.0305    32.20124
          SIZE_w |  -9.44e-06   2.53e-06    -3.72   0.000    -.0000144   -4.46e-06
           _cons |   5.626047   .2895753    19.43   0.000     5.057647    6.194447
    ------------------------------------------------------------------------------
    
    . vif
    
        Variable |       VIF       1/VIF  
    -------------+----------------------
        ENV_lag1 |     12.09    0.082687
        ENV_lag2 |      9.55    0.104716
        ENV_lag4 |      9.55    0.104716
        ENV_lag3 |      8.54    0.117050
           ENV_w |      6.95    0.143930
        ENV_lag5 |      5.75    0.173855
          SIZE_w |      1.04    0.964869
           RDI_w |      1.02    0.982548
    -------------+----------------------
        Mean VIF |      6.81

    Thank you!
    Last edited by Clarine Caine; 28 Apr 2021, 11:23.
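    A sketch of where the VIF numbers above come from: the VIF for a regressor is 1/(1 - R^2) from an auxiliary regression of that regressor on all the other regressors. The commands below use the variable names from this post; the indicator name `insample` is hypothetical. They should reproduce the 12.09 reported for ENV_lag1, provided the auxiliary regression uses the same 825 observations as the main one.

    ```stata
    * Main pooled regression from the first post, run only to mark its estimation sample
    quietly regress ROA_w ENV_w ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w
    capture drop insample
    generate byte insample = e(sample)
    * Auxiliary regression: ENV_lag1 on all the other regressors, same sample
    quietly regress ENV_lag1 ENV_w ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w if insample
    * VIF = 1/(1 - R-squared) of the auxiliary regression
    display "VIF for ENV_lag1 = " 1/(1 - e(r2))
    ```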

  • #2
    Hi Clarine,

    You say you have panel data, but you don't seem to be making full use of that; it looks like you are pooling all the data together. Is that the best approach? Perhaps you should consider a panel approach such as fixed or random effects.
    At the very least, I would recommend using robust standard errors in your current specification (and clustered standard errors if you use a panel approach).
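    A minimal sketch of both suggestions in Stata, reusing the variable names from this thread and assuming `ID` (the group variable shown in post #4) identifies firms:

    ```stata
    * Pooled OLS from the first post with heteroskedasticity-robust standard errors
    regress ROA_w ENV_w ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w, vce(robust)
    * Declare the panel structure (the firm identifier alone is enough for xtreg, fe)
    xtset ID
    * Fixed effects with standard errors clustered by firm
    xtreg ROA_w ENV_w ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w, fe vce(cluster ID)
    ```

    `vce(cluster ID)` is the current spelling of the older `cluster(ID)` option used in post #4; both request firm-clustered standard errors.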

    In terms of multicollinearity, the effect is to inflate the standard errors of your estimates. It doesn't bias the coefficient estimates themselves, but it makes them imprecise. You can get around this by collecting more data, but that probably isn't an option. This could explain the large standard errors on the current value and the first four lags, although your fifth lag is significant. To be honest, I think high multicollinearity is to be expected given that you're including lags.
    You could try removing some of the earlier lags, but I don't know exactly what hypothesis you are testing or what the literature does in this area.
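    The standard OLS variance formula (homoskedastic errors assumed) makes the "inflated standard errors" point precise. Here x_j is one regressor, SST_j its total sum of squares, and R_j^2 the R-squared from regressing x_j on all the other regressors:

    ```latex
    \operatorname{Var}(\hat\beta_j \mid X)
      = \frac{\sigma^2}{\mathrm{SST}_j\,(1 - R_j^2)}
      = \frac{\sigma^2}{\mathrm{SST}_j}\,\mathrm{VIF}_j,
    \qquad
    \mathrm{VIF}_j = \frac{1}{1 - R_j^2},
    \qquad
    \mathrm{SST}_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 .
    ```

    With VIF = 12.09 for ENV_lag1 in the first post, R_j^2 is about 1 - 0.0827 = 0.917, and the standard error is roughly sqrt(12.09), about 3.5 times what it would be if ENV_lag1 were uncorrelated with the other regressors (holding sigma^2 and SST_j fixed). Nothing in the formula involves bias in the coefficient itself.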

    Best Rhys



    • #3
      Please type
      help xtset
      to see how to declare your data as panel data. It would be something like this:

      xtset panelvar timevar

      Then you can use time-series operators instead of creating the lagged variables yourself; see
      help tsvarlist

      After doing that, your estimation command would look like
      xtreg ROA_w L(0/5).ENV_w RDI_w SIZE_w

      If collinearity is still a problem after all that, you must either find more data (not likely a solution) or eliminate some of the lags (the most likely solution).
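      A sketch of this for the variables in the thread. `ID` is the firm identifier shown in post #4, but `year` is a hypothetical name for the time variable; substitute whatever yours is called. The `count` line is an optional check that the hand-built ENV_lag1 agrees with `L.ENV_w`, since the time-series operator respects firm boundaries and gaps in the years, which a hand-built lag does not necessarily do.

      ```stata
      * ID is the firm identifier from post #4; year is a hypothetical time variable name
      xtset ID year
      * Optional check: observations where the hand-built lag disagrees with L.ENV_w
      count if ENV_lag1 != L.ENV_w & !missing(ENV_lag1)
      * Lags 0 to 5 built on the fly, with the fe and cluster options from post #4 added
      xtreg ROA_w L(0/5).ENV_w RDI_w SIZE_w, fe vce(cluster ID)
      ```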
      Last edited by Oscar Ozfidan; 28 Apr 2021, 20:09.



      • #4
        Hi Rhys,

        I have already run the Chow, Hausman, and Breusch-Pagan tests. The results show that I need to use a fixed-effects model. I only ran the regression in my first post to get the VIF values.
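        For reference, a sketch of how these three tests are typically run in Stata, with variable names from the thread (the local macro name `X` is arbitrary). In this setting, the "Chow" test of pooled OLS against fixed effects is usually the F test that all u_i = 0, which `xtreg, fe` prints below its coefficient table only when the default (non-robust) standard errors are used. `hausman` likewise needs both models fitted with conventional standard errors, not `vce(cluster ID)`.

        ```stata
        local X ENV_w ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w
        xtset ID
        * Fixed effects, conventional SEs: output ends with the F test that all u_i = 0
        xtreg ROA_w `X', fe
        estimates store fe
        * Random effects, then the Breusch-Pagan LM test of pooled OLS against RE
        xtreg ROA_w `X', re
        estimates store re
        xttest0
        * Hausman test: FE (consistent) against RE (efficient under H0)
        hausman fe re
        ```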

        Here are the results of my fixed-effects regression:

        Code:
        . xtreg ROA_w ENV_w ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w, fe cluster(ID)
        
        Fixed-effects (within) regression               Number of obs     =        825
        Group variable: ID                              Number of groups  =        275
        
        R-sq:                                           Obs per group:
             within  = 0.1100                                         min =          3
             between = 0.0001                                         avg =        3.0
             overall = 0.0000                                         max =          3
        
                                                        F(8,274)          =       4.13
        corr(u_i, Xb)  = -0.8966                        Prob > F          =     0.0001
        
                                           (Std. Err. adjusted for 275 clusters in ID)
        ------------------------------------------------------------------------------
                     |               Robust
               ROA_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
               ENV_w |  -.0120233   .0087011    -1.38   0.168    -.0291529    .0051062
            ENV_lag1 |  -.0054445    .007026    -0.77   0.439    -.0192762    .0083872
            ENV_lag2 |  -.0146844    .008402    -1.75   0.082     -.031225    .0018563
            ENV_lag3 |   .0084578    .005486     1.54   0.124    -.0023422    .0192578
            ENV_lag4 |   .0221236   .0079207     2.79   0.006     .0065305    .0377167
            ENV_lag5 |  -.0273902   .0089742    -3.05   0.002    -.0450572   -.0097231
               RDI_w |  -125.8473   36.56646    -3.44   0.001    -197.8342   -53.86042
              SIZE_w |  -.0000966   .0000443    -2.18   0.030    -.0001837   -9.40e-06
               _cons |    13.8214   2.315614     5.97   0.000     9.262746    18.38006
        -------------+----------------------------------------------------------------
             sigma_u |  6.0589826
             sigma_e |  1.5639711
                 rho |  .93753381   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------

        The results are actually similar to those in the literature I'm using. I'm just not sure whether I can ignore the multicollinearity problem.

        I don't think the lags can be eliminated, because one of my main objectives is to see whether time lags matter here.
        Since you mentioned that high multicollinearity is to be expected when lags are included, does that mean I can just leave it as it is?
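        If the question is whether the lags matter, a joint test of the lag coefficients addresses it directly, and it is not undermined by collinearity the way the individual t statistics are: collinear regressors can be jointly significant even when no single coefficient is. A sketch using the fixed-effects model from this post:

        ```stata
        xtset ID
        xtreg ROA_w ENV_w ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w, fe vce(cluster ID)
        * H0: the coefficients on lags 1 to 5 are all zero (Wald test, cluster-robust VCE)
        test ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5
        ```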



        • #5
          You can read the following to decide whether or not you should.
          https://statisticalhorizons.com/multicollinearity



          • #6
            Thanks Oscar!



            • #7
              I bet you can estimate the long-run effect (that is, the sum of the coefficients) much more precisely. Our theories are often better about long-run relationships anyway. It's always difficult to estimate the dynamic effects, but more data helps.
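              A sketch of getting that long-run effect and its standard error with `lincom` after the fixed-effects regression in post #4. The variance of a sum of estimates is the sum of their variances plus twice the sum of their covariances; estimates on highly collinear regressors tend to be negatively correlated, so the covariance terms can offset much of the variance, which is why the sum can be far more precise than any single lag. Summing the six coefficients printed in post #4 gives a point estimate of about -0.029; its standard error needs the estimated covariance matrix, so it is not reported here.

              ```stata
              xtset ID
              xtreg ROA_w ENV_w ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w, fe vce(cluster ID)
              * Long-run effect: sum of the contemporaneous and lagged coefficients;
              * lincom uses the full (cluster-robust) covariance matrix for the standard error
              lincom ENV_w + ENV_lag1 + ENV_lag2 + ENV_lag3 + ENV_lag4 + ENV_lag5
              ```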

              I'm not a fan of even looking at the VIFs in these cases; I know there is a lot of multicollinearity. But I will say that the situation is almost certainly worse with fixed effects, because much of the variation in the explanatory variables is removed. You can see this by demeaning the data using the firm-specific time averages, running pooled OLS on the demeaned data, and then obtaining the VIFs. Like I said, I'm not sure that what you learn will help.
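              A sketch of that check in Stata, assuming `ID` identifies firms; the `m_` and `dm_` prefixes and the `insample` indicator are hypothetical names. Each variable is demeaned by its firm-specific average over the estimation sample, pooled OLS is run on the demeaned data, and `estat vif` reports the VIFs. The slope estimates from this regression should match the `xtreg, fe` point estimates, although its standard errors do not apply the fixed-effects degrees-of-freedom adjustment.

              ```stata
              local vars ROA_w ENV_w ENV_lag1 ENV_lag2 ENV_lag3 ENV_lag4 ENV_lag5 RDI_w SIZE_w
              * Mark the estimation sample (the first variable is the dependent variable)
              quietly regress `vars'
              capture drop insample
              generate byte insample = e(sample)
              foreach v of local vars {
                  capture drop m_`v'
                  capture drop dm_`v'
                  * Firm-specific time average over the estimation sample
                  bysort ID: egen double m_`v' = mean(`v') if insample
                  * Deviation from the firm average (the fixed-effects "within" transformation)
                  generate double dm_`v' = `v' - m_`v'
              }
              * Pooled OLS on the demeaned data, then the VIFs
              regress dm_ROA_w dm_ENV_w dm_ENV_lag1 dm_ENV_lag2 dm_ENV_lag3 dm_ENV_lag4 dm_ENV_lag5 dm_RDI_w dm_SIZE_w
              estat vif
              ```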
