
  • OLS v. Fixed Effects results reliability

    I have run into something that has made me curious:

    I am running a panel regression on the impact of board committee characteristics on carbon performance. I was advised to first run an OLS regression to get an overview before running a fixed- or random-effects regression, so I did. I then chose the FE model over the RE model.

    The results are the following: for the OLS model I get an extremely high adjusted R-squared of ~75%. Much of this is due to the industry fixed effects in my model; without them, the adjusted R-squared drops to ~42%. Also, in the OLS, around half of my predictor variables are significant.
    In the FE model, on the other hand, the industry fixed effects are omitted because they are time-invariant, and I only get a within R-squared of ~22%, with fewer of my variables statistically significant.
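    A minimal sketch of why these two R-squared figures are not directly comparable. It is written in Python/NumPy rather than Stata and uses made-up simulated data; all names and numbers are hypothetical. Regressing y on x plus a full set of firm dummies (LSDV) gives exactly the same slope as the within (FE) regression on firm-demeaned data, yet the LSDV R-squared is much higher, because it also counts the variation absorbed by the dummies. The OLS R-squared above likewise includes whatever the industry dummies absorb, while the FE within R-squared does not.

```python
import numpy as np

def ols_r2(X, y):
    """Least-squares fit; returns coefficients and R-squared = 1 - SSR/SST (SST centred on mean of y)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Hypothetical simulated panel: 200 firms observed for 8 years
rng = np.random.default_rng(0)
n_firms, n_years = 200, 8
firm = np.repeat(np.arange(n_firms), n_years)        # firm index for each row

alpha = rng.normal(0.0, 3.0, n_firms)[firm]          # large time-invariant firm effects
x = rng.normal(0.0, 1.0, firm.size) + 0.5 * alpha    # x correlated with the firm effect
y = 0.4 * x + alpha + rng.normal(0.0, 1.0, firm.size)

# LSDV: x plus one dummy per firm (the dummies replace the constant)
dummies = (firm[:, None] == np.arange(n_firms)).astype(float)
beta_lsdv, r2_lsdv = ols_r2(np.column_stack([x, dummies]), y)

# Within (FE) estimator: subtract each firm's mean from y and x
def demean_by_firm(v):
    means = np.bincount(firm, weights=v) / np.bincount(firm)
    return v - means[firm]

x_w, y_w = demean_by_firm(x), demean_by_firm(y)
beta_fe, r2_within = ols_r2(x_w[:, None], y_w)

print(f"slope  LSDV={beta_lsdv[0]:.4f}  FE={beta_fe[0]:.4f}")
print(f"R2     LSDV={r2_lsdv:.3f}  within={r2_within:.3f}")
```

    The two regressions leave identical residuals, so the gap in R-squared comes entirely from the different denominators (total vs. within-firm variation of y), not from one model fitting better.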

    Here are the OLS results:

    Code:
     reg EMTOTAL NOMCOMM NOMCOMM_IND COMPCOMM COMPCOMM_IND AUDCOMM AUDCOMM_IND GOVCOMM ATT SUSCOMM BSIZE BGD INC INDEP DUAL ROA LEV FSIZE MULT SKILLS i.YEAR i.INDUSTRY, vce(cluster ID)
    
    Linear regression                               Number of obs     =      2,546
                                                    F(75, 388)        =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.7579
                                                    Root MSE          =     1.1201
    
                                       (Std. err. adjusted for 389 clusters in ID)
    ------------------------------------------------------------------------------
                 |               Robust
         EMTOTAL | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         NOMCOMM |    .501477   .5371908     0.93   0.351    -.5546921    1.557646
     NOMCOMM_IND |  -.0049362   .2602447    -0.02   0.985    -.5166026    .5067301
        COMPCOMM |  -.1953639   .2510876    -0.78   0.437    -.6890264    .2982987
    COMPCOMM_IND |   1.507128    .893569     1.69   0.092    -.2497146    3.263972
         AUDCOMM |  -.4016043   1.053488    -0.38   0.703    -2.472865    1.669656
     AUDCOMM_IND |  -2.396646    .913957    -2.62   0.009    -4.193574   -.5997178
         GOVCOMM |   .2469164   .2710377     0.91   0.363    -.2859699    .7798027
             ATT |   .0151676   .0049427     3.07   0.002     .0054498    .0248855
         SUSCOMM |   .2270284   .1260847     1.80   0.073    -.0208663    .4749231
           BSIZE |  -.1703795   .3331368    -0.51   0.609    -.8253587    .4845997
             BGD |  -.0110272   .0070541    -1.56   0.119    -.0248962    .0028419
             INC |   .2015798   .0960159     2.10   0.036     .0128031    .3903564
           INDEP |   .0002575   .0068561     0.04   0.970    -.0132223    .0137373
            DUAL |   -.009173   .1303307    -0.07   0.944    -.2654158    .2470697
             ROA |  -.1805006   .0730591    -2.47   0.014    -.3241418   -.0368593
             LEV |   .3685102    .296991     1.24   0.215    -.2154028    .9524232
           FSIZE |   .7882242   .0731855    10.77   0.000     .6443344    .9321139
            MULT |  -.0323264   .1311898    -0.25   0.805    -.2902582    .2256055
          SKILLS |  -.0026919   .0026403    -1.02   0.309    -.0078829    .0024992



    These are the FE results:

    Code:
    . xtreg EMTOTAL NOMCOMM NOMCOMM_IND COMPCOMM COMPCOMM_IND AUDCOMM AUDCOMM_IND GOVCOMM ATT SUSCOMM BSIZE BGD INC INDEP DUAL ROA LEV FSIZE MULT SKILLS i.YEAR, fe vce(cluster ID)
    
    Fixed-effects (within) regression               Number of obs     =      2,549
    Group variable: ID                              Number of groups  =        390
    
    R-squared:                                      Obs per group:
         Within  = 0.2249                                         min =          1
         Between = 0.3603                                         avg =        6.5
         Overall = 0.3284                                         max =         13
    
                                                    F(30, 389)        =          .
    corr(u_i, Xb) = 0.3480                          Prob > F          =          .
    
                                       (Std. err. adjusted for 390 clusters in ID)
    ------------------------------------------------------------------------------
                 |               Robust
         EMTOTAL | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         NOMCOMM |   .2934868   .2109382     1.39   0.165    -.1212349    .7082085
     NOMCOMM_IND |  -.1267084   .0921178    -1.38   0.170    -.3078194    .0544026
        COMPCOMM |   .0854662   .1225279     0.70   0.486    -.1554336    .3263661
    COMPCOMM_IND |   .9833365   .4859865     2.02   0.044     .0278475    1.938825
         AUDCOMM |  -.1399784   .4012153    -0.35   0.727    -.9288003    .6488434
     AUDCOMM_IND |   -1.16394   .2971362    -3.92   0.000    -1.748134   -.5797461
         GOVCOMM |   .1810519   .1828703     0.99   0.323    -.1784858    .5405896
             ATT |  -.0001274   .0017421    -0.07   0.942    -.0035525    .0032978
         SUSCOMM |   .0207709   .0453892     0.46   0.647    -.0684679    .1100098
           BSIZE |  -.0695785   .1128549    -0.62   0.538    -.2914604    .1523034
             BGD |  -.0004053   .0023553    -0.17   0.863     -.005036    .0042255
             INC |   .0135879   .0255077     0.53   0.595    -.0365623    .0637381
           INDEP |   -.004724   .0034665    -1.36   0.174    -.0115395    .0020914
            DUAL |  -.0351191   .0541888    -0.65   0.517    -.1416586    .0714204
             ROA |  -.0077768   .0202317    -0.38   0.701     -.047554    .0320004
             LEV |  -.2681079   .2017146    -1.33   0.185    -.6646951    .1284792
           FSIZE |   .5235958   .0943601     5.55   0.000     .3380761    .7091155
            MULT |   .2038137    .089293     2.28   0.023     .0282564    .3793711
          SKILLS |   .0004215   .0011727     0.36   0.719    -.0018842    .0027272

    What I am now curious about is which results can be considered more reliable/useful in this case. I have also plotted the residuals of both models against the dependent variable to visually assess which model does a better job of predicting it, and here the OLS seems to do better. I am also aware that OLS and FE do not measure exactly the same thing (within variation reported by FE while OLS also draws on between variation), but I would still like to hear opinions on these results, as I have relatively little experience overall.

    Thanks a lot in advance!
    Last edited by Constantin Domizlaff; 09 Mar 2024, 10:15. Reason: OLS

  • #2
    I am also aware that OLS and FE do not measure exactly the same thing (within variation reported by FE while OLS
    This is the key, except you have understated it here. Some of the coefficient differences between FE and OLS here are huge, like an order of magnitude! And a few are of opposite signs. You have data in which the within and between ID effects are very, very different.

    The FE model gives you consistent estimates of the within-ID effects of these variables. The OLS model estimates a weighted average of the within and between effects. What you need to do is think clearly about your research question. Does it ask about the within effects? Or does it ask about the between effects? Or both? If the research question is about the within-ID effects, then the OLS model is irrelevant.
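    To make the "weighted average" point concrete, here is a small Python/NumPy simulation (not Stata; the data, the +0.5 within effect and the -1.0 between effect are all hypothetical). In a balanced panel with a single regressor, the pooled OLS slope is exactly a weighted average of the within (FE) slope and the between slope on firm means, with weights proportional to the within and between variation in x. When the two effects have opposite signs, pooled OLS can land near neither of them.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 300, 6
firm = np.repeat(np.arange(n_firms), n_years)

def firm_mean(v):
    """Mean of v within each firm (one value per firm)."""
    return np.bincount(firm, weights=v) / np.bincount(firm)

def slope(a, b):
    """OLS slope of b on a, with an intercept."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return (a_c @ b_c) / (a_c @ a_c)

# x = firm-level component + within-firm deviation (deviations sum to zero per firm)
raw = rng.normal(0.0, 1.0, firm.size)
x_dev = raw - firm_mean(raw)[firm]
x_bar = rng.normal(0.0, 1.0, n_firms)
x = x_bar[firm] + x_dev
# Hypothetical effects with opposite signs: +0.5 within firms, -1.0 between firms
y = 0.5 * x_dev - 1.0 * x_bar[firm] + rng.normal(0.0, 1.0, firm.size)

xm, ym = firm_mean(x), firm_mean(y)
xw, yw = x - xm[firm], y - ym[firm]

b_within = (xw @ yw) / (xw @ xw)    # FE (within) estimator
b_between = slope(xm, ym)           # between estimator on firm means
b_pooled = slope(x, y)              # pooled OLS on all firm-years

# Weights: share of the total variation in x that is within vs. between firms
s_within = xw @ xw
s_between = n_years * np.sum((xm - xm.mean()) ** 2)
w = s_within / (s_within + s_between)

print(f"within={b_within:.3f}  between={b_between:.3f}  pooled={b_pooled:.3f}")
print(f"weighted average={w * b_within + (1 - w) * b_between:.3f}")
```

    With real data the weights come from your own regressors, and with several regressors or an unbalanced panel the blend is matrix-weighted rather than a simple scalar average, but the message is the same.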

    If it asks specifically about the between effects, then the OLS model gives an approximation to those, but given how different the within and between effects are, a weighted average of the two might not be a very good approximation to the pure between effects. So if you need the between-ID effects (either alone or together with the within effects), then I suggest running a Mundlak correlated random effects model. The simplest way to do that is with the -xthybrid- command, available from SSC.
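    As a rough illustration of the Mundlak idea, the Python/NumPy sketch below (simulated, hypothetical data) adds each firm's mean of x as an extra regressor in a pooled OLS. It is not the -xthybrid- command, which as far as I know fits the model as a random-effects (multilevel) model; it only shows the algebra. Two equivalent parameterizations are shown: in the "hybrid" form (deviation from the firm mean plus the firm mean) the deviation coefficient reproduces the FE estimate exactly and the mean coefficient estimates the between effect; in the Mundlak form (x plus the firm mean) the coefficient on the mean is the between-minus-within contrast.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, n_years = 300, 6
firm = np.repeat(np.arange(n_firms), n_years)

def firm_mean(v):
    """Mean of v within each firm (one value per firm)."""
    return np.bincount(firm, weights=v) / np.bincount(firm)

raw = rng.normal(0.0, 1.0, firm.size)
x_dev = raw - firm_mean(raw)[firm]
x_bar = rng.normal(0.0, 1.0, n_firms)
x = x_bar[firm] + x_dev
u = rng.normal(0.0, 1.0, n_firms)[firm]       # firm-level error (random effect)
# Hypothetical effects: +0.5 within firms, -1.0 between firms
y = 0.5 * x_dev - 1.0 * x_bar[firm] + u + rng.normal(0.0, 1.0, firm.size)

xm = firm_mean(x)[firm]                        # firm mean of x, repeated for every year
const = np.ones_like(x)

# Hybrid form: deviation from firm mean + firm mean
b_hybrid, *_ = np.linalg.lstsq(np.column_stack([const, x - xm, xm]), y, rcond=None)
# Mundlak form: x itself + firm mean
b_mundlak, *_ = np.linalg.lstsq(np.column_stack([const, x, xm]), y, rcond=None)

# FE (within) estimator for comparison
x_w, y_w = x - xm, y - firm_mean(y)[firm]
b_fe = (x_w @ y_w) / (x_w @ x_w)

print(f"FE={b_fe:.3f}  hybrid within={b_hybrid[1]:.3f}  hybrid between={b_hybrid[2]:.3f}")
print(f"Mundlak x={b_mundlak[1]:.3f}  Mundlak mean={b_mundlak[2]:.3f}")
```

    The appeal of this design is that one model delivers both the within effect (identical to FE) and an estimate of the between effect, instead of the blend that pooled OLS reports.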



    • #3
      Other fields view things differently. The key is causality. FE allows the firm heterogeneity to be arbitrarily correlated with the x's, and that's why it's generally preferred to pooled OLS (POLS). I wouldn't do any comparison, though, without time effects. A full set of year dummies is almost always included in these kinds of applications.

      How many industries do you have? I can point you to a recent paper of mine that can be used to test whether it is sufficient to use industry FEs versus firm FEs. Some of the large differences in the POLS versus FE estimates are on insignificant variables. The differences may be even smaller if you control for time effects.

      Here's a link to the paper:

      Papke-Wooldridge



      • #4
        Thank you for the replies, Clyde and Prof. Wooldridge.

        @Prof. Wooldridge thank you for the link to the paper.

        Some of the large differences in the POLS versus FE estimates are on insignificant variables. The differences may be even smaller if you control for time effects.
        I have included year fixed-effect dummies (i.YEAR) in both my OLS and my FE regressions, as seen above. Does this make a comparison meaningful, or should I, based on Clyde's answer, report only the FE results, given my research question will be something of the likes "Do Board Committee characteristics XYZ have a significant impact on corporate carbon performance?" This indicates an interest in the within-panel effects, if I am not mistaken.

        How many industries do you have?
        In terms of industries, there are a lot: approximately 60 different industries.

        Thanks a lot in advance!



        • #5
          Hey everybody, I really don't want to bother anyone, but it would be great to get a closing thought from someone on my last comment. Thanks a lot!



          • #6
            given my research question will be something of the likes "Do Board Committee characteristics XYZ have a significant impact on corporate carbon performance?
            is, in my opinion, not precise enough to give you guidance.

            It could mean: do those firms whose board committees happen to have characteristics XYZ have different levels of carbon performance than those without characteristics XYZ?

            Or it could mean: does a firm whose board committee changes to gain characteristics XYZ have different levels of carbon performance than it did when it didn't have those characteristics?

            The former is a between-firms question and the latter is a within-firms question. They may well have different answers, and, in any case, getting the right answer to the wrong question does not advance your goals. From the econometrician's perspective, the latter question would be closer to causal than the former, which is more about correlation than causation--no disagreement across fields about that. From my epidemiological perspective, I would say that even with a two-way fixed effects (TWFE) model, this is still observational data and although TWFE does eliminate much of the concern about confounding by unmeasured variables, it still has gaps and may or may not truly give a causal effect estimate, as the change in characteristics itself may be endogenous in ways that are not overcome by TWFE. And, again, not knowing what your research goals are, it may be that the non-causal between-firms effects are the ones you need. That's a decision you need to make by clarifying exactly what your research question is.
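            For readers less familiar with TWFE, here is a minimal Python/NumPy sketch (simulated, hypothetical data; not Stata) of what firm and year fixed effects do mechanically in a balanced panel: subtracting the firm mean and the year mean and adding back the grand mean removes both sets of effects, and the slope on the transformed data matches a regression with full firm and year dummies. This only shows the mechanics; as noted above, it does nothing about time-varying sources of endogeneity.

```python
import numpy as np

rng = np.random.default_rng(3)
n_firms, n_years = 100, 10
firm = np.repeat(np.arange(n_firms), n_years)    # balanced panel: every firm in every year
year = np.tile(np.arange(n_years), n_firms)

alpha = rng.normal(0.0, 2.0, n_firms)[firm]      # firm effects
gamma = rng.normal(0.0, 1.0, n_years)[year]      # year effects
x = 0.5 * alpha + 0.5 * gamma + rng.normal(0.0, 1.0, firm.size)
y = 0.3 * x + alpha + gamma + rng.normal(0.0, 1.0, firm.size)

def twoway_demean(v):
    """Remove firm and year means (exact only for a balanced panel)."""
    firm_means = np.bincount(firm, weights=v) / n_years
    year_means = np.bincount(year, weights=v) / n_firms
    return v - firm_means[firm] - year_means[year] + v.mean()

x_t, y_t = twoway_demean(x), twoway_demean(y)
b_twfe = (x_t @ y_t) / (x_t @ x_t)

# Same slope from explicit dummies: all firm dummies + year dummies for years 1..n-1
firm_d = (firm[:, None] == np.arange(n_firms)).astype(float)
year_d = (year[:, None] == np.arange(1, n_years)).astype(float)
b_lsdv, *_ = np.linalg.lstsq(np.column_stack([x, firm_d, year_d]), y, rcond=None)

print(f"two-way demeaned slope={b_twfe:.4f}  dummy-variable slope={b_lsdv[0]:.4f}")
```

            The simple double-demeaning is exact only for balanced panels. The panel in this thread is unbalanced (1 to 13 observations per firm), where firm fixed effects plus year dummies, as in the xtreg ..., fe call with i.YEAR above, are the safe route.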



            • #7
              Again, to me it’s about causality. Using OLS is more likely to lead to spurious causality. If you settle on POLS then you could just as well use a single cross section. The choices of board members could easily be related to carbon performance. In my view, your only hope for determining causality is using FE — and even that’s in doubt.



              • #8
                @Clyde my research question would indeed lead to the latter question you mentioned.

                And I am also aware of endogeneity concerns, such as the possibility that the choices of committees/board members are related to carbon performance. To address this, I have thought about also running a GMM model after the FE.

                But your answers have helped me a lot and will surely help me in the rest of my research process! Thanks again!

