  • Fixed and random effects changed the sign of coefficients

    Hi,
    I would be grateful for any help, please.

    I study the effect of being unemployed on health. I ran a pooled linear probability model using panel data, and the coefficient on my independent variable, "unemployed," was negative (as I hypothesized). However, when I ran fixed-effects and random-effects models, the coefficient on "unemployed" became positive. The Hausman test indicated that the fixed-effects model was more suitable for the data. I tried the vce(robust) and vce(cluster id) options, but that did not change the positive sign in the fixed-effects model.
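A minimal Stata sketch of the workflow described above. Because the original data are not available, it uses Stata's nlswork example panel as a stand-in: ln_wage and tenure merely play the roles of the outcome and the regressor of interest, and the scalar names b_fe_default and b_fe_cluster are hypothetical. It also illustrates why the vce() options could not flip the sign: they change only the standard errors, not the point estimates.

```stata
* Stand-in data: Stata's nlswork example panel (not the original poster's data)
webuse nlswork, clear
xtset idcode year

* Pooled OLS (a linear probability model if the outcome is 0/1),
* with standard errors clustered on person
regress ln_wage tenure, vce(cluster idcode)

* Fixed effects with conventional standard errors
xtreg ln_wage tenure, fe
scalar b_fe_default = _b[tenure]

* Fixed effects with cluster-robust standard errors:
* the standard errors change, the coefficient does not
xtreg ln_wage tenure, fe vce(cluster idcode)
scalar b_fe_cluster = _b[tenure]
display "difference in FE coefficients: " b_fe_default - b_fe_cluster
```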
    Should I rely on the pooled OLS results and ignore the counterintuitive sign in the fixed-effects model?

    Thank you in advance for any advice.


  • #2
    There is nothing surprising about the coefficient of "the same" variable having opposite signs in a fixed-effects model and in a random-effects or OLS model. I put "the same" in scare quotes because many people treat fixed-effects and random-effects models as just two alternative ways to analyze the same longitudinal (panel) data, with some technicalities about consistency and efficiency that a magical Hausman test can resolve to decide which one to use. In fact, they are very different models and, in general, they are not substitutes for each other. People also get trapped in this pseudo-paradox because they are often inexact in thinking about what the research question is.

    A fixed-effects model estimates within-panel (in your case, within-person) effects only. That is, a fixed-effects model like the one you show estimates the effect of a person's changing unemployment status on that same person's health. It answers the question: if a given person's employment status changes, what happens to that person's health? (I'm ignoring issues of causal inference here and using simple language.)
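The within transformation behind the fixed-effects estimator can be written out as a short derivation. This assumes, for illustration, a single regressor x_it (here, the unemployment indicator) and a person-specific intercept α_i:

```latex
% Model with a person-specific intercept \alpha_i
y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}

% Average over each person's time periods
\bar{y}_i = \alpha_i + \beta \bar{x}_i + \bar{\varepsilon}_i

% Subtract: \alpha_i cancels, leaving only within-person deviations
y_{it} - \bar{y}_i = \beta\,(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)
```

Because α_i cancels, β is identified only from deviations of a person's x from that person's own mean. With a 0/1 unemployment indicator, people who are never (or always) unemployed during the panel have x_it - x̄_i = 0 throughout and contribute nothing to the estimate.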

    By contrast, one might ask: what is the difference in health status between people who are typically unemployed and people who are typically employed? This is a different question, and the answer may well be different, even opposite in direction, from the answer to the question posed in the fixed-effects model. Even this is not exactly the question that OLS and random-effects estimators answer. What they provide is actually a weighted average of this answer and the answer to the fixed-effects question.
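For the simplest case, the weighted-average statement can be made exact. Assume a balanced panel with T periods per person and a single regressor x. Let β̂_W be the within (fixed-effects) estimate and β̂_B the between estimate (the regression of ȳ_i on x̄_i). Then pooled OLS and random effects combine the two with the weights shown below, where θ is the random-effects quasi-demeaning parameter (replaced by its estimate in practice):

```latex
% Assumptions: balanced panel, T periods, one regressor; \alpha_i random with variance \sigma_\alpha^2
y_{it} = \mu + \beta x_{it} + \alpha_i + \varepsilon_{it}

W_{xx} = \sum_{i}\sum_{t} (x_{it} - \bar{x}_i)^2, \qquad
B_{xx} = T \sum_{i} (\bar{x}_i - \bar{x})^2

\hat{\beta}_{\text{POLS}} = \frac{W_{xx}\,\hat{\beta}_W + B_{xx}\,\hat{\beta}_B}{W_{xx} + B_{xx}}

\hat{\beta}_{\text{RE}} = \frac{W_{xx}\,\hat{\beta}_W + (1-\theta)^2 B_{xx}\,\hat{\beta}_B}{W_{xx} + (1-\theta)^2 B_{xx}},
\qquad
\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}
```

Setting θ = 0 recovers pooled OLS, and θ → 1 recovers the fixed-effects estimate. If β̂_W and β̂_B have opposite signs, the pooled and random-effects estimates can take either sign depending on these weights.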

    So you need to think carefully about which question you are asking. If you are asking about between-person effects, you must use OLS or random effects, even if Hausman says otherwise, because a fixed-effects model is mathematically incapable of estimating between-person effects. If you want within-person effects, then the fixed-effects model is the one that directly asks and answers that question. If Hausman says you can use random effects, it will always be the case that the within-person and between-person effects in that model and that data are actually the same (or very nearly so), and the random effects estimator will provide a more precise ("efficient") estimate.
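To see the within/between distinction in practice, one can fit the within (fe), between (be), and random-effects (re) estimators on the same data and put them side by side. A sketch using Stata's nlswork example data; the choice of regressors is illustrative only. The Hausman test is run in its usual form: consistent estimator first, efficient estimator second, both with conventional standard errors.

```stata
* Example data shipped with Stata; regressors chosen for illustration only
webuse nlswork, clear
xtset idcode year

* Within-person (fixed-effects) estimator
xtreg ln_wage age tenure, fe
estimates store fe

* Between-person estimator: regression on person means
xtreg ln_wage age tenure, be
estimates store be

* Random-effects estimator: a weighted blend of within and between variation
xtreg ln_wage age tenure, re
estimates store re

* Coefficients and standard errors side by side
estimates table fe be re, b se

* Hausman test: consistent (fe) first, efficient (re) second
hausman fe re
```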

    (And if your research design calls for estimating both within- and between-person effects, you can use the -xthybrid- command, available from SSC; it does both at the same time.)



    • #3
      I agree with Clyde completely. Great explanation!

      On the very last point Clyde made, about xthybrid (which is a great package): you do not need it to get both effects. You can get both within- and between-person estimates from the random-effects model by including the person means of all variables that vary within persons. For example:
      Code:
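      webuse nlswork, clear
      xtset idcode
      * compute person means over the estimation sample only
      generate byte touse = !missing(ln_wage, age, tenure)
      foreach v of varlist age tenure {
          bysort idcode: egen pmn_`v' = mean(`v') if touse
      }
      xtreg ln_w age tenure pmn_age pmn_tenure, re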
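      If you compare these with the fixed-effects estimates (xtreg ln_w age tenure, fe), you will see that the coefficients on age and tenure are exactly the same as in the random-effects model above, provided the person means are computed over the same estimation sample. Also, each pmn_* coefficient estimates the difference between the between-person and the within-person association, so testing it against zero is a statistical test of whether the within-person association differs from the between-person association. Handy!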
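A sketch of the comparison described above, again on nlswork. It flags the estimation sample first (touse is a hypothetical helper variable) so that the person means and both models use the same observations. The coefficients on age and tenure from the two models should then agree, and test gives the joint Wald test on the pmn_* terms.

```stata
webuse nlswork, clear
xtset idcode

* Flag the common estimation sample (hypothetical helper variable)
generate byte touse = !missing(ln_wage, age, tenure)
foreach v of varlist age tenure {
    bysort idcode: egen pmn_`v' = mean(`v') if touse
}

* Fixed effects: within-person coefficients
xtreg ln_wage age tenure if touse, fe
estimates store fe

* Random effects plus person means (Mundlak-style correlated random effects)
xtreg ln_wage age tenure pmn_age pmn_tenure if touse, re
estimates store cre

* Joint Wald test that within- and between-person coefficients are equal
test pmn_age pmn_tenure

* age and tenure coefficients should match across the two models
estimates table fe cre, b se
```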
      Last edited by Erik Ruzek; 07 Feb 2024, 13:20. Reason: Added webuse nlswork to the code segment



      • #4
        This is one of the occasions on which I have to disagree with how some non-economists think about fixed and random effects models. My perspective is consistent with Jeff Wooldridge's stance from #11 of https://www.statalist.org/forums/for...nel-regression:

        It's hard to think of cases where, if the FE estimates and RE estimates differed significantly on time-varying variables that I would place much value on the RE estimates. I don't think of it as a problem of "variation" -- that sounds like a technical issue that doesn't really have anything to do with establishing causality. If the goal is a descriptive regression then just use pooled OLS. That's arguably as descriptive as RE or the between regression. If you want to control for systematic, unmeasured differences across units (individuals, firms, schools, and so on) then FE is preferred. If the variables of interest don't vary enough over time to identify the effects then we might need a new problem or a new data set. Now, if FE is introducing unnecessary noise then RE can be preferred -- but that's an empirical matter.

        Another way to state it: How can any theory reliably conclude that unobserved heterogeneity is uncorrelated with the observed covariates? How could I ever be sure that, say, managerial talent is unrelated to firm inputs? The only theory that implies POLS or RE is suitable is if we have a randomized intervention -- still quite rare in the social sciences.
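The standard omitted-variable calculation shows what is at stake in assuming that the unobserved heterogeneity is uncorrelated with the covariates. A sketch, assuming a single regressor and strict exogeneity of the idiosyncratic error:

```latex
% Model: \alpha_i is unobserved heterogeneity (e.g., managerial talent)
y_{it} = \mu + \beta x_{it} + \alpha_i + \varepsilon_{it},
\qquad \operatorname{E}[\varepsilon_{it} \mid x_{i1},\dots,x_{iT},\alpha_i] = 0

% Pooled OLS leaves \alpha_i in the error term
\operatorname{plim} \hat{\beta}_{\text{POLS}}
  = \beta + \frac{\operatorname{Cov}(x_{it}, \alpha_i)}{\operatorname{Var}(x_{it})}
```

Pooled OLS (and RE, which relies on the same zero-covariance assumption) is consistent only when Cov(x_it, α_i) = 0, which is what randomization guarantees. The fixed-effects demeaning removes α_i, so it does not need that assumption.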

        Last edited by Andrew Musau; 07 Feb 2024, 15:23.



        • #5
          Well, it is entirely possible for the within-panel effect of a variable to differ, even differ widely, from its between-panel effect. And if the between-panel effect is the estimand of interest in the research question, a fixed-effects model will just give you a consistent estimate of the wrong thing. Right answer to the wrong question. My point is that you need to know which effect you are interested in and analyze accordingly.

          Now, I do agree that if you are looking to estimate between-panel effects, and if you are not looking at randomized experiment results, then there is the possibility, even the likelihood, that the estimates you get are confounded by unmeasured (or measured, if you don't correctly adjust for them) variables. That is not in question. In modern epidemiology, we do randomized trials when feasible. When they are not, I think we are pretty consistent in not drawing strong conclusions from any single non-randomized study. Rather, the field as a whole amasses multiple observational studies, each attempting to cover some of the deficits of the preceding ones, until we have a consistent body of results and the notion that some as-yet-unaddressed confounder remains seems far-fetched. A strong background theory and a plausible biological mechanism underpinning the findings are usually sought as well. Even with all that, some residual uncertainty about the results will still be acknowledged.



          • #6
            With randomized controlled trials (RCTs), RE models can be justified, and in this case RE is consistent. However, even in this scenario, if one were to run an FE model on RCT data, the RE and FE coefficients on the time-varying variables would not differ significantly. What I object to, as highlighted in the linked post by JW, is the notion that one begins with the goal of investigating between-panel or within-panel effects. In my research tradition, the goal of an investigation can be establishing causal relationships between variables. If, for one reason or another, this is not feasible, then the researcher may aim to depict or describe a phenomenon without seeking to establish cause-and-effect relationships between variables. Therefore, it is notable that in JW's post, he states:

            If you want to control for systematic, unmeasured differences across units (individuals, firms, schools, and so on) then FE is preferred.
            In this manner, FE serves as a method to control for unobserved heterogeneity. With a causal relationship established, one can then proceed to conduct between-unit comparisons. For instance, in a study investigating the impact of education on income levels, based on the causal estimates, the researcher may compare the income disparity between individuals with higher education levels and those with lower education levels within the same region. Finally, if the goal is descriptive research, as JW notes, RE may not be better than pooled OLS:

            If the goal is a descriptive regression, then just use pooled OLS. That's arguably as descriptive as RE or the between regression.



            • #7
              Quoting #6: "What I object to, as highlighted in the linked post by JW, is the notion that one begins with the goal of investigating between-panel or within-panel effects. ..."
              But I think that one must start an investigation with a clear goal of which associations are being investigated. To give a concrete example from epidemiology, consider the associations between age and blood level of HDL ("good") cholesterol. As it turns out, within individual people, the HDL level declines with advancing age. It is also true that people with lower levels of HDL cholesterol experience higher mortality from cardiovascular disease, definitely through midlife and the early senior years, though it is less clear whether this continues among the very elderly. The relationship between lower HDL and higher mortality was at one time thought to be causal, although we have since learned otherwise. Now, as it happens, the selective culling of people with lower levels of HDL from the population and the within-person declines of HDL with age just balance each other out, so that in a cross-section of the population one finds that the mean HDL level does not vary with age. So this is a clear case where, in a longitudinal study that starts from a representative population sample, we will find that the between-person effect of age on HDL is zero, but the within-person effect of age on HDL is (strongly) negative. If you don't know which of these effects you are trying to study, you can't even begin to properly analyze the data from such a study. It is meaningless to ask what "the association" between age and HDL is: there are two distinct associations.
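A small simulation can make the two associations concrete. This is a stylized sketch with made-up numbers: the variable names (hdl, age0, a) and all coefficients are hypothetical, not clinical values. Each simulated person's HDL declines by 0.5 units per year of age, but the person-level intercept is built to rise with the person's mean age by exactly the offsetting amount, standing in for the survivor selection described above rather than modeling mortality explicitly. The within (fe) estimator should then give a slope near -0.5 and the between (be) estimator a slope near zero.

```stata
clear
set seed 12345
set obs 500                                  // 500 hypothetical people
generate id = _n
generate age0 = runiform(30, 70)             // age at first visit
* person-level intercept rises with mean age (age0 + 4.5 over 10 visits),
* offsetting the within-person decline in the cross-section
generate a = 0.5*(age0 + 4.5) + rnormal(0, 2)

expand 10                                    // 10 annual visits per person
bysort id: generate t = _n - 1
generate age = age0 + t
generate hdl = 60 + a - 0.5*age + rnormal(0, 1)

xtset id t
xtreg hdl age, fe                            // within-person slope, near -0.5
xtreg hdl age, be                            // between-person slope, near 0
```

Neither estimate is "the" effect of age on HDL: each answers a different question about the same data.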

              This is not some odd idiosyncrasy of blood lipids. I am no sociologist, but one can easily identify similar situations: the effects on a person of getting married do not necessarily equal the effects of being married on social, behavioral, and economic outcomes. If you are going to study those effects, you must start by being clear about which one you want to assess. Or if you want to assess both, you need to be clear that they can be different and use appropriate methodologies either to demonstrate that they do not, in the case at hand, differ, or to assess them separately.

              Finally, while epidemiology, too, often strives to study causal relationships, much important and useful knowledge also derives from descriptive studies and from predictive modeling. I do not believe that economics avoids these non-causal research paradigms, though some investigators may not undertake them personally. Avoiding them would, I believe, in the long run constrict and ultimately strangle causal research, because some important causal relationships remain unsuspected until descriptive or predictive research suggests their possibility.

