
  • When choosing between random and fixed effects models for logistic regression

    I understand the assumption for using a random effects model is that the unobserved group-level effects are uncorrelated with the explanatory variables. I wonder if these effects refer to such time-invariant heterogeneities as gender and country of origin, the omitting of which can result in biased coefficients.

    I also wonder whether, if some time-invariant explanatory variables have strong effects on the outcome variable, using a fixed effects model will produce very imprecise coefficients.

    Which scenario is worse, biased or imprecise? In both cases, the coefficients do not accurately reflect the effect of the predictor variable on the outcome variable. Is it that in the first case the direction of the coefficient is correct but the magnitude is wrong, and in the second case the coefficient is simply wrong?

    Thank you.
    Last edited by Meng Yu; 06 Jan 2022, 23:57.

  • #2
    I had the above stated questions due to these two articles: https://journals.plos.org/plosone/ar...l.pone.0110257
    https://economics.mit.edu/files/7564

    Would really appreciate it if someone could share their thoughts.
    Last edited by Meng Yu; 07 Jan 2022, 13:31.

    Comment


    • #3
      One difference exists because the FE estimator removes all between-group variation and only uses within-group variation. This means that group-level variables (that are constant across the entire group) cannot be included in the estimated model, and, for panel data, time-invariant variables cannot be included. Thus, when the objective of an analysis is to measure the effect of group-level variables, FE estimation is not a viable method. RE estimation, on the other hand, capitalizes on both within- and between-group variation, and therefore allows for the inclusion of variables that are constant within a group.
      This quote is from the article https://journals.plos.org/plosone/ar...l.pone.0110257

      My question is: although my independent variable is not time-invariant, some of my covariates are. If I use an FE model, these variables cannot be included. Does that mean their effects on the outcome variable are not accounted for in the equation? Will that result in a biased coefficient for the independent variable?

      Comment


      • #4
        I wonder if these effects refer to such time-invariant heterogeneities as gender and country of origin, the omitting of which can result in biased coefficients.
        According to the PLOS One article, unobserved group-level characteristics refer to the unobserved actual determinants of the outcome variable.

        Comment


        • #5
          My take is that I'm much more concerned with overfitting than I am with bias. I'd rather have a model that is wrong some of the time but informs me about the underlying data generating process as well as possible than a model that explains my data perfectly and performs poorly outside of it.

          Comment


          • #6
            Dear Meng Yu

            Time-invariant variables cannot be included because they are collinear with the fixed effects, which means that they are not needed because the fixed effects account for the effect of all variables that are time-invariant (even the ones for which you do not have data). So, the fact that you cannot include these variables does not cause any problem, unless you are interested in measuring their effect (which is not your case). In non-linear models such as the logit, the RE estimator also needs assumptions about the distribution of the random effects, and therefore its validity is always questionable.

            Best wishes,

            Joao
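
The collinearity point above can be illustrated with the linear analogue of fixed effects, the "within" transformation, which subtracts each group's mean from every variable. This is only a minimal sketch on a made-up toy panel (the variable names `female` and `income` are hypothetical); the FE logit itself does not demean the data but conditions on the number of successes within each group, which removes the fixed effects in a comparable way.

```python
from collections import defaultdict

def within_transform(rows, group_key, var_keys):
    """Subtract each group's mean from every listed variable."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        g = row[group_key]
        counts[g] += 1
        for v in var_keys:
            sums[g][v] += row[v]
    demeaned = []
    for row in rows:
        g = row[group_key]
        new_row = {group_key: g}
        for v in var_keys:
            new_row[v] = row[v] - sums[g][v] / counts[g]
        demeaned.append(new_row)
    return demeaned

# Hypothetical toy panel: two people observed in three waves each.
panel = [
    {"id": 1, "female": 1, "income": 10.0},
    {"id": 1, "female": 1, "income": 12.0},
    {"id": 1, "female": 1, "income": 14.0},
    {"id": 2, "female": 0, "income": 20.0},
    {"id": 2, "female": 0, "income": 20.0},
    {"id": 2, "female": 0, "income": 23.0},
]

demeaned = within_transform(panel, "id", ["female", "income"])

# The time-invariant variable has no within-person variation left,
# so its coefficient cannot be estimated separately from the fixed effect.
print([r["female"] for r in demeaned])
# The time-varying variable keeps its within-person variation.
print([r["income"] for r in demeaned])
```

After the transformation, `female` is zero in every row, while `income` still varies within each person. That remaining within-person variation is all a fixed effects estimator uses.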

            Comment


            • #7
              Dear Joao Santos Silva

              Thank you for your reply.
              In non-linear models such as the logit, the RE estimator also needs assumptions about the distribution of the random effects
              Do you mean the time-varying variables all need to have a certain distribution? What kind of distribution does it need to be?

              According to the PLOS One article, the random effects model seems to have some advantages over the fixed effects model. But I am not sure whether that only applies to OLS regression. Below is a quote from the article.
              A second distinction between the two traditional estimators is that RE estimation tends to be more flexible and fits easily into a hierarchical framework. Using this framework, groups are easily nested within one another, variables that affect different levels of a hierarchy are more easily explored, and heterogeneous marginal effects can be explored in a random coefficient context

              Comment


              • #8
                No, it is the unobservable random effects that are assumed to have some pre-specified distribution. Personally, I would avoid the RE logit.

                Comment


                • #9
                  I once used a fixed effects model with three waves of panel data, and the coefficients seemed to be quite far from those in previous research. That experience made me a bit suspicious of fixed effects models. Later an economist told me you need at least four waves of panel data to use a fixed effects model, but that claim was disputed on this forum.

                  Comment


                  • #10
                    I am not aware of any reason that would support that claim. Maybe the problem is with your data, with the previous research, or somewhere else.

                    Comment


                    • #11
                      I wonder if anyone else has any comments on the advantages of a random effects model other than that it can report results for time-invariant variables. Thank you.

                      Comment


                      • #12
                        I've had this open for more than a week intending to respond. A few things. First, one needs to be careful with FE logit, too, as it requires an independence assumption across time. It's apparently not well known that serial correlation can cause pretty serious bias (unlike in the linear case, where it causes no bias). A correlated random effects logit can be useful if serial correlation is suspected. It makes it easy to compute average partial (marginal) effects. The drawback, as suggested by Joao, is that it uses a distributional assumption for the heterogeneity. But this has benefits, making APEs identified and easy to estimate.

                        Here's a link to a paper that describes the tradeoff; it's with two of my former students.

                        Kwak_Martin_Wooldridge_2021


                        As we discuss in the paper, it's possible to compute a hybrid APE by using conditional MLE to estimate beta, then estimate the "fixed effects," then compute the APEs from there. Unfortunately, how one does inference unless T is pretty large is not clear. The CRE approach makes inference trivial.
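
To make the term concrete: for a logit, the partial effect of a continuous regressor at one observation is its coefficient times Λ(z)(1 − Λ(z)), where Λ is the logistic function and z is the fitted linear index, and the average partial effect (APE) is the mean of that quantity over the sample. The sketch below assumes the fitted indices are already available; the function names and the example numbers are hypothetical. In the FE logit the index contains the unobserved heterogeneity, which conditional MLE does not estimate, whereas in a CRE logit it is replaced by observable terms such as the time averages.

```python
import math

def logistic(z):
    """Logistic CDF, Lambda(z)."""
    return 1.0 / (1.0 + math.exp(-z))

def average_partial_effect(beta_j, indices):
    """APE of a continuous regressor in a logit.

    beta_j  : estimated coefficient on the regressor of interest
    indices : fitted linear index z for every observation (assumed given)
    """
    effects = [beta_j * logistic(z) * (1.0 - logistic(z)) for z in indices]
    return sum(effects) / len(effects)

# Hypothetical example: a coefficient of 2.0 and an index of 0 everywhere.
# At z = 0 the logistic density is 0.25, so the APE is 2.0 * 0.25.
ape_at_zero = average_partial_effect(2.0, [0.0, 0.0])
print(ape_at_zero)
```

Because Λ(z)(1 − Λ(z)) is at most 0.25, the APE of a continuous regressor in a logit can never exceed a quarter of its coefficient in absolute value.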

                        Comment


                        • #13
                          Hi Jeff,

                          Thank you very much for your post. I am starting to read your co-authored article but am having some difficulty understanding everything.

                          First, one needs to be careful with FE logit, too, as it requires an independence assumption across time.
                          Just to confirm: are you saying that FE logit requires the assumption that the values of the dependent variable are independent of each other from one wave to the next? I guess that may be one reason why RE is often used in health research, as our health status is often related to previous health conditions.

                          A correlated random effects logit can be useful if serial correlation is suspected.
                          Do you mean RE logit, as in the Stata command "xtlogit", whose default is random effects? Your article tests the robustness of conditional logit; what exactly is it?
                          Last edited by Meng Yu; 19 Jan 2022, 23:03.

                          Comment


                          • #14
                            Hi Jeff,
                            In your article,
                            We also find that the APEs estimated with the hybrid CL approach are significantly biased when T is small,
                            does "T is small" refer to the number of waves of data? If so, how small is small? Three waves or fewer?

                            Comment


                            • #15
                              Meng: Simulations are always special so one cannot give hard rules. If T <= 3 there is clearly an issue. But I can imagine FE logit works well with T >= 10. There's a lot of gray area. There's also the problem of obtaining average marginal effects using FE logit. Unless T is pretty large, inference methods aren't known. And bootstrapping is not known to work in these cases with incidental parameters, either.

                              I wouldn't use RE logit. I would use pooled logit and include the time averages of the time-varying explanatory variables. Then use vce(cluster id).
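
The data preparation step in this suggestion can be sketched as follows. This is a minimal illustration, assuming a long-format panel held as a list of dictionaries; the helper `add_time_averages`, the `_bar` suffix, and the variable `exercise` are hypothetical names. It only builds the person-level time averages; the pooled logit on the augmented data, with standard errors clustered on `id`, would then be fitted in whatever statistics package is in use.

```python
from collections import defaultdict

def add_time_averages(rows, id_key, var_keys, suffix="_bar"):
    """Return copies of the rows with each unit's time average appended."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        unit = row[id_key]
        counts[unit] += 1
        for v in var_keys:
            totals[unit][v] += row[v]
    augmented = []
    for row in rows:
        unit = row[id_key]
        new_row = dict(row)
        for v in var_keys:
            # Same value in every wave for a given unit.
            new_row[v + suffix] = totals[unit][v] / counts[unit]
        augmented.append(new_row)
    return augmented

# Hypothetical toy panel: binary outcome y, time-varying regressor exercise.
panel = [
    {"id": 1, "wave": 1, "y": 0, "exercise": 2.0},
    {"id": 1, "wave": 2, "y": 1, "exercise": 4.0},
    {"id": 1, "wave": 3, "y": 1, "exercise": 6.0},
    {"id": 2, "wave": 1, "y": 0, "exercise": 1.0},
    {"id": 2, "wave": 2, "y": 0, "exercise": 1.0},
    {"id": 2, "wave": 3, "y": 1, "exercise": 4.0},
]

augmented = add_time_averages(panel, "id", ["exercise"])
print([row["exercise_bar"] for row in augmented])
```

The regressors of the pooled logit would then be `exercise` and `exercise_bar` (plus any time-invariant covariates, which can be included here because nothing is differenced away). Only time-varying variables need a time average, since the average of a time-invariant variable is the variable itself.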

                              Comment
