
  • On binary choice panel data estimations: fixed-effects versus random-effects

    Hi everyone,

    I'm researching the choice between a 15-year and a 30-year mortgage, and I have an unbalanced panel of 994 households over 8 periods (16 years, because the data are biennial). The response variable is 1 if a 15-year mortgage is selected and 0 if a 30-year mortgage is selected. I'm using logit because it allows a fixed-effects (conditional) estimation, whereas fixed-effects probit suffers from the incidental parameters bias.

    The issue is that, by construction, the fixed-effects estimation drops 531 households because they never switch the type of mortgage they have, i.e. their response variable is either all 0s or all 1s. That is more than 50% of the households in the data set. Thanks to Richard Williams I found the following link http://www.stata.com/statalist/archi.../msg00669.html where David Drukker performs a Hausman specification test.
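For concreteness, here is a minimal sketch of that exclusion rule at work (in Python rather than Stata; the function name and toy data are mine): any household whose outcome never varies within the panel is dropped by the fixed-effects (conditional) logit.

```python
import numpy as np

def fe_logit_dropped_groups(group_ids, y):
    """Return ids of groups whose binary outcome never varies.

    A conditional (fixed-effects) logit gets no information from a
    group whose outcomes are all 0s or all 1s, so such groups are
    excluded from the estimation sample.
    """
    dropped = []
    for g in np.unique(group_ids):
        yg = y[group_ids == g]
        if yg.min() == yg.max():  # all 0s or all 1s
            dropped.append(int(g))
    return dropped

# Toy panel: household 1 switches mortgage type; 2 and 3 never do.
ids = np.array([1, 1, 2, 2, 3, 3])
y   = np.array([0, 1, 0, 0, 1, 1])
print(fe_logit_dropped_groups(ids, y))  # [2, 3]
```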

    I want to ask whether you think this is appropriate, because I have a strong reservation. The estimation samples for the two models are different and, more importantly, the exclusion rule is not random, so the sample in the fixed-effects estimation is by construction very different from the sample in the random-effects estimation (the whole sample). Since the fixed-effects estimation in my case removes all households whose mortgage type never changed, it effectively reduces the sample to households that switched mortgage type during the observed period. The difference between the fixed-effects and random-effects coefficients is therefore not necessarily due to unobserved heterogeneity correlated with the regressors: the effect of income, for example, on choosing a 15-year mortgage when first taking out the loan may not be the same as when deciding to refinance and switch mortgages.

    I believe this extends to any kind of choice being modeled. Please share your thoughts, because I may be missing something here.
    Alfonso Sanchez-Penalver

  • #2
    There is not much you can do about the reduction in sample size due to fixed-effects estimation; therefore, if the reduced sample is observationally different from the full sample, you may want to compare that fixed-effects model with a random-effects model fitted on the same reduced sample.

    As to whether a fixed-effects or random-effects logit model is appropriate in your context, I will leave it to others who are familiar with the subject area to comment.



    • #3
      Hi Andrew,

      Thank you for your comments. I agree there's nothing I can do about the reduction in sample size in the fixed-effects estimation and, as you suggest, I have already estimated a random-effects model on the reduced sample to compare the coefficients and perform what I believe is a more appropriate Hausman test. The question that remains unanswered, however, is whether the Hausman test comparing the random-effects coefficients obtained from the whole sample with the fixed-effects coefficients obtained from a reduced sample, where exclusion is non-random, is still a valid specification test. I believe this question is important regardless of the nature of the data or the study.

      Thanks again!
      Alfonso Sanchez-Penalver



      • #4
        Dear Alfonso,

        I think there is some confusion here; the FE estimator uses the full sample. What happens is that there are observations whose contribution to the log-likelihood function is zero for any value of the parameters. Because these observations do not contain information about the parameters being estimated, Stata excludes them from the estimation. These observations contribute to the likelihood function of the RE estimator. So, dropping them when estimating with RE is a bad idea.
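Joao's point can be checked directly: the conditional-logit contribution of a group is the probability of its observed sequence given its number of ones, and for an all-0s or all-1s group that probability is 1 for every parameter value, so its log-likelihood contribution is identically zero. A small numerical check (in Python, with names of my own choosing):

```python
import numpy as np
from itertools import combinations

def cond_logit_group_prob(y, x, beta):
    """Conditional-logit contribution of one group: the probability of
    the observed sequence y given its sum, under the index x * beta."""
    T, s = len(y), int(sum(y))
    num = np.exp(beta * np.dot(y, x))
    den = sum(np.exp(beta * sum(x[t] for t in ones))
              for ones in combinations(range(T), s))
    return num / den

x = np.array([0.5, -1.2, 2.0])
# A never-switching group (all 1s) contributes probability 1 for any
# beta, so it carries no information about beta.
for b in (-2.0, 0.0, 3.0):
    print(cond_logit_group_prob(np.ones(3), x, b))  # 1.0 every time
# A switching group's contribution does depend on the parameters.
print(cond_logit_group_prob(np.array([1, 0, 0]), x, 0.0))  # 1/3
```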

        Hence, the FE results should be compared with the RE results obtained with the full sample, not with those obtained from the smaller one.

        All the best,

        Joao



        • #5
          In reply to #3: the reason I would not divorce the estimation method from your data is that there are cases where you have very little variation in your outcome, resulting in a significant number of observations being dropped when running FE. In such cases, the question is whether FE is a valid estimation method at all... and subsequent comparisons with RE make no sense.



          • #6
            Hello Alfonso,

            Another alternative is to estimate your model as a correlated random-effects model. Below is a link to a good discussion of how to interpret and implement these models using Stata.

            http://www.iza.org/conference_files/...nonlin_iza.pdf
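The mechanical part of the CRE (Mundlak/Chamberlain) device is simple: augment the time-varying regressors with their within-group means, then estimate a standard RE probit/logit on the augmented design. Here is a sketch of the augmentation step (in Python rather than Stata; the names are mine):

```python
import numpy as np

def add_mundlak_means(group_ids, X):
    """Correlated random effects (Mundlak) device: append the
    within-group means of the time-varying regressors X.

    The coefficients on the means absorb the correlation between the
    group effect and the regressors, which is what the FE estimator
    guards against."""
    Xbar = np.empty_like(X, dtype=float)
    for g in np.unique(group_ids):
        m = group_ids == g
        Xbar[m] = X[m].mean(axis=0)
    return np.hstack([X, Xbar])

ids = np.array([1, 1, 2, 2])
X = np.array([[1.0], [3.0], [2.0], [6.0]])
print(add_mundlak_means(ids, X))
# columns are (x_it, xbar_i): rows [1, 2], [3, 2], [2, 4], [6, 4]
```

In Stata terms, this corresponds to building the group means (e.g. with egen's mean() function by panel id) and including them as extra regressors in the RE estimation, as the linked slides discuss.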



            • #7
              Hi Joao, Andrew, and Enrique,

              While reading Joao's point (a very valid one) I was thinking in the same direction as Andrew. My reasoning runs along the following lines. In panel data we have two sources of variation: within groups and between groups. For the response variable in my sample there is within variation for less than 50% of the households, so only that minority of the sample identifies the FE coefficients. The question then is whether FE is a valid method of estimation in that case, something I haven't found addressed in the literature, by the way. If we agree that FE is not a valid estimation method, then the Hausman test has no rationale and shouldn't be done.

              Considering the case at hand, notice that the fixed-effects method effectively estimates the coefficients on the choice of switching mortgages (refinancing, for those who changed mortgage type). The question then is whether the independent variables affect the choice between mortgage types the same way when buying the house as when refinancing. It is not clear one way or the other, and it is troubling, to say the least, to lose more than half the sample with FE.

              Enrique, thanks for the link. Something new to learn! I will definitely check it out.

              Thank you all for pitching in!!!
              Last edited by Alfonso Sánchez-Peñalver; 06 Oct 2015, 16:13.
              Alfonso Sanchez-Penalver



              • #8
                Originally posted by Enrique Pinzon (StataCorp)
                Hello Alfonso,

                Another alternative is to think about a correlated random-effects model to estimate your model. Below is a link of a good discussion about how to interpret and implement these models using Stata.

                http://www.iza.org/conference_files/...nonlin_iza.pdf

                Thanks, Enrique! I was about to suggest the same thing. Let me emphasize, Alfonso, that the CRE approach has some real advantages over FE and RE logit. For one, both require shocks that are independent over time, which is almost always an unrealistic assumption. For another, average marginal effects cannot be obtained for FE logit and are hard to obtain for RE logit. The CRE approach delivers AMEs very easily, and with proper standard errors.



                • #9
                  Hi Jeff,

                  I just went over the presentation Enrique mentioned. This is similar to your 2008 paper (which you mention in the presentation), which used xtgee because the data were fractional, whereas here we include a random intercept. I remember applying that methodology to a study on school performance based on MCAS some time back (it never got published and my co-authors dropped it). Also, with fractional data, if I remember the paper right, you required balanced panels, but here you don't. A couple of questions:

                  1. From this first quick reading, my understanding is that the marginal effects on the averaged variables are the additional between effects that would have been left in the residual had those averaged variables not been included; in that case they would have constituted unobserved heterogeneity correlated with the regressors, which would call for a "fixed-effects" model. Is this the right interpretation?

                  2. How do we compute the partials? From my reading, I understand that the following would work after xtprobit, re:
                  Code:
                  * Variance of the random intercept: lnsig2u is ln(sigma_u^2),
                  * so the variance is exp(lnsig2u) (no additional squaring)
                  scalar sig2 = exp(_b[lnsig2u:_cons])
                  * Average marginal effects on the population-averaged probability
                  margins, dydx(*) expression(normal(predict(xb)/sqrt(1 + sig2)))
                  3. Once I have the partials, is the estimated overall partial effect of a variable the sum of the partials on the actual variable and on its average (using lincom to get the standard errors, for example)? Or does leaving them separate make more sense?
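The rescaling in the margins expression above rests on the identity E_u[Φ(xb + u)] = Φ(xb / √(1 + σ_u²)) for u ~ N(0, σ_u²), which is what turns the conditional index into a population-averaged probability. The identity can be checked numerically (a Python sketch; the function names are mine):

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def avg_prob_quadrature(xb, sig2, n=40):
    """E_u[Phi(xb + u)] with u ~ N(0, sig2), by Gauss-Hermite quadrature."""
    t, w = np.polynomial.hermite.hermgauss(n)
    vals = [Phi(xb + sqrt(2.0 * sig2) * ti) for ti in t]
    return float(np.dot(w, vals) / sqrt(np.pi))

xb, sig2 = 0.7, 1.5
print(Phi(xb / sqrt(1.0 + sig2)))     # closed form used in the margins call
print(avg_prob_quadrature(xb, sig2))  # the two agree
```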

                  Thanks for the help!
                  Alfonso Sanchez-Penalver
