  • Ordered Probit vs Random Effect Ordered Probit for Panel Data

    Dear all,

    I am currently working with a dataset of firms' credit ratings (dependent variable) and a list of firm characteristics (independent variables). The dataset contains yearly data for several companies over a 20-year period. Since credit rating is an ordinal variable, I would like to run an ordered probit regression, as was done in many past studies. What I am wondering is whether, given the panel nature of my data, I must use xtoprobit or whether I can also use a plain oprobit. It seems to me that previous studies used the plain ordered probit rather than the random-effects ordered probit, so I would like to hear your opinion on which is better to use. To be clear, my data contain company ID, year, numerical credit rating, VAR1, VAR2, etc.

    I thank you very much in advance for your precious help and support.

  • #2
    Did previous studies have longitudinal data?

    The xtoprobit and meoprobit routines are fairly new to Stata -- I think they were introduced in Stata 13. The fact that such models may not have been used much in the past may just reflect the fact that there was no software for it.

    The current Stata Journal also introduces a new command, feologit.

    https://journals.sagepub.com/doi/abs...urnalCode=stja

    I plan to check it out sometime, since I thought fixed-effects logit was not possible.

    Anyway, I would think you would want to use xtoprobit. If you ignore the panel design your standard errors will be off, and there may be other problems too.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 18.5 MP (2 processor)

    WWW: https://www3.nd.edu/~rwilliam

    • #3
      Dear Professor Williams,

      Thank you for your fast reply. Previous studies such as Blume et al. (1998), Amato & Furfine (2004) and Gray et al. (2006) seem to use longitudinal data, if I understood them correctly. To me, longitudinal data means having data for multiple companies over several years, and I believe this is the typical setting for investigating credit ratings. I imagined that a plain oprobit could have issues with its standard errors, and I was wondering whether I could correct those issues by using more robust standard errors (e.g. clustering at the firm level). As you say, the fact that the random-effects ordered probit model was not used much in the past is probably due to a lack of software. Does that mean it is nonetheless appropriate to use oprobit? Are there any other issues that could occur besides biased standard errors?

      • #4
        Andrea: You can use any pooled estimation method (oprobit) in place of the joint MLE (xtoprobit). In fact, in most cases, the oprobit estimator will be more robust because it does not restrict the serial dependence across time. xtoprobit assumes that all serial correlation is due to the so-called random effect. There is some evidence out there that this assumption can be important, but I don't know any directly for ordered outcomes.
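The contrast between pooled estimation and joint MLE can be sketched on simulated data. This is a minimal illustration only: the data-generating process and the variable names (id, year, rating, x1, c) are assumptions made up for the example, not the poster's data.

```stata
* Simulated firm-year panel; every variable name here is hypothetical
clear
set seed 12345
set obs 200
generate id = _n
* c plays the role of unobserved firm heterogeneity
generate double c = rnormal()
expand 10
bysort id: generate year = 2000 + _n
generate double x1 = rnormal()
generate double ystar = 0.5*x1 + c + rnormal()
generate rating = 1 + (ystar > -1) + (ystar > 0) + (ystar > 1)

* Pooled ordered probit; clustering by firm leaves the serial
* dependence within a firm unrestricted
oprobit rating x1 i.year, vce(cluster id)

* Joint (random effects) MLE; the panel structure must be declared first
xtset id year
xtoprobit rating x1 i.year
```

The two sets of coefficients should not be compared by magnitude alone: the pooled estimator targets scaled coefficients, while xtoprobit estimates the heterogeneity variance separately.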

        To me, the more serious issue is how to handle correlation between the heterogeneity and the covariates. In Chapter 16 of my MIT Press book, I recommend extending the Chamberlain/Mundlak device to this case. If the panel is balanced, this is easy, but I've worked on the unbalanced case, too. You generate the time averages of the time-varying covariates separately for each cross-sectional unit and include these as covariates. This tries to approximate a "fixed effects" solution without estimating many incidental parameters.

        Generic code looks like:

        Code:
        egen x1b = mean(x1), by(id)
        egen x2b = mean(x2), by(id)
        ...
        egen xKb = mean(xK), by(id)
        oprobit y x1 x2 ... xK x1b ... xKb z1 ... zJ i.year, vce(cluster id)
        I would recommend computing average partial effects, but the above gives consistent estimators of the scaled coefficients and proper standard errors. You can also add the time averages to xtoprobit in a similar way. In the linear case, the estimates would be identical.
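One way the average partial effects mentioned above might be obtained is with margins after the pooled fit. The sketch below uses simulated data with hypothetical names (id, year, y, x1, x1b) and a single time-varying covariate; it also shows the same time average added to xtoprobit.

```stata
* Simulated balanced panel; all names are hypothetical
clear
set seed 2024
set obs 300
generate id = _n
generate double c = rnormal()
expand 8
bysort id: generate year = 2010 + _n
* x1 is built to be correlated with the heterogeneity c
generate double x1 = 0.5*c + rnormal()
egen double x1b = mean(x1), by(id)
generate double ystar = 0.5*x1 + c + rnormal()
generate y = 1 + (ystar > -1) + (ystar > 1)

* Pooled ordered probit with the time average of x1 (Mundlak device)
oprobit y x1 x1b i.year, vce(cluster id)

* Partial effect of x1 on P(y = k), averaged over the sample, one outcome at a time
forvalues k = 1/3 {
    margins, dydx(x1) predict(outcome(`k'))
}

* The same time average can be added to the random effects version
xtset id year
xtoprobit y x1 x1b i.year
```

Looping over outcomes keeps the syntax compatible with older margins versions; across the three categories the partial effects of x1 sum to zero, because the category probabilities sum to one.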

        JW


        • #5
          Jeff: If you don't mind me asking, what would be the difference between the Chamberlain/Mundlak adjustment for oprobit vs xtoprobit? When applied to random effects like xtoprobit, it is known as the correlated random effects model, correct? I assume you covered this in your textbook. I likely glossed over it.

          • #6
            Chris: It's the same as with probit versus xtprobit or tobit versus xttobit. One can always use the C-M device to account for correlation between heterogeneity and covariates. To me, it's necessary to at least try this. A separate issue is how one wants to estimate the parameters. All of the pooled methods (with small T, large N) are agnostic about the source of serial correlation. Clustering gets the job done after (simple) pooled estimation. I like the simplicity and the robustness. But one can add on the serial independence assumption on the idiosyncratic errors and use joint MLE -- xtprobit, xtoprobit, xttobit, among others.

            Unfortunately, my book hasn't been revised in 10 years. When I teach this material now, I emphasize the distinction much more. I do have a general discussion in Chapter 13 on pooled versus joint MLE for panel data.

            I have some work with a recent student of mine, Alyssa Carlson at Missouri, and Ying Zhu at UCSD, where we find, via simulation, that the inconsistency in the joint MLE in the presence of serial correlation is perhaps not so bad. My intuition is that there would be no difference in the two methods in the linear case: Mundlak POLS and Mundlak RE are numerically identical. But one's intuition sometimes does not extend to nonlinear models ....
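The linear-case equivalence can be checked numerically. The sketch below assumes a balanced simulated panel with hypothetical names (id, t, y, x, xb) and compares the coefficient on x from pooled OLS with the time average, random effects with the time average, and the usual fixed effects (within) estimator.

```stata
* Balanced simulated panel; names are hypothetical
clear
set seed 42
set obs 100
generate id = _n
generate double c = rnormal()
expand 5
bysort id: generate t = _n
generate double x = 0.7*c + rnormal()
generate double y = 1 + 2*x + c + rnormal()
egen double xb = mean(x), by(id)

* Mundlak pooled OLS
regress y x xb, vce(cluster id)
scalar b_pols = _b[x]

* Mundlak random effects
xtset id t
xtreg y x xb, re
scalar b_re = _b[x]

* Fixed effects (within) estimator for comparison
xtreg y x, fe
scalar b_fe = _b[x]

display "POLS: " b_pols "   RE: " b_re "   FE: " b_fe
```

Variables are stored as double so that rounding in the time averages does not blur the comparison; the three coefficients on x should agree up to floating-point error.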

            • #7
              Thank you for the explanation and sources, Jeff. I'll revisit chapter 13 of your book and look for your recent work.

              • #8
                Dear Professor Wooldridge,

                I thank you very much for your detailed and helpful reply. I am truly grateful.

                I apologize if this seems like a trivial question, but what do the terms z1, ..., zJ stand for in the oprobit command?

                • #9
                  I forgot to mention that you can and, in many cases, should add time-constant variables. That’s what the z indicate.

                  • #10
                    Thank you very much for the valuable information. I hope you don't mind me asking a further question. How should I interpret the ordered probit coefficients of the new covariates (time averages of the time varying covariates)?

                    A slightly unrelated question, but one of great importance to me, is the following. In a simple ordered probit, is it possible to include industry dummies (in the form of i.Industry) to control for industry fixed effects? I ask because I have read in various sources that one should not attempt this with probit models. However, authors of many academic papers use this strategy, so I was wondering whether it would be legitimate for me to do the same. The predictive power of the model rises substantially once industry effects are accounted for, and all the signs of the coefficients become economically meaningful.

                    I apologize for these further questions, but I hope you could shed some light on the issue.

                    • #11
                      Dear Jeff Wooldridge,
                      I would like to ask whether the following regression is also valid in the case of an unbalanced panel:
                      Code:
                      egen x1b = mean(x1), by(id)
                      egen x2b = mean(x2), by(id)
                      ...
                      egen xKb = mean(xK), by(id)
                      oprobit y x1 x2 ... xK x1b ... xKb z1 ... zJ i.year, vce(cluster id)
                      I read your paper "Correlated random effects models with unbalanced panels" and the corresponding slides, "Correlated Random Effects Panel Data Models" (IZA Summer School in Labor Economics, May 13-19, 2013), where you used the following example:

                      Code:
                      glm math4 lavgrexp lunch lenrol y95 y96 y97 y98 lavgrexpb lunchb lenrolb ///
                          y95b y96b y97b y98b tobs3 tobs4, fam(bin) link(probit) cluster(schid)
                      You included the variables, their means and 2 out of the 3 variables indicating the number of years per panel.

                      So I don't really see a difference, is that right?

                      Thank you in advance!

                      • #12
                        Dear Jeff Wooldridge, I should rephrase my question in #11. For unbalanced panels, is it possible to include just year dummies instead of the variables indicating the number of years per panel (all but one of them)?
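For reference, the variables under discussion could be constructed along the following lines. This is only a sketch on simulated data with hypothetical names (id, year, y, x1, x1b, tobs, yr2b ... yr5b), patterned on the glm example quoted in #11. It shows how the number-of-periods indicators and the time averages of the year dummies might be built for an ordered probit; it does not settle whether year dummies alone are sufficient.

```stata
* Simulated panel that is then made unbalanced; all names are hypothetical
clear
set seed 7
set obs 200
generate id = _n
generate double c = rnormal()
expand 5
bysort id: generate year = 2000 + _n
generate double x1 = 0.5*c + rnormal()
generate double ystar = 0.5*x1 + c + rnormal()
generate y = 1 + (ystar > -1) + (ystar > 1)

* Drop firm-years at random, then keep firms observed in at least 3 years
drop if runiform() < 0.15
egen tobs = count(y), by(id)
drop if tobs < 3

* Time averages are taken over the years actually observed for each firm
egen double x1b = mean(x1), by(id)
tabulate year, generate(yr)
forvalues j = 2/5 {
    egen double yr`j'b = mean(yr`j'), by(id)
}

* ib5.tobs makes firms with all 5 years the base group, so indicators
* for tobs = 3 and tobs = 4 enter the model
oprobit y x1 x1b i.year yr2b yr3b yr4b yr5b ib5.tobs, vce(cluster id)
```

In a balanced panel the year-dummy averages and tobs are constant across firms and drop out, which is why they do not appear in the generic code from #4.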
