  • How to interpret changes in odds/standard deviation interpretation for multinomial regressions?

    Hello everyone,

    I am using a multinomial logit regression with three categories. I want to calculate the economic significance of the independent variables. Many papers refer to changes in odds when interpreting the economic significance of variables in multinomial regressions. According to these papers, the change in odds is the percentage change in the odds ratio for a one standard deviation increase in an explanatory variable. My questions are as follows.

    Q1: Is the change in odds for multinomial regressions the same as the standard deviation interpretation for linear regressions?

    Q2: I have difficulty interpreting Stata output. For example, my dependent variable has three categories: 0, 1, and 2. The base category of the multinomial logit regression is 0. That is, I have two comparisons, 1 versus 0 and 2 versus 0. I want to know the change in odds for CR_year, and I use the following code to calculate it. My understanding is that e^bStdX gives the change in odds. However, I am not sure which numbers to use. Are 0.9072 and 0.4431 the changes in odds for the comparisons 1 versus 0 and 2 versus 0, respectively?

    This may be a silly question, but I would appreciate your kind help.

    Code:
    . mlogit Security CR_year, base(0) ro
    
    Iteration 0:   log pseudolikelihood = -3897.9265  
    Iteration 1:   log pseudolikelihood = -3692.8119  
    Iteration 2:   log pseudolikelihood = -3688.6424  
    Iteration 3:   log pseudolikelihood = -3688.6368  
    Iteration 4:   log pseudolikelihood = -3688.6368  
    
    Multinomial logistic regression                 Number of obs     =      3,731
                                                    Wald chi2(2)      =     338.51
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -3688.6368               Pseudo R2         =     0.0537
    
    ------------------------------------------------------------------------------
                 |               Robust
        Security |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    0            |  (base outcome)
    -------------+----------------------------------------------------------------
    1            |
         CR_year |  -.2081153   .0907458    -2.29   0.022    -.3859737   -.0302568
           _cons |  -.7551389   .0584756   -12.91   0.000    -.8697489   -.6405288
    -------------+----------------------------------------------------------------
    2            |
         CR_year |  -1.738738   .0956928   -18.17   0.000    -1.926292   -1.551184
           _cons |      .2518   .0440799     5.71   0.000     .1654051     .338195
    ------------------------------------------------------------------------------
    . listcoef, help
    
    mlogit (N=3731): Factor Change in the Odds of Security 
    
    Variable: CR_year (sd=.46807845)
    
    Odds comparing    |
    Alternative 1     |
    to Alternative 2  |      b         z     P>|z|     e^b   e^bStdX
    ------------------+---------------------------------------------
    1       -2        |   1.53062   13.666   0.000   4.6211   2.0472
    1       -0        |  -0.20812   -2.293   0.022   0.8121   0.9072
    2       -1        |  -1.53062  -13.666   0.000   0.2164   0.4885
    2       -0        |  -1.73874  -18.170   0.000   0.1757   0.4431
    0       -1        |   0.20812    2.293   0.022   1.2314   1.1023
    0       -2        |   1.73874   18.170   0.000   5.6902   2.2566
    ----------------------------------------------------------------
           b = raw coefficient
           z = z-score for test of b=0
       P>|z| = p-value for z-test
         e^b = exp(b) = factor change in odds for unit increase in X
     e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
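
    For reference, the e^bStdX column can be reproduced by hand from the coefficients and the standard deviation reported above. A quick arithmetic check (in Python, outside Stata) using the numbers shown:

```python
import math

# Coefficients from the mlogit output above (base outcome 0)
b_1v0 = -0.2081153   # CR_year, outcome 1 vs 0
b_2v0 = -1.738738    # CR_year, outcome 2 vs 0
sd = 0.46807845      # sd of CR_year as reported by -listcoef-

# e^b: factor change in odds for a one-unit increase in CR_year
print(round(math.exp(b_1v0), 4))       # 0.8121
print(round(math.exp(b_2v0), 4))       # 0.1757

# e^bStdX: factor change in odds for a one-sd increase in CR_year
print(round(math.exp(b_1v0 * sd), 4))  # 0.9072
print(round(math.exp(b_2v0 * sd), 4))  # 0.4431
```

    So 0.9072 and 0.4431 do correspond to the 1 -0 and 2 -0 rows of the table, i.e. the 1-versus-0 and 2-versus-0 comparisons.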


  • #2
    The interpretation of multinomial logistic regression output is extremely complicated. So complicated that I think it is a bad idea to even attempt it.

    The problem comes from the fact that the total probability across all categories must add up to 1. So you can end up with weird situations where the odds ratios for both categories 2 and 3 are > 1 for some predictor, but an increase in that predictor is actually associated with a decreased probability for one of those outcomes! That can arise, for example, when the increase in probability for category 2 is large enough to "crowd out" category 3. It is almost impossible to develop a good intuition for what the multinomial model is telling you by studying and manipulating the odds ratios. In fact, it is, in principle, impossible, because the "crowding out" effect also depends on the base probability rates.
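
    The crowding-out effect can be seen in a toy numeric sketch (hypothetical coefficients, category 1 as base, chosen purely for illustration): both odds ratios exceed 1, yet the probability of one of the categories falls as the predictor increases.

```python
import math

def probs(x, a2=1.0, b2=3.0, a3=0.0, b3=0.5):
    """Three-category multinomial logit, category 1 as base.
    Coefficients are made up purely to illustrate crowding out."""
    e2 = math.exp(a2 + b2 * x)
    e3 = math.exp(a3 + b3 * x)
    denom = 1 + e2 + e3
    return (1 / denom, e2 / denom, e3 / denom)

# Both odds ratios relative to the base category exceed 1 ...
print(round(math.exp(3.0), 2), round(math.exp(0.5), 2))  # 20.09 1.65

# ... yet Pr(category 3) FALLS as x goes from 0 to 1, because the
# much larger increase for category 2 crowds category 3 out:
print(round(probs(0)[2], 3), round(probs(1)[2], 3))  # 0.212 0.029
```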

    I strongly recommend, instead, looking at the predicted probabilities you can get from -margins-. It appears from your code that your predictor variable, CR_year, is a continuous variable. So select interesting values of CR_year and have -margins- calculate the predicted probabilities of all three levels of the Security outcome at those values. For example, if interesting values of CR_year are 2010, 2015, and 2020, you could run

    Code:
    margins, at(CR_year = (2010 2015 2020))
    
    //  AND TO SEE THE RELATIONSHIPS GRAPHICALLY
    marginsplot

    I suggest you ignore the odds ratios. They are, at best, difficult to understand, and if not perfectly understood they can be seriously misleading.

    I want to know the change in odds of CR_year.
    I don't know what this means. CR_year is your predictor variable. Your commands model the odds of Security, not CR_year.



    • #3
      Thanks a lot for your reply and suggestions. I know multinomial regression is very complicated, but I notice that many papers use changes in odds when interpreting multinomial logit results.
      For example, paper 1 says:
      Change in odds indicates the percentage change in the odds ratio for one standard deviation increase of continuous independent variables, or an increase from zero to one for dichotomous independent variables.
      And paper 2 says:
      To compute the economic effect of a variable on pure equity issuance, for example, we add one standard deviation of this variable to its actual values but keep the actual values of other variables and compute the predicted average likelihood of pure equity issuance using the coefficients. We also subtract its actual values by one standard deviation and compute the predicted average likelihood. The change in the predicted average likelihood is the economic effect.
      In the above two papers, the direction of the change in odds is the same as that of the corresponding coefficient; that is, a positive (negative) coefficient has a positive (negative) change in odds. However, in my sample, -listcoef- shows that the directions of the coefficients and of the changes in odds are inconsistent. I want to figure out how those papers do this in Stata and replicate the method. I am also wondering whether, following paper 2, I can obtain the standard deviations of the independent variables with -tabstat- and then multiply them by the regression coefficients.
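
      For what it is worth, the recipe described in paper 2 can be sketched outside Stata. The code below is a hypothetical illustration with a simple binary logit and made-up data, not the papers' actual models: shift the observed predictor values up and down by one standard deviation, average the predicted probabilities in each case, and take the difference.

```python
import math
from statistics import stdev

def p(x, a=0.0, b=1.0):
    """Predicted probability from a binary logit with hypothetical
    coefficients a and b."""
    return 1 / (1 + math.exp(-(a + b * x)))

# Made-up sample values of a continuous predictor
xs = [0.1, 0.5, 0.9]
sd = stdev(xs)  # 0.4

# Paper 2's recipe: average predicted probability at x + sd minus
# average predicted probability at x - sd
up = sum(p(x + sd) for x in xs) / len(xs)
down = sum(p(x - sd) for x in xs) / len(xs)
print(round(up - down, 3))  # 0.182
```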

      My key variable of interest, CR_year, is a dummy variable. The dependent variable is Security, with three categories: 0, 1, and 2. I have many control variables in my regressions, which I omit here for brevity. By saying "I want to know the change in odds of CR_year", I mean the percentage change in the odds in the regression of Security 1 versus 0 and of Security 2 versus 0 for a one standard deviation increase in CR_year.

      I run -margins- for CR_year; please find the Stata output below. I have difficulty interpreting the -margins- results. According to the -mlogit- results, I have two comparisons, 1 versus 0 (R1) and 2 versus 0 (R2). However, -margins- gives predicted probabilities for all three levels of the dependent variable. How can I obtain the marginal effects of CR_year in R1 and R2, given that I am interested in the impact of CR_year on the choice of 1 over 0 and of 2 over 0?

      Code:
      . mlogit Security CR_year, base(0) ro
      
      Iteration 0:   log pseudolikelihood = -3897.9265  
      Iteration 1:   log pseudolikelihood = -3692.8119  
      Iteration 2:   log pseudolikelihood = -3688.6424  
      Iteration 3:   log pseudolikelihood = -3688.6368  
      Iteration 4:   log pseudolikelihood = -3688.6368  
      
      Multinomial logistic regression                 Number of obs     =      3,731
                                                      Wald chi2(2)      =     338.51
                                                      Prob > chi2       =     0.0000
      Log pseudolikelihood = -3688.6368               Pseudo R2         =     0.0537
      
      ------------------------------------------------------------------------------
                   |               Robust
          Security |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      0            |  (base outcome)
      -------------+----------------------------------------------------------------
      1            |
           CR_year |  -.2081153   .0907458    -2.29   0.022    -.3859737   -.0302568
             _cons |  -.7551389   .0584756   -12.91   0.000    -.8697489   -.6405288
      -------------+----------------------------------------------------------------
      2            |
           CR_year |  -1.738738   .0956928   -18.17   0.000    -1.926292   -1.551184
             _cons |      .2518   .0440799     5.71   0.000     .1654051     .338195
      ------------------------------------------------------------------------------
      
      . margins, eyex(CR_year)
      
      Average marginal effects                        Number of obs     =      3,731
      Model VCE    : Robust
      
      ey/ex w.r.t. : CR_year
      1._predict   : Pr(Security==0), predict(pr outcome(0))
      2._predict   : Pr(Security==1), predict(pr outcome(1))
      3._predict   : Pr(Security==2), predict(pr outcome(2))
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      ey/ex   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      CR_year      |
          _predict |
                1  |   .0952331   .0079988    11.91   0.000     .0795556    .1109105
                2  |    .027795   .0219712     1.27   0.206    -.0152678    .0708578
                3  |  -.4681907   .0306006   -15.30   0.000    -.5281668   -.4082147
      ------------------------------------------------------------------------------





      • #4
        A lot to unpack here.

        First, the quote you show from paper 2 is confusing. The word likelihood, used in its technical meaning, makes no sense at all in this context. So they are using the term loosely in some way. It is more common for people to (mis)use the term likelihood as a synonym for probability than for odds. So I don't know what they did there.

        I can't comment on -listcoef-. It is not a part of official Stata and I am not familiar with it. I don't know what it does.

        I could give you a long rant on why I dislike assessing 1 standard deviation changes, but since we are already juggling many balls, I'll keep this one short and focused on this particular problem: having clarified that CR_year is a dichotomous variable, it makes no sense at all to talk about a 1 sd change in it. A 1 sd change in a dichotomous variable is simply impossible in reality. Why waste time looking at the effect of a change that can never actually happen? I could give you other reasons not to do this, but I'll cut it short here.

        By saying "I want to know the change in odds of CR_year", I mean to know the percentage changes in the odds ratio in regression of Security 1 versus 0 and Security 2 versus 0 for a one standard deviation increase of CR_year.
        I'm hoping you are just being loose with your use of language here. While it is certainly possible to calculate the percentage changes in the odds, the odds ratios do not change--they are constants of the model. Percentage changes in the odds can be calculated, but I don't know how anybody would interpret them. It's a complication on top of a complication. If I were a journal reviewer and I saw those presented in a paper I'd insist that something comprehensible be presented instead.

        I run -margins- for CR_year. Please find Stata outputs below. I have difficulties in interpreting the results of -margins-. According to the -mlogit-results, I have two regressions, 1 versus 0 (R1) and 2 versus 0 (R2). However, -margins- gives predicted probabilities of three levels of dependent variables. How can I know the margin effects of CR_year in R1 and R2 because I am interested in the impact of CR_year on the choice between 1 over 0 and 2 over 0?
        The way you used -margins-, you calculated the average elasticity, not the marginal effect, of each outcome with respect to CR_year. But actually, since CR_year is a dichotomous 0/1 variable, you did it wrong. You need to re-do the regression itself putting i. in front of CR_year so that Stata will know it is dichotomous and calculate things with it appropriately. (See -help fvvarlist- for more information about the use of factor-variable notation in Stata.)

        The subsequent command to get average marginal effects of CR_year would be -margins, dydx(CR_year)-. This will give you the marginal effects on the probabilities of the three outcomes. This is what I would recommend you go with. It is straightforward to understand: when CR_year goes from 0 to 1, the associated change in probability of each outcome is what is shown. If you want a relative change in the probability, that would be the semi-elasticity, and you can get that with -margins, eydx(CR_year)-. If you really want to confuse your audience and do the relative change in the odds rather than the probabilities, you can get that, too, with some effort:

        Code:
        mlogit Security i.CR_year, base(0) vce(robust)  // PERHAPS OTHER VARIABLES AS WELL
        
        //  AVERAGE RELATIVE CHANGES (SEMI-ELASTICITY) OF THE ODDS OF EACH OUTCOME
        //  ASSOCIATED WITH CHANGE OF CR_year BETWEEN 0 AND 1
        //  (NOT RECOMMENDED)
        forvalues o = 0/2 {
            margins, eydx(CR_year) expression(predict(outcome(`o'))/(1-predict(outcome(`o'))))
        }
        
        //  AVERAGE ABSOLUTE CHANGES IN OUTCOME PROBABILITIES ASSOCIATED WITH
        //  CHANGE OF CR_year BETWEEN 0 AND 1 (RECOMMENDED)
        margins, dydx(CR_year)
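
        Since CR_year is the only predictor in the posted model, what -margins, dydx()- reports can also be verified by hand: the average marginal effect of the 0-to-1 change is just the difference between the two vectors of fitted probabilities. A check in Python using the coefficients from the -mlogit- output earlier in the thread:

```python
import math

# Coefficients from the posted mlogit output (base outcome 0):
# outcome k -> (b_CR_year, _cons)
coefs = {1: (-0.2081153, -0.7551389),
         2: (-1.738738, 0.2518)}

def probs(x):
    """Fitted Pr(Security == k) at CR_year = x."""
    e = {k: math.exp(b * x + c) for k, (b, c) in coefs.items()}
    denom = 1 + sum(e.values())
    return {0: 1 / denom, 1: e[1] / denom, 2: e[2] / denom}

p_at0, p_at1 = probs(0), probs(1)
for k in (0, 1, 2):
    print(k, round(p_at1[k] - p_at0[k], 3))
# 0 0.259
# 1 0.067
# 2 -0.326
```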



        • #5
          Many thanks for your clarification. Yes, I agree with you that marginal effects are more straightforward. Thanks again for your kind help.

