
  • Interpreting coefficients in logit

    Dear Statalist,

    I'm running a logit model, and one of my coefficients is -1.0954, statistically significant at the 1% level. In one of the papers I read, logit coefficients were interpreted as "percentage points", so in my case that would mean that a company is 109.5 percentage points less likely to be affected, which doesn't really make sense. I would be grateful for your insights.

    Thank you!



  • #2
    I don't know what you read, but either it is quite wrong or you misunderstood it.

    -logit- reports logistic regression coefficients, which are in the log odds metric, not percentage points. The log odds metric doesn't come naturally to most people, so when interpreting a logistic regression, one often exponentiates the coefficients to turn them into odds ratios. To two decimal places, exp(-1.0954) = 0.33. So one way to interpret the result is that a one-unit increase in the variable this coefficient belongs to (let's call it x) is associated with the odds of the positive outcome in your model (let's call it y) being multiplied by 0.33, that is, reduced to roughly 1/3 of what they were.
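To make the arithmetic concrete, here is a minimal Python sketch (Python is used only for illustration). Only the coefficient -1.0954 comes from the original post; the baseline probabilities are hypothetical. It converts the coefficient to an odds ratio and then shows that the same odds ratio implies different changes in probability depending on the starting probability.

```python
import math


def odds_ratio(coef):
    """Convert a logit coefficient (log-odds scale) into an odds ratio."""
    return math.exp(coef)


def prob_change(p0, coef):
    """Change in probability when the odds at baseline probability p0
    are multiplied by exp(coef)."""
    odds1 = (p0 / (1.0 - p0)) * odds_ratio(coef)
    p1 = odds1 / (1.0 + odds1)
    return p1 - p0


b = -1.0954  # coefficient from the original post
print("odds ratio:", round(odds_ratio(b), 2))

# Hypothetical baseline probabilities, chosen only for illustration.
for p0 in (0.05, 0.5, 0.9):
    print(p0, "->", round(100 * prob_change(p0, b), 1), "percentage points")
```

With these baselines the changes come out at roughly -3, -25 and -15 percentage points: a probability change can never exceed 100 points in magnitude, so a reading of -109.5 percentage points cannot be right.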

    Another way that people can interpret logistic regression output is to look at marginal effects, from the -margins- command. So, if x is, say, a 0/1 indicator ("dummy") variable, and if your logistic regression command used factor-variable notation (i.e. it looks like logit y i.x, perhaps with some other variables), then you could run -margins, dydx(x)- and the result would be the average difference in the probability of y between x = 0 and x = 1. That is a difference in probabilities, which reads as percentage points once multiplied by 100. But it cannot be calculated just from knowing the regression coefficient, because it also depends on the other coefficients and on the values of the model's variables across the full data set; so to get this number you would have to rerun your -logit- and then run -margins-. There are a number of other kinds of marginal effects that can be estimated, depending on your model. These would also be denominated in percentage points.
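A minimal Python sketch of why the average marginal effect cannot be read off the coefficient alone. The intercept, the extra covariate z, its coefficient, and both small samples are hypothetical; only the coefficient on x reuses -1.0954 from the question. For a 0/1 x, each observation's predicted probability is computed with x = 0 and with x = 1 (z kept at its observed value), and the differences are averaged, which is the kind of quantity -margins, dydx(x)- reports when x entered with factor-variable notation.

```python
import math


def invlogit(xb):
    """Logistic CDF: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


def ame_binary(b0, bx, bz, z_values):
    """Average discrete change in Pr(y = 1) as x goes from 0 to 1,
    with z held at each observation's observed value."""
    diffs = [invlogit(b0 + bx + bz * z) - invlogit(b0 + bz * z) for z in z_values]
    return sum(diffs) / len(diffs)


B0, BX, BZ = -0.5, -1.0954, 0.8  # B0 and BZ are hypothetical
sample_a = [0, 0, 1, 2, 3]       # hypothetical z values
sample_b = [-2, -2, -1, 0, 0]    # hypothetical z values

print(round(ame_binary(B0, BX, BZ, sample_a), 3))
print(round(ame_binary(B0, BX, BZ, sample_b), 3))
```

The same coefficient gives two different average marginal effects (about -0.22 and -0.14 with these made-up numbers, i.e. roughly -22 and -14 percentage points), because the result depends on where the observations sit on the logistic curve.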



    • #3
      Jane, welcome to Statalist. I would recommend reading some introductory texts on logistic regression. If you want something that is free and online, you might check the "basics of logistic regression" handouts at

      https://www3.nd.edu/~rwilliam/stats3/index.html

      Clyde gives an excellent summary, but if you need it, those handouts give a more blow-by-blow explanation.
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      WWW: https://www3.nd.edu/~rwilliam



      • #4
        Some of us prefer -logit- coefficients and predicted probabilities to odds ratios (the default output of -logistic-). You should also look at the -margins- command, which is extremely helpful in interpreting results, particularly in nonlinear models.



        • #5
          Thank you all for your replies; they have been really helpful! I have already gone through some of the handouts and plan to finish them all to get a better grasp of this.

          Although I now understand that a coefficient of -1.0954 in a logit is fine, I'm still quite unsure how to interpret it when it is the coefficient of the main interaction term (I'm doing DiD in a logit). I found an interpretation of interaction terms in logit on the UCLA website (https://stats.idre.ucla.edu/stata/se...tic-regression); however, when I run those commands in Stata, the interaction margins are reported as not estimable:


          Code:
          margins, over(post influenced) atmeans expression(exp(xb())) noatlegend
          margins, over(post influenced) atmeans
          margins influenced, dydx(post) atmeans post
          margins post#influenced, atmeans

          Code:
          Adjusted predictions                              Number of obs   =      18818
          Model VCE    : Robust
          
          Expression   : Pr(limited), predict()
          over         : post influenced
          
          -------------------------------------------------------------------------------
                        |            Delta-method
                        |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
          post#influenced |
                   0 0  |          .  (not estimable)
                   0 1  |          .  (not estimable)
                   1 0  |          .  (not estimable)
                   1 1  |          .  (not estimable)
          -------------------------------------------------------------------------------

          I should add that all the variables in my model are categorical, that I'm using factor-variable notation (ib2.X1, io6.X1, etc.), and that I'm clustering the standard errors.

          I would be really grateful for any ideas about why the interaction can't be estimated.

          Jane



          • #6
            The results you are seeking from
            Code:
            margins, over(post influenced) atmeans expression(exp(xb())) noatlegend
            margins, over(post influenced) atmeans
            margins post#influenced, atmeans
            are truly not estimable if you are using panel data with fixed-effects estimation (i.e. -xtlogit, fe- or -clogit-). That is, these results are not identifiable in such a model because post and influenced are collinear with the time and panel fixed effects, respectively. If you are not using a fixed-effects estimator, then it is not at all apparent to me why you should be getting those results. It would be necessary to see the -logit- output itself and example data to try to figure that out.
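A small Python sketch of the collinearity described above. The units, periods, event timing, and treated set are all hypothetical. It checks that, in every unit-period cell, post equals the sum of the post-event period dummies and influenced equals the sum of the treated units' dummies. Once a model contains a full set of period and unit fixed effects, post and influenced therefore carry no separate information, which is why margins that set them to particular levels cannot be identified.

```python
from itertools import product

# Hypothetical panel: all values below are made up for illustration.
UNITS = [0, 1, 2, 3]
PERIODS = [1, 2, 3, 4]
EVENT_PERIOD = 3   # post = 1 from this period on
TREATED = {1, 3}   # units with influenced = 1


def post(t):
    return int(t >= EVENT_PERIOD)


def influenced(u):
    return int(u in TREATED)


def post_from_period_dummies(t):
    """Sum of the period fixed-effect dummies for post-event periods."""
    return sum(int(t == p) for p in PERIODS if p >= EVENT_PERIOD)


def influenced_from_unit_dummies(u):
    """Sum of the unit fixed-effect dummies for treated units."""
    return sum(int(u == v) for v in TREATED)


def exactly_collinear():
    """True if post and influenced are reproduced exactly, in every
    unit-period cell, by sums of fixed-effect dummies."""
    return all(
        post(t) == post_from_period_dummies(t)
        and influenced(u) == influenced_from_unit_dummies(u)
        for u, t in product(UNITS, PERIODS)
    )


print(exactly_collinear())  # True
```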

            However, you normally can get results for:
            Code:
            margins influenced, dydx(post) atmeans post noestimcheck
            The noestimcheck option must be specified here. If this does not work for you, again, post back showing the -xtlogit- or -logit- output itself along with example data.

            Note: If you specify the -noestimcheck- option for the first three -margins- commands mentioned at the top of this post, Stata will spit out some numbers at you. But those results should not be used: those margins are not identifiable in that model. But the results for -margins, dydx(post)- are OK as that parameter is identifiable.



            • #7
              Dear Clyde,

              Thank you for your answer, and I'm sorry for my late reply; I had to put this aside for a while, but I'm getting back to it now.

              You are right that I'm running a logit regression with fixed effects; however, my data are cross-sectional, not a panel.

              I've tried running the command you recommended to interpret the interaction term of my DiD logit model, and here's what I got:

              Code:
              margins influenced, dydx( post) atmeans post noestimcheck
              
              Conditional marginal effects                      Number of obs   =      17595
              Model VCE    : Robust
              
              Expression   : Pr(_rationed), predict()
              dy/dx w.r.t. : 1.post
              at           : 0.post          =    .5235977 (mean)
                             1.post          =    .4764023 (mean)
                             0.influenced~o    =    .7503734 (mean)
                             1.influenced~o    =    .2496266 (mean)
                             1.size          =    .2517772 (mean)
                             2.size          =    .2122774 (mean)
                             3.size          =    .1818343 (mean)
                             4.size          =    .3541111 (mean)
                             autonomous      =    .8342637 (mean)
                             2.turnover      =    .1795122 (mean)
                             3.turnover      =    .1979447 (mean)
                             4.turnover      =    .2796653 (mean)
                             5.turnover      =    .1492459 (mean)
                             6.turnover      =    .0949061 (mean)
                             7.turnover      =     .086376 (mean)
                             9.turnover      =    .0123497 (mean)
                             1.age           =     .871625 (mean)
                             2.age           =    .0907379 (mean)
                             3.age           =    .0262876 (mean)
                             4.age           =    .0102363 (mean)
                             9.age           =    .0011131 (mean)
                             family          =    .7036842 (mean)
                             export25        =    .2799212 (mean)
                             1.profitch~e    =    .3526121 (mean)
                             2.profitch~e    =    .3184859 (mean)
                             3.profitch~e    =    .3118189 (mean)
                             9.profitch~e    =    .0170832 (mean)
                             1.employee~e    =    .3294764 (mean)
                             2.employee~e    =    .5219856 (mean)
                             3.employee~e    =     .148538 (mean)
                             1.debtchange    =    .2553983 (mean)
                             2.debtchange    =    .4520023 (mean)
                             3.debtchange    =    .2535825 (mean)
                             7.debtchange    =    .0156973 (mean)
                             9.debtchange    =    .0233196 (mean)
                             1.outlookb~r    =    .3991901 (mean)
                             2.outlookb~r    =    .4095124 (mean)
                             3.outlookb~r    =    .1762917 (mean)
                             9.outlookb~r    =    .0150059 (mean)
                             1.capitalb~r    =    .3150785 (mean)
                             2.capitalb~r    =    .5619572 (mean)
                             3.capitalb~r    =    .1141789 (mean)
                             9.capitalb~r    =    .0087854 (mean)
                             1.credithi~r    =    .3399481 (mean)
                             2.credithi~r    =    .5378428 (mean)
                             3.credithi~r    =    .1063219 (mean)
                             7.credithi~r    =    .0087859 (mean)
                             9.credithi~r    =    .0071013 (mean)
                             1.wave         =    .1801591 (mean)
                             2.wave         =    .1669349 (mean)
                             3.wave         =    .1765038 (mean)
                             5.wave         =    .1796613 (mean)
                             6.wave         =    .1524296 (mean)
                             7.wave         =    .1443114 (mean)
                             1.c0            =    .0244249 (mean)
                             2.c0            =    .0238508 (mean)
                             3.c0            =    .2357281 (mean)
                             4.c0            =    .1581079 (mean)
                             5.c0            =    .0080094 (mean)
                             6.c0            =    .2355359 (mean)
                             7.c0            =    .0163491 (mean)
                             8.c0            =    .0072683 (mean)
                             9.c0            =    .2303259 (mean)
                             10.c0           =    .0261484 (mean)
                             11.c0           =    .0342513 (mean)
                             1.industry      =    .1308807 (mean)
                             2.industry      =    .0653122 (mean)
                             3.industry      =     .169847 (mean)
                             4.industry      =     .279849 (mean)
                             9.industry      =    .3541111 (mean)
              
              ------------------------------------------------------------------------------
                           |            Delta-method
                           |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
              1.post       |
               influenced |
                        0  |  -.0069799   .0023255    -3.00   0.003    -.0115378   -.0024219
                        1  |   .0081624   .0006487    12.58   0.000     .0068909    .0094339
              ------------------------------------------------------------------------------
              Note: dy/dx for factor levels is the discrete change from the base level.
              1. If I understand correctly, companies in the treatment group are 0.8 percentage points more likely to be rationed after the event. Is that right?

              2. Is it correct to use the "atmeans" option when my variables are categorical? If not, what's the right way to do it? (By the way, in my model I'm intentionally omitting one or two categories of some variables, but they seem to be included when calculating margins. Should I take care of this somehow?)

              Thank you for your advice!

              Jane



              • #8
                1. If I understand it correctly, the companies in the treatment group are 0.8 percentage points more likely to be rationed after the event?
                Yes, conditional on everything else being at the mean values in your sample.
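For reference, a short Python calculation with the two conditional effects reported in #7. Converting 0.0081624 to percentage points gives the 0.8 figure. The difference between the effect of post for influenced and for non-influenced firms is also computed, as an illustrative probability-scale difference-in-differences at the covariate means (assuming that is the quantity of interest). A standard error for that difference would also need the covariance of the two estimates, which the table does not show.

```python
# Conditional marginal effects of post at the covariate means, copied from #7.
ME_POST_CONTROL = -0.0069799  # influenced = 0
ME_POST_TREATED = 0.0081624   # influenced = 1


def to_percentage_points(prob_diff):
    """Express a difference in probabilities in percentage points."""
    return 100.0 * prob_diff


def did_probability_scale(effect_treated, effect_control):
    """Change for treated firms minus change for control firms."""
    return effect_treated - effect_control


print(round(to_percentage_points(ME_POST_TREATED), 2))
print(round(to_percentage_points(did_probability_scale(ME_POST_TREATED, ME_POST_CONTROL)), 2))
```

This prints 0.82 and 1.51 (percentage points).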

                2. Is it correct to use option "atmeans" when my variables are categorical? If not, what's the right way to do it?
                It is OK provided they were coded with factor-variable notation in your original model (which, from your -margins- output, it looks like they were) and provided you interpret them correctly. For example, looking at the mean values for the industry indicators 1.industry through 9.industry, the interpretation is that the marginal effect is being calculated for a population of firms in which the proportions of industries 1, 2, 3, 4, and 9 are, respectively (to whole percentages), 13%, 7%, 17%, 28%, and 35%. No particular firm is, I presume, a blend of industries, but the population is, and the marginal effect is an estimate of the average effect over a population. So, in brief, in this situation you cannot make claims about the effect on an individual firm, but you can make claims about the average effect in a population with a given distribution.
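To illustrate the mechanics, here is a Python sketch of what -atmeans- does with factor-variable indicators: the industry shares from the #7 output are plugged into the linear predictor, as if for a single "blended" firm. The intercept and industry coefficients are hypothetical. For comparison, it also computes the share-weighted average of the per-industry predicted probabilities. Because the logistic function is nonlinear, the two numbers generally differ somewhat, which is one reason some analysts report average marginal effects (the -margins- default) rather than effects at the means.

```python
import math


def invlogit(xb):
    """Logistic CDF: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


# Industry shares from the -margins, atmeans- output in #7.
SHARES = {1: 0.1308807, 2: 0.0653122, 3: 0.169847, 4: 0.279849, 9: 0.3541111}
# Hypothetical intercept and industry coefficients (log-odds scale).
B0 = -1.0
B_IND = {1: 0.0, 2: 0.8, 3: -0.5, 4: 0.3, 9: -1.2}


def prob_at_means():
    """Prediction with each indicator set to its sample share."""
    xb = B0 + sum(SHARES[k] * B_IND[k] for k in SHARES)
    return invlogit(xb)


def population_average_prob():
    """Share-weighted average of the per-industry predictions."""
    return sum(SHARES[k] * invlogit(B0 + B_IND[k]) for k in SHARES)


print(round(prob_at_means(), 4), round(population_average_prob(), 4))
```

With these made-up coefficients the at-means prediction is about 0.20 while the share-weighted average is about 0.22.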

                Btw. in my model I'm intentionally omitting one or two categories of some variables, but they seem to be included when calculating margins, should I take care of it somehow?
                It depends on precisely what question you are trying to answer, but I think in your case it's fine the way it is. By using the -atmeans- option in your -margins- command, you have nailed down the exact values of all the variables in your model: Stata fixes all of the model variables at the values you specified, calculates the predicted probabilities or marginal effects, and averages. With every variable's value specified, it basically makes no difference which observations are included.



                • #9
                  Dear Statalist,

                  I have one more question regarding the margins command.
                  I've tried to run these two:

                  Code:
                  margins influenced, dydx(post) at(turnover=(2 3 4 5 6 7)) post noestimcheck
                  margins influenced, dydx(post) at(turnover) post noestimcheck
                  I'm trying to obtain the change in probability for the DiD coefficient (influenced##post) at the different categories of the categorical variable "turnover" (which has 6 relevant categories, plus a category coded 9 that I omitted from my logit regression and am not interested in). Is the first command the correct one?
                  The second command seems to report the DiD coefficient only with turnover at its means. Is that right?

                  Code:
                  . margins influenced, dydx( post) at(turnover=(2 3 4 5 6 7)) post noestimcheck
                  
                  Average marginal effects                          Number of obs   =      18353
                  Model VCE    : Robust
                  
                  Expression   : Pr(constrained), predict()
                  dy/dx w.r.t. : 1.post
                  
                  1._at        : turnover        =           2
                  
                  2._at        : turnover        =           3
                  
                  3._at        : turnover        =           4
                  
                  4._at        : turnover        =           5
                  
                  5._at        : turnover        =           6
                  
                  6._at        : turnover        =           7
                  
                  ---------------------------------------------------------------------------------
                                  |            Delta-method
                                  |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  ----------------+----------------------------------------------------------------
                  1.post          |
                  _at#influenced |
                             1 0  |  -.0985093   .0033108   -29.75   0.000    -.1049984   -.0920203
                             1 1  |   .0128507   .0012119    10.60   0.000     .0104754    .0152259
                             2 0  |  -.0786486   .0045655   -17.23   0.000    -.0875967   -.0697004
                             2 1  |   .0100715   .0012874     7.82   0.000     .0075481    .0125948
                             3 0  |  -.0616462   .0030454   -20.24   0.000     -.067615   -.0556773
                             3 1  |    .007769   .0009917     7.83   0.000     .0058254    .0097126
                             4 0  |  -.1281285   .0060994   -21.01   0.000    -.1400831   -.1161739
                             4 1  |    .017198   .0018244     9.43   0.000     .0136222    .0207737
                             5 0  |  -.1265221   .0078416   -16.13   0.000    -.1418914   -.1111528
                             5 1  |   .0169552   .0017496     9.69   0.000      .013526    .0203843
                             6 0  |  -.1159428   .0049614   -23.37   0.000    -.1256669   -.1062187
                             6 1  |   .0153773   .0014058    10.94   0.000     .0126221    .0181325
                  ---------------------------------------------------------------------------------
                  Note: dy/dx for factor levels is the discrete change from the base level.
                  
                  . margins influenced, dydx( post) at(turnover) post noestimcheck
                  
                  Average marginal effects                          Number of obs   =      18353
                  Model VCE    : Robust
                  
                  Expression   : Pr(constrained), predict()
                  dy/dx w.r.t. : 1.post
                  at           : 2.turnover      =    .1764424 (mean)
                                 3.turnover      =    .1974478 (mean)
                                 4.turnover      =     .292587 (mean)
                                 5.turnover      =     .145019 (mean)
                                 6.turnover      =    .0925815 (mean)
                                 7.turnover      =    .0838497 (mean)
                                 9.turnover      =    .0120725 (mean)
                  
                  ------------------------------------------------------------------------------
                               |            Delta-method
                               |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                  1.post       |
                   influenced |
                            0  |  -.0903422   .0028782   -31.39   0.000    -.0959834    -.084701
                            1  |   .0116957    .001206     9.70   0.000     .0093319    .0140594
                  ------------------------------------------------------------------------------
                  Note: dy/dx for factor levels is the discrete change from the base level.
                  If I'm correct, in both of these cases, the other variables from the model are not included. So if I want to have the differences in probabilities for all categories of "turnover" as well as the other variables from the model being at their means, I should perhaps run the following command?

                  Code:
                  margins influenced, dydx(post) at(turnover=(2 3 4 5 6 7)) atmeans post noestimcheck

                  Thank you so much for your help!



                  • #10
                    I'm trying to obtain the change in probability for the DiD coefficient (influenced##post) given different categories of the categorical variable "turnover" (which has 6 relevant categories + also a category coded [9] which i omitted from my logit regression and I'm not interested in). --- seems like the first option should be correct?
                    The second command seems to only report the DiD coefficient when turnover is atmeans - is that right?
                    Both correct.

                    If I'm correct, in both of these cases, the other variables from the model are not included.
                    Not correct. The other model variables are included in the calculation, but they are not set to particular values. Rather, in each observation they are left at whatever values were observed for that observation. As a result, it can be said that the results of -margins- are adjusted to the observed distribution of the variables that are in the model but not mentioned in -margins-.
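A rough Python sketch of the calculation described above for -margins influenced, dydx(post) at(turnover=k)- without -atmeans-: every observation is assigned turnover = k and the requested influenced level, the other covariate keeps its observed value in each observation, and the 0-to-1 change in predicted probability from post is averaged. All coefficients, the covariate z, and the five observations are hypothetical; the model is a plain logit with a post-by-influenced interaction, mirroring the DiD setup in this thread.

```python
import math


def invlogit(xb):
    """Logistic CDF: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical coefficients on the log-odds scale.
B0, B_POST, B_INFL, B_POST_X_INFL, B_Z = -1.5, -0.4, 0.2, 0.6, 0.7
B_TURNOVER = {1: 0.0, 2: 0.3, 3: 0.5}  # hypothetical; level 1 is the base

# Hypothetical observations: (influenced, turnover, z) as observed.
DATA = [(0, 1, 0.2), (1, 2, 1.0), (0, 3, -0.5), (1, 1, 0.4), (0, 2, 1.5)]


def xb(post, infl, turnover, z):
    """Linear predictor of a logit with a post x influenced interaction."""
    return (B0 + B_POST * post + B_INFL * infl + B_POST_X_INFL * post * infl
            + B_TURNOVER[turnover] + B_Z * z)


def effect_of_post_at(turnover, infl):
    """Average 0 -> 1 change in Pr(y = 1) from post, with turnover and
    influenced overridden for every observation and z left at its
    observed value."""
    diffs = [invlogit(xb(1, infl, turnover, z)) - invlogit(xb(0, infl, turnover, z))
             for _infl, _turnover, z in DATA]
    return sum(diffs) / len(diffs)


for k in sorted(B_TURNOVER):
    print("turnover =", k,
          "influenced 0:", round(effect_of_post_at(k, 0), 4),
          "influenced 1:", round(effect_of_post_at(k, 1), 4))
```

Adding -atmeans- would instead replace each observation's z with the sample mean of z before computing a single change, which corresponds to the variant proposed in #9.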

                    So if I want to have the differences in probabilities for all categories of "turnover" as well as the other variables from the model being at their means, I should perhaps run the following command?
                    Precisely so.



                    • #11
                      Originally posted by Clyde Schechter:

                      Another way that people can interpret logistic regression output is to look at marginal effects, from the -margins- command. So, if x is, say, a 0/1 indicator ("dummy") variable, and if your logistic regression command used factor variable notation (i.e. it looks like logit y i.x, perhaps with some other variables) then you could run -margins, dydx(x)- and the result would be the average difference in the probability of y between x = 0 and x = 1. That would be in percentage points. But that cannot be calculated just from knowing the regression coefficient because it depends on the full data set of values of x and y -- so to get this number you would have to rerun your -logit- and then do -margins-. There are a number of other kinds of marginal effects that can be estimated, depending on your model. These would also be denominated in percentage points.
                      Thanks, Clyde, for the clear note. Suppose my conditional logit regression is run as:

                      Code:
                      clogit        Response $Alternatives $Price, group(N_ID) vce(cluster UniqueID)
                      Code:
                      Iteration 0:   log pseudolikelihood = -3458.8878  
                      Iteration 1:   log pseudolikelihood = -3426.4228  
                      Iteration 2:   log pseudolikelihood = -3426.3201  
                      Iteration 3:   log pseudolikelihood = -3426.3201  
                      
                      Conditional (fixed-effects) logistic regression
                      
                                                                      Number of obs     =     14,880
                                                                      Wald chi2(11)     =     785.43
                                                                      Prob > chi2       =     0.0000
                      Log pseudolikelihood = -3426.3201               Pseudo R2         =     0.2289
                      
                                                    (Std. Err. adjusted for 277 clusters in UniqueID)
                      -------------------------------------------------------------------------------
                                    |               Robust
                           Response |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      --------------+----------------------------------------------------------------
                        d_Softdrink |  -2.647712   .4222995    -6.27   0.000    -3.475404    -1.82002
                          d_Juice25 |  -1.097884   .2752039    -3.99   0.000    -1.637274   -.5584945
                         d_FlavMilk |   .9318126   .3652523     2.55   0.011     .2159312    1.647694
                         d_Juice100 |   .3260234   .3166219     1.03   0.303    -.2945442     .946591
                       d_LowFatMilk |  -.6210219   .3460788    -1.79   0.073    -1.299324    .0572801
                        p_Softdrink |  -1.034097    .163582    -6.32   0.000    -1.354712    -.713482
                          p_Juice25 |  -.6930296    .082943    -8.36   0.000    -.8555949   -.5304643
                         p_FlavMilk |  -1.057374    .094211   -11.22   0.000    -1.242024    -.872724
                      p_BottleWater |  -.3476737   .0467741    -7.43   0.000    -.4393493    -.255998
                         p_Juice100 |  -.7878686    .066701   -11.81   0.000    -.9186002    -.657137
                       p_LowFatMilk |  -.7895434   .0970309    -8.14   0.000    -.9797205   -.5993663
                      -------------------------------------------------------------------------------
                      Here the global macro $Alternatives holds the alternative indicators (d_typeofdrink), with BottleWater omitted as the reference category; $Price holds the corresponding prices (p_typeofdrink); and Response is the choice indicator (a Yes/No dummy). I then run -margins- to interpret how preference for a drink changes with its price and how people prefer one drink over the others:

                      Code:
                      margins, dydx ($As $Ps) atmeans
                      Code:
                      Conditional marginal effects                    Number of obs     =     14,880
                      Model VCE    : Robust
                      
                      Expression   : Pr(Response|fixed effect is 0), predict(pu0)
                      dy/dx w.r.t. : d_Softdrink d_Juice25 d_FlavMilk d_Juice100 d_LowFatMilk p_Softdrink p_Juice25 p_FlavMilk p_BottleWater
                                     p_Juice100 p_LowFatMilk
                      at           : d_Softdrink     =    .1666667 (mean)
                                     d_Juice25       =    .1666667 (mean)
                                     d_FlavMilk      =    .1666667 (mean)
                                     d_Juice100      =    .1666667 (mean)
                                     d_LowFatMilk    =    .1666667 (mean)
                                     p_Softdrink     =    .4149731 (mean)
                                     p_Juice25       =    .4128763 (mean)
                                     p_FlavMilk      =    .6156183 (mean)
                                     p_BottleWa~r    =    .8186022 (mean)
                                     p_Juice100      =     .600457 (mean)
                                     p_LowFatMilk    =    .5998925 (mean)
                      
                      -------------------------------------------------------------------------------
                                    |            Delta-method
                                    |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      --------------+----------------------------------------------------------------
                        d_Softdrink |  -.1076631   .0196984    -5.47   0.000    -.1462713   -.0690549
                          d_Juice25 |  -.0446429   .0088819    -5.03   0.000    -.0620511   -.0272347
                         d_FlavMilk |     .03789   .0204494     1.85   0.064      -.00219    .0779701
                         d_Juice100 |    .013257   .0149907     0.88   0.377    -.0161243    .0426382
                       d_LowFatMilk |  -.0252524   .0119073    -2.12   0.034    -.0485903   -.0019145
                        p_Softdrink |  -.0420492   .0101343    -4.15   0.000    -.0619121   -.0221862
                          p_Juice25 |  -.0281805   .0061617    -4.57   0.000    -.0402571   -.0161038
                         p_FlavMilk |  -.0429957   .0095804    -4.49   0.000    -.0617729   -.0242184
                      p_BottleWater |  -.0141373   .0014393    -9.82   0.000    -.0169584   -.0113163
                         p_Juice100 |  -.0320369   .0073254    -4.37   0.000    -.0463944   -.0176793
                       p_LowFatMilk |   -.032105   .0075832    -4.23   0.000    -.0469677   -.0172423
                      -------------------------------------------------------------------------------
                      So, taking Softdrink as an example, can I conclude that for a one-unit increase in the Softdrink price, the demand for Softdrink is reduced to 4% (p_Softdrink = -0.04), and that, compared with BottleWater, people prefer Softdrink 10% less (d_Softdrink = -0.10)? Is that true?
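Following the point in #2 that effects from -margins- are differences in probability, here is a short Python conversion of two of the dy/dx values above into percentage points. It assumes the values are read as linear approximations: d_Softdrink and the prices appear to have entered -clogit- as ordinary (non-factor) variables, so -margins- reports derivatives, and a one-unit change is approximated by multiplying by 1. The reported expression is Pr(Response|fixed effect is 0), i.e. predict(pu0), so the figures are changes in that quantity, evaluated at the means.

```python
# dy/dx values copied from the -margins, dydx(...) atmeans- output in #11.
DYDX = {"p_Softdrink": -0.0420492, "d_Softdrink": -0.1076631}


def in_percentage_points(dydx, units=1.0):
    """Linear approximation of the change in the predicted probability,
    in percentage points, for a change of `units` in the regressor."""
    return 100.0 * dydx * units


for name, value in DYDX.items():
    print(name, round(in_percentage_points(value), 1), "percentage points")
```

Read this way, the values are decreases of about 4.2 and 10.8 points in the predicted probability, which is a different statement from a reduction *to* 4%.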

                      Thank you.

