Interpreting coefficients in logit

Jane Dunham

Join Date: Jan 2019

Posts: 8
#1

29 Jan 2019, 16:26

Dear Statalist,

I'm running a logit model and one of my coefficients is -1.0954 and statistically significant at 1% level. In one of the papers I read, they interpreted coefficients in logit as "percentage points", so in my case that would mean that a company is 109.5 percentage points less likely to be affected, which doesn't really make sense. Will be grateful for your insights.

Thank you!
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#2

29 Jan 2019, 18:09

I don't know what you read, but either it is quite wrong or you misunderstood it.

-logit- reports logistic regression coefficients, which are in the log odds metric, not percentage points. The log odds metric doesn't come naturally to most people, so when interpreting a logistic regression, one often exponentiates the coefficients to turn them into odds ratios. To two decimal places, exp(-1.0954) = 0.33. So one way to interpret the results is that a unit increase in whatever variable this is the coefficient of (let's call it x) is associated with the odds of whatever the positive outcome in your model is (let's call it y) being multiplied by a factor of 0.33, i.e. reduced to roughly 1/3 of what they were.
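The arithmetic behind that odds ratio can be checked directly in Stata. This is a minimal sketch that uses only the coefficient quoted in #1; nothing else about the poster's model is assumed.

```stata
* Convert the logit coefficient (log odds) into an odds ratio
display exp(-1.0954)

* Percentage change in the odds implied by a one-unit increase in x
display 100 * (exp(-1.0954) - 1)
```

The first line should print a value of about 0.33, and the second a reduction in the odds of roughly two thirds. Note that this is a statement about odds, not about probabilities.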

Another way that people can interpret logistic regression output is to look at marginal effects, from the -margins- command. So, if x is, say, a 0/1 indicator ("dummy") variable, and if your logistic regression command used factor variable notation (i.e. it looks like logit y i.x, perhaps with some other variables) then you could run -margins, dydx(x)- and the result would be the average difference in the probability of y between x = 0 and x = 1. That would be in percentage points. But that cannot be calculated just from knowing the regression coefficient because it depends on the full data set of values of x and y -- so to get this number you would have to rerun your -logit- and then do -margins-. There are a number of other kinds of marginal effects that can be estimated, depending on your model. These would also be denominated in percentage points.
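As an illustration of the -margins, dydx()- approach described above, here is a hedged sketch on Stata's shipped example dataset lbw (an assumption for illustration only; it is not the poster's data, and the variables low, smoke, age and lwt belong to that dataset).

```stata
* Assumption: Stata's example data, loaded with -webuse-; not the poster's data
webuse lbw, clear

* Factor-variable notation (i.smoke) tells -margins- that smoke is a 0/1 indicator
logit low i.smoke age lwt

* Average difference in Pr(low = 1) between smoke = 1 and smoke = 0,
* reported on the probability scale (multiply by 100 for percentage points)
margins, dydx(smoke)
```

Because the result is averaged over the observed values of the other covariates, it cannot be recovered from the coefficient on smoke alone.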
Richard Williams

Join Date: Apr 2014

Posts: 4947
#3

29 Jan 2019, 20:46

Jane, welcome to Statalist. I would recommend reading some introductory texts on logistic regression. If you want something that is free and online, you might check the "basics of logistic regression" handouts at

https://www3.nd.edu/~rwilliam/stats3/index.html

Clyde does give an excellent summary, but if you need it, those handouts give more of a blow-by-blow explanation.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#4

30 Jan 2019, 12:57

Some of us prefer logit and probabilities to odds ratios (the default in logistic). You should also look at the margins command which is extremely helpful in interpreting results (particularly in non-linear models).
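The difference between the two commands mentioned here is only in what they display. A brief sketch, again assuming Stata's example dataset lbw rather than the poster's data:

```stata
* Assumption: Stata's example data, not the poster's data
webuse lbw, clear

logit low i.smoke age lwt        // coefficients reported in log odds
logit low i.smoke age lwt, or    // same model, reported as odds ratios
logistic low i.smoke age lwt     // odds ratios are the default display
```

All three fit the same model; -margins- can be run after any of them to get results on the probability scale.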
Jane Dunham

Join Date: Jan 2019

Posts: 8
#5

31 Jan 2019, 15:58

Thank you all for your replies, they have been really helpful! I have already gone through some of the handouts and am planning to finish them all to get a better hold of it.

Although I now understand that having a coefficient of -1.0954 in logit is fine, I'm still quite unsure about how to interpret it when this coefficient is from the main interaction term (I'm doing DiD in logit). I've found some interpretation of interaction terms in logit on the UCLA website (https://stats.idre.ucla.edu/stata/se...tic-regression); however, when running those commands in Stata, I got that the interaction can't be estimated:

Code:

margins, over (post influenced) atmeans expression(exp(xb())) noatlegend
margins, over (post influenced) atmeans
margins influenced, dydx( post) atmeans post
margins post#influenced, atmeans

Code:

Adjusted predictions                              Number of obs   =      18818
Model VCE    : Robust

Expression   : Pr(limited), predict()
over         : post influenced

-------------------------------------------------------------------------------
                |            Delta-method
                |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+--------------------------------------------------------------
post#influenced |
           0 0  |          .  (not estimable)
           0 1  |          .  (not estimable)
           1 0  |          .  (not estimable)
           1 1  |          .  (not estimable)
-------------------------------------------------------------------------------

Maybe I can add that all my variables in the model are categorical and I'm using factor variables' notations (ib2.X1 io6.X1 etc.) and clustering.

Would be really grateful if you have any idea why the interaction can't be estimated.

Jane
Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#6

01 Feb 2019, 15:03

The results you are seeking from

Code:

margins, over (post influenced) atmeans expression(exp(xb())) noatlegend
margins, over (post influenced) atmeans
margins post#influenced, atmeans

are truly not estimable if you are using panel data with fixed effects estimation (i.e. -xtlogit, fe- or -clogit-). That is, these results are not identifiable in such a model because of collinearity of post and influenced with the time and panel fixed effects, respectively. If you are not using a fixed-effects estimator, then it is not at all apparent to me why you should be getting those results. It would be necessary to see the -logit- output itself and example data to try to figure that out.

However, you normally can get results for:

Code:

margins influenced, dydx( post) atmeans post noestimcheck

The noestimcheck option must be specified here. If this does not work for you, again, post back showing the -xtlogit- or -logit- output itself along with example data.

Note: If you specify the -noestimcheck- option for the first three -margins- commands mentioned at the top of this post, Stata will spit out some numbers at you. But those results should not be used: those margins are not identifiable in that model. But the results for -margins, dydx(post)- are OK as that parameter is identifiable.

Jane Dunham

Join Date: Jan 2019
Posts: 8
#7

25 Mar 2019, 16:43

Dear Clyde,

Thank you for your answer, and I'm sorry for my late reply. I had to put this aside for a while, but I'm getting back to it now.

You are right that I'm running a logit regression with fixed effects; however, my data are cross-sectional, not a panel.

I've tried to run the command you recommended to interpret the interaction term of DiD logit model, and here's what I got:

Code:

margins influenced, dydx( post) atmeans post noestimcheck

Conditional marginal effects                      Number of obs   =      17595
Model VCE    : Robust

Expression   : Pr(_rationed), predict()
dy/dx w.r.t. : 1.post
at           : 0.post          =    .5235977 (mean)
               1.post          =    .4764023 (mean)
               0.influenced~o    =    .7503734 (mean)
               1.influenced~o    =    .2496266 (mean)
               1.size          =    .2517772 (mean)
               2.size          =    .2122774 (mean)
               3.size          =    .1818343 (mean)
               4.size          =    .3541111 (mean)
               autonomous      =    .8342637 (mean)
               2.turnover      =    .1795122 (mean)
               3.turnover      =    .1979447 (mean)
               4.turnover      =    .2796653 (mean)
               5.turnover      =    .1492459 (mean)
               6.turnover      =    .0949061 (mean)
               7.turnover      =     .086376 (mean)
               9.turnover      =    .0123497 (mean)
               1.age           =     .871625 (mean)
               2.age           =    .0907379 (mean)
               3.age           =    .0262876 (mean)
               4.age           =    .0102363 (mean)
               9.age           =    .0011131 (mean)
               family          =    .7036842 (mean)
               export25        =    .2799212 (mean)
               1.profitch~e    =    .3526121 (mean)
               2.profitch~e    =    .3184859 (mean)
               3.profitch~e    =    .3118189 (mean)
               9.profitch~e    =    .0170832 (mean)
               1.employee~e    =    .3294764 (mean)
               2.employee~e    =    .5219856 (mean)
               3.employee~e    =     .148538 (mean)
               1.debtchange    =    .2553983 (mean)
               2.debtchange    =    .4520023 (mean)
               3.debtchange    =    .2535825 (mean)
               7.debtchange    =    .0156973 (mean)
               9.debtchange    =    .0233196 (mean)
               1.outlookb~r    =    .3991901 (mean)
               2.outlookb~r    =    .4095124 (mean)
               3.outlookb~r    =    .1762917 (mean)
               9.outlookb~r    =    .0150059 (mean)
               1.capitalb~r    =    .3150785 (mean)
               2.capitalb~r    =    .5619572 (mean)
               3.capitalb~r    =    .1141789 (mean)
               9.capitalb~r    =    .0087854 (mean)
               1.credithi~r    =    .3399481 (mean)
               2.credithi~r    =    .5378428 (mean)
               3.credithi~r    =    .1063219 (mean)
               7.credithi~r    =    .0087859 (mean)
               9.credithi~r    =    .0071013 (mean)
               1.wave         =    .1801591 (mean)
               2.wave         =    .1669349 (mean)
               3.wave         =    .1765038 (mean)
               5.wave         =    .1796613 (mean)
               6.wave         =    .1524296 (mean)
               7.wave         =    .1443114 (mean)
               1.c0            =    .0244249 (mean)
               2.c0            =    .0238508 (mean)
               3.c0            =    .2357281 (mean)
               4.c0            =    .1581079 (mean)
               5.c0            =    .0080094 (mean)
               6.c0            =    .2355359 (mean)
               7.c0            =    .0163491 (mean)
               8.c0            =    .0072683 (mean)
               9.c0            =    .2303259 (mean)
               10.c0           =    .0261484 (mean)
               11.c0           =    .0342513 (mean)
               1.industry      =    .1308807 (mean)
               2.industry      =    .0653122 (mean)
               3.industry      =     .169847 (mean)
               4.industry      =     .279849 (mean)
               9.industry      =    .3541111 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.post       |
 influenced |
          0  |  -.0069799   .0023255    -3.00   0.003    -.0115378   -.0024219
          1  |   .0081624   .0006487    12.58   0.000     .0068909    .0094339
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

1. If I understand it correctly, the companies in the treatment group are 0.8 percentage points more likely to be rationed after the event?

2. Is it correct to use option "atmeans" when my variables are categorical? If not, what's the right way to do it? (Btw. in my model I'm intentionally omitting one or two categories of some variables, but they seem to be included when calculating margins, should I take care of it somehow?)

Thank you for your advice!

Jane


Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#8

25 Mar 2019, 16:59

1. If I understand it correctly, the companies in the treatment group are 0.8 percentage points more likely to be rationed after the event?

Yes, conditional on everything else being at the mean values in your sample.

2. Is it correct to use option "atmeans" when my variables are categorical? If not, what's the right way to do it?

It is OK provided they were coded with factor variable notation in your original model (which, from your -margins- output, it looks like they were) and provided you interpret them correctly. So, looking, for example, at the mean values for the industry indicators 1.industry through 9.industry, the interpretation is that the marginal effect is being calculated for a population of firms in which the proportions of industries 1, 2, 3, 4, and 9 are, respectively (to whole percentages), 13%, 7%, 17%, 28%, and 35%. No particular firm is, I presume, a blend of industries. But the population is, and the marginal effect is an estimate of the average effect over a population. So, in brief, in this situation you cannot make claims about the effect on an individual firm, but you can make claims about the average effect in a population with a given distribution.

Btw. in my model I'm intentionally omitting one or two categories of some variables, but they seem to be included when calculating margins, should I take care of it somehow?

It depends on precisely what question you are trying to answer, but I think in your case it's fine the way it is. By using the -atmeans- option in your -margins- command, you have nailed down the exact values of all the variables in your model, and Stata calculates things by fixing all of the model variables at the values you specified, then calculating predictions or coefficients, and averaging. With all variables' values specified, it basically makes no difference which observations are included.
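To make the distinction concrete, here is a hedged sketch contrasting -atmeans- with the default averaging. It assumes Stata's example dataset lbw (not the poster's data), in which race is a three-category variable entered with factor-variable notation.

```stata
* Assumption: Stata's example data, not the poster's data
webuse lbw, clear
logit low i.smoke i.race age lwt

* Conditional marginal effect: every covariate is fixed at its sample mean,
* so the race indicators are set to the sample proportions of each category
margins, dydx(smoke) atmeans

* Average marginal effect: covariates are left at their observed values,
* the effect is computed for each observation and then averaged
margins, dydx(smoke)
```

The legend printed by the first -margins- call lists the category proportions used, in the same way as the long "at" block in #7.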

Jane Dunham

Join Date: Jan 2019
Posts: 8
#9

08 Apr 2019, 12:18

Dear Statalist,

I have one more question regarding the margins command.
I've tried to run these two:

Code:

1. margins influenced, dydx( post) at(turnover=(2 3 4 5 6 7)) post noestimcheck
2. margins influenced, dydx( post) at(turnover) post noestimcheck

I'm trying to obtain the change in probability for the DiD coefficient (influenced##post) given different categories of the categorical variable "turnover" (which has 6 relevant categories, plus a category coded [9] which I omitted from my logit regression and am not interested in). It seems like the first option should be correct?
The second command seems to only report the DiD coefficient when turnover is atmeans - is that right?

Code:

. margins influenced, dydx( post) at(turnover=(2 3 4 5 6 7)) post noestimcheck

Average marginal effects                          Number of obs   =      18353
Model VCE    : Robust

Expression   : Pr(constrained), predict()
dy/dx w.r.t. : 1.post

1._at        : turnover        =           2

2._at        : turnover        =           3

3._at        : turnover        =           4

4._at        : turnover        =           5

5._at        : turnover        =           6

6._at        : turnover        =           7

---------------------------------------------------------------------------------
                |            Delta-method
                |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
1.post          |
_at#influenced |
           1 0  |  -.0985093   .0033108   -29.75   0.000    -.1049984   -.0920203
           1 1  |   .0128507   .0012119    10.60   0.000     .0104754    .0152259
           2 0  |  -.0786486   .0045655   -17.23   0.000    -.0875967   -.0697004
           2 1  |   .0100715   .0012874     7.82   0.000     .0075481    .0125948
           3 0  |  -.0616462   .0030454   -20.24   0.000     -.067615   -.0556773
           3 1  |    .007769   .0009917     7.83   0.000     .0058254    .0097126
           4 0  |  -.1281285   .0060994   -21.01   0.000    -.1400831   -.1161739
           4 1  |    .017198   .0018244     9.43   0.000     .0136222    .0207737
           5 0  |  -.1265221   .0078416   -16.13   0.000    -.1418914   -.1111528
           5 1  |   .0169552   .0017496     9.69   0.000      .013526    .0203843
           6 0  |  -.1159428   .0049614   -23.37   0.000    -.1256669   -.1062187
           6 1  |   .0153773   .0014058    10.94   0.000     .0126221    .0181325
---------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins influenced, dydx( post) at(turnover) post noestimcheck

Average marginal effects                          Number of obs   =      18353
Model VCE    : Robust

Expression   : Pr(constrained), predict()
dy/dx w.r.t. : 1.post
at           : 2.turnover      =    .1764424 (mean)
               3.turnover      =    .1974478 (mean)
               4.turnover      =     .292587 (mean)
               5.turnover      =     .145019 (mean)
               6.turnover      =    .0925815 (mean)
               7.turnover      =    .0838497 (mean)
               9.turnover      =    .0120725 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.post       |
 influenced |
          0  |  -.0903422   .0028782   -31.39   0.000    -.0959834    -.084701
          1  |   .0116957    .001206     9.70   0.000     .0093319    .0140594
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

If I'm correct, in both of these cases, the other variables from the model are not included. So if I want to have the differences in probabilities for all categories of "turnover" as well as the other variables from the model being at their means, I should perhaps run the following command?

Code:

margins influenced, dydx( post) at(turnover=(2 3 4 5 6 7)) atmeans post noestimcheck

Thank you so much for help!


Clyde Schechter

Join Date: Apr 2014

Posts: 29959
#10

08 Apr 2019, 12:30

I'm trying to obtain the change in probability for the DiD coefficient (influenced##post) given different categories of the categorical variable "turnover" (which has 6 relevant categories, plus a category coded [9] which I omitted from my logit regression and am not interested in). It seems like the first option should be correct?
The second command seems to only report the DiD coefficient when turnover is atmeans - is that right?

Both correct.

If I'm correct, in both of these cases, the other variables from the model are not included.

Not correct. The other model variables are included in the calculation, but they are not set to particular values. Rather, in each observation they are left at whatever values were observed for that observation. As a result, it can be said that the results of -margins- are adjusted to the observed distribution of the variables that are in the model but not mentioned in -margins-.

So if I want to have the differences in probabilities for all categories of "turnover" as well as the other variables from the model being at their means, I should perhaps run the following command?

Precisely so.
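The two behaviours discussed in this post can be compared side by side. This is a sketch under the assumption that Stata's example dataset lbw stands in for the poster's data, with race playing the role of "turnover".

```stata
* Assumption: Stata's example data, not the poster's data
webuse lbw, clear
logit low i.smoke i.race age lwt

* race is set to each listed value in turn; age and lwt stay at the
* values observed in each observation, and the effects are averaged
margins, dydx(smoke) at(race=(1 2 3))

* Same values of race, but the remaining covariates are fixed at their means
margins, dydx(smoke) at(race=(1 2 3)) atmeans
```

The header of the first output should read "Average marginal effects" and that of the second "Conditional marginal effects", which is a quick way to confirm which calculation was done.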

Vince Vo

Join Date: Jan 2020
Posts: 19

#11

16 Feb 2020, 21:44

Originally posted by Clyde Schechter

Another way that people can interpret logistic regression output is to look at marginal effects, from the -margins- command. So, if x is, say, a 0/1 indicator ("dummy") variable, and if your logistic regression command used factor variable notation (i.e. it looks like logit y i.x, perhaps with some other variables) then you could run -margins, dydx(x)- and the result would be the average difference in the probability of y between x = 0 and x = 1. That would be in percentage points. But that cannot be calculated just from knowing the regression coefficient because it depends on the full data set of values of x and y -- so to get this number you would have to rerun your -logit- and then do -margins-. There are a number of other kinds of marginal effects that can be estimated, depending on your model. These would also be denominated in percentage points.

Thanks Clyde for the clear note. So if my conditional logit regression is run as:

Code:

clogit        Response $Alternatives $Price, group(N_ID) vce(cluster UniqueID)

Code:

Iteration 0:   log pseudolikelihood = -3458.8878  
Iteration 1:   log pseudolikelihood = -3426.4228  
Iteration 2:   log pseudolikelihood = -3426.3201  
Iteration 3:   log pseudolikelihood = -3426.3201  

Conditional (fixed-effects) logistic regression

                                                Number of obs     =     14,880
                                                Wald chi2(11)     =     785.43
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -3426.3201               Pseudo R2         =     0.2289

                              (Std. Err. adjusted for 277 clusters in UniqueID)
-------------------------------------------------------------------------------
              |               Robust
     Response |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
  d_Softdrink |  -2.647712   .4222995    -6.27   0.000    -3.475404    -1.82002
    d_Juice25 |  -1.097884   .2752039    -3.99   0.000    -1.637274   -.5584945
   d_FlavMilk |   .9318126   .3652523     2.55   0.011     .2159312    1.647694
   d_Juice100 |   .3260234   .3166219     1.03   0.303    -.2945442     .946591
 d_LowFatMilk |  -.6210219   .3460788    -1.79   0.073    -1.299324    .0572801
  p_Softdrink |  -1.034097    .163582    -6.32   0.000    -1.354712    -.713482
    p_Juice25 |  -.6930296    .082943    -8.36   0.000    -.8555949   -.5304643
   p_FlavMilk |  -1.057374    .094211   -11.22   0.000    -1.242024    -.872724
p_BottleWater |  -.3476737   .0467741    -7.43   0.000    -.4393493    -.255998
   p_Juice100 |  -.7878686    .066701   -11.81   0.000    -.9186002    -.657137
 p_LowFatMilk |  -.7895434   .0970309    -8.14   0.000    -.9797205   -.5993663
-------------------------------------------------------------------------------

where $Alternatives holds the dummies for the different choice alternatives (the d_ variables, defined with a global macro; BottleWater is omitted as the comparison category), $Price holds the corresponding prices of the alternatives (the p_ variables, also defined with a global macro), and Response is the choice (a yes/no dummy variable). I then run -margins- to interpret how people's preference for a drink responds to a change in its price, and how they prefer one drink over the others:

Code:

margins, dydx ($As $Ps) atmeans

Code:

Conditional marginal effects                    Number of obs     =     14,880
Model VCE    : Robust

Expression   : Pr(Response|fixed effect is 0), predict(pu0)
dy/dx w.r.t. : d_Softdrink d_Juice25 d_FlavMilk d_Juice100 d_LowFatMilk p_Softdrink p_Juice25 p_FlavMilk p_BottleWater
               p_Juice100 p_LowFatMilk
at           : d_Softdrink     =    .1666667 (mean)
               d_Juice25       =    .1666667 (mean)
               d_FlavMilk      =    .1666667 (mean)
               d_Juice100      =    .1666667 (mean)
               d_LowFatMilk    =    .1666667 (mean)
               p_Softdrink     =    .4149731 (mean)
               p_Juice25       =    .4128763 (mean)
               p_FlavMilk      =    .6156183 (mean)
               p_BottleWa~r    =    .8186022 (mean)
               p_Juice100      =     .600457 (mean)
               p_LowFatMilk    =    .5998925 (mean)

-------------------------------------------------------------------------------
              |            Delta-method
              |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
  d_Softdrink |  -.1076631   .0196984    -5.47   0.000    -.1462713   -.0690549
    d_Juice25 |  -.0446429   .0088819    -5.03   0.000    -.0620511   -.0272347
   d_FlavMilk |     .03789   .0204494     1.85   0.064      -.00219    .0779701
   d_Juice100 |    .013257   .0149907     0.88   0.377    -.0161243    .0426382
 d_LowFatMilk |  -.0252524   .0119073    -2.12   0.034    -.0485903   -.0019145
  p_Softdrink |  -.0420492   .0101343    -4.15   0.000    -.0619121   -.0221862
    p_Juice25 |  -.0281805   .0061617    -4.57   0.000    -.0402571   -.0161038
   p_FlavMilk |  -.0429957   .0095804    -4.49   0.000    -.0617729   -.0242184
p_BottleWater |  -.0141373   .0014393    -9.82   0.000    -.0169584   -.0113163
   p_Juice100 |  -.0320369   .0073254    -4.37   0.000    -.0463944   -.0176793
 p_LowFatMilk |   -.032105   .0075832    -4.23   0.000    -.0469677   -.0172423
-------------------------------------------------------------------------------

So, taking Softdrink as an example, my conclusion would be: for a one-unit increase in the Softdrink price, the demand for Softdrink is reduced by about 4% (p_Softdrink = -0.04), and, compared to BottleWater, people prefer Softdrink about 11% less (d_Softdrink = -0.11). Is that correct?

Thank you.
