  • #16
    Do I understand correctly that if (in the example) income doubles, i.e. an income change of +100%, the change in ln(income) is ln(2)? The odds ratio corresponding to that income change would then be exp(log(0.25)*log(2)) = exp(-1.38629*0.693147) = exp(-0.96091) = 0.383. So every doubling of income would reduce the odds of the event happening to 38% of their previous value.
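The arithmetic in #16 can be checked numerically. This is a quick sketch in Python (not Stata); it assumes, as #16 does, that the reported 0.25 is itself the odds ratio, so the underlying coefficient is log(0.25). Variable names are illustrative.

```python
import math

# Premise of #16: 0.25 is an odds ratio, so the coefficient
# on log(income) is its natural log.
coef = math.log(0.25)            # ≈ -1.38629

# Doubling income changes log(income) by log(2).
delta_log_income = math.log(2)   # ≈ 0.69315

# Change in the log odds, then the corresponding odds ratio.
delta_log_odds = coef * delta_log_income   # ≈ -0.96091
odds_ratio = math.exp(delta_log_odds)      # ≈ 0.383

print(round(delta_log_odds, 5))  # -0.96091
print(round(odds_ratio, 3))      # 0.383
```

Note that this only verifies the arithmetic; as the later posts point out, whether 0.25 should be treated as an odds ratio or as a coefficient is exactly the point of confusion in the thread.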

    Comment


    • #17
      This is hard to do with the "page break" between the original and this.

      The coefficient of log income is 0.25. If you double income, you increase log income by log 2. So the change in the log odds is 0.25*log(2) = 0.1733. So the odds ratio is exp(0.1733) = 1.19. So the odds of the event happening is 1.19 times as great as (or, equivalently, 19% greater than) the odds of the event happening in the absence of a doubling of the income (all else equal.)
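The calculation in #17, which treats 0.25 as the coefficient rather than the odds ratio, can be sketched the same way (Python, names illustrative):

```python
import math

# #17's reading: 0.25 is the coefficient on log(income).
coef = 0.25

# Doubling income raises log(income) by log(2), so the log odds change by:
delta_log_odds = coef * math.log(2)    # ≈ 0.1733

# The corresponding odds ratio: the odds are about 19% greater.
odds_ratio = math.exp(delta_log_odds)  # ≈ 1.19

print(round(delta_log_odds, 4))  # 0.1733
print(round(odds_ratio, 2))      # 1.19
```

Since exp(0.25*log(2)) = 2^0.25, the same answer can be obtained directly as the fourth root of 2.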

      Last edited by Clyde Schechter; 27 Feb 2018, 08:21. Reason: Original response was incorrect due to mis-remembering the original example.

      Comment


      • #18
        The page separation does indeed make it inconvenient. It would have been better if I had copied the example, because there was already confusion at the start of this thread: the questioner referred to the odds ratio as a coefficient. He meant that 0.25 is the odds ratio and exp(0.25) the coefficient. I had calculated the example with 0.25 as an odds ratio, whereas you used 0.25 as a coefficient. Taking that into account, I arrive at the same calculation as you do, so now I understand it. Thanks again for your help!

        Comment


        • #19
          Originally posted by Clyde Schechter View Post
          OK. It's important to distinguish between coefficients and odds ratios! So if the odds ratio is 0.25, we can work it back the other way:

          If the OR in the output is 0.25, then the coefficient is log(0.25). The change of 1% in x corresponds to a change in log(x) of log(1.01) just as before. So xb changes by log(0.25)*log(1.01). The odds ratio corresponding to a change in xb of log(0.25)*log(1.01) is exp(log(0.25)*log(1.01)), which, to three decimal places, is 0.986, which is the answer to your problem.

          Note that this answer is slightly different from what you got. The formula you are using is an approximation, and it is widely taught. At least in this case, the difference between 0.9975 and 0.986 may be of no practical importance. But the exact calculations are really not that difficult, and the approximation formula you used will perform badly when the change in income is appreciably larger than 1%. Since the exact (other than rounding/precision issues) formula is not hard to work with, I think using the exact approach is better.
           Hi Clyde Schechter,

           I came across this thread and found it quite relevant to the problem I am working through. I have a question very similar to the one Alex Mai asked: how to interpret the odds ratio of a log-transformed independent variable in a random-effects logistic regression. If I understand your suggestion correctly, then for a log family contact odds ratio of .1989529 the following is true.

          log(.1989529)*log(1.01)= -0.00303036218

          Therefore, as log family contact increased by one unit from 3 to 6 months, the odds of engaging in misconduct decreased by .3%?

          I am mainly seeking some clarification for further understanding.

          1. Would this be the correct interpretation even if using a panel data with two time points (month 3 and 6)?
          2. Why do we log rather than to exponentiate? Is it because the log variables are already in odds ratio? Would I have to exponentiate if the log variable was a coefficient?
          3. Do we ignore the negative sign in this case given that we are interpreting the odds ratio?

          Thank you in advance,

          Roxy

          Comment


          • #20
            I'm a little reluctant to respond to your questions because you don't provide the actual logistic regression command you are referring to, and I'm not sure from your explanation that I completely understand what you are working with.

            Nevertheless, the odds ratio is the exponentiated coefficient, and it is the coefficient that gets multiplied by the predictor variable. So first you take the logarithm of the odds ratio, and then that gets multiplied by the (difference in) the predictor variable. So if the predictor variable in your equation is log family contact, then a 1% difference in family contact corresponds to a log(1.01) difference in log family contact. So the resulting difference in the log odds of the outcome (misconduct?) is indeed log(.1989529)*log(1.01), which I think is -0.0161 (to four decimal places), not the number you showed in #19. But we're not done: that is the difference in the log odds of the outcome. We want, I assume, the corresponding difference in the odds of the outcome, so we exponentiate that -0.0161 and get (to 4 decimal places) 0.9841 as the ratio of odds. So the odds have decreased by about 1.59%.
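The exact calculation in #20 can be reproduced step by step. A quick Python sketch (not Stata; names are illustrative):

```python
import math

# #20's exact calculation: the output shows an odds ratio, so recover the
# coefficient first, then scale by the change in the log predictor.
odds_ratio_output = 0.1989529
coef = math.log(odds_ratio_output)        # coefficient on log family contact

# A 1% difference in family contact is a log(1.01) difference in its log.
delta_log_odds = coef * math.log(1.01)    # ≈ -0.0161

# Exponentiate to get the ratio of odds, then the percent decrease.
ratio_of_odds = math.exp(delta_log_odds)  # ≈ 0.9841
pct_decrease = (1 - ratio_of_odds) * 100  # ≈ 1.59%

print(round(delta_log_odds, 4))  # -0.0161
print(round(ratio_of_odds, 4))   # 0.9841
print(round(pct_decrease, 2))    # 1.59
```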

            This is true whether we are talking about a flat logistic regression or a panel logistic regression. Remember that if you have done a fixed-effects logistic regression, then this effect on the outcome has to be interpreted as applying only within panels, not between them.

            I think my first paragraph answers your question 1. As for question 3, no we do not ignore the negative sign.

            Comment


            • #21
              Originally posted by Clyde Schechter View Post
              OK. It's important to distinguish between coefficients and odds ratios! So if the odds ratio is 0.25, we can work it back the other way:

              If the OR in the output is 0.25, then the coefficient is log(0.25). The change of 1% in x corresponds to a change in log(x) of log(1.01) just as before. So xb changes by log(0.25)*log(1.01). The odds ratio corresponding to a change in xb of log(0.25)*log(1.01) is exp(log(0.25)*log(1.01)), which, to three decimal places, is 0.986, which is the answer to your problem.

              Note that this answer is slightly different from what you got. The formula you are using is an approximation, and it is widely taught. At least in this case, the difference between 0.9975 and 0.986 may be of no practical importance. But the exact calculations are really not that difficult, and the approximation formula you used will perform badly when the change in income is appreciably larger than 1%. Since the exact (other than rounding/precision issues) formula is not hard to work with, I think using the exact approach is better.
              Hello Clyde Schechter, I have been following the discussion on this topic. Thank you for the insightful answers and comments. I was wondering whether there are any references for the comment above. Thank you.

              Comment


              • #22
                I don't have a specific reference to offer. You should be able to find this in any textbook that covers the basics of logistic regression.

                Comment


                • #23
                  Thank you Clyde Schechter

                  Comment


                  • #24
                    Hi Clyde and All,

                    Could you please help me interpret a log-transformed independent variable in a probit model? The dependent variable is over (equal to one if the firm overinvests). The independent variables are price growth and the log of the price standard deviation. I am unsure how to interpret the magnitude of the coefficient on the log of the price standard deviation. The following is the regression output.

                    Code:
                    xtprobit over5 wb_allG lnwb_allSD, re
                    
                    Fitting comparison model:
                    
                    Iteration 0:   log likelihood = -3234.9473  
                    Iteration 1:   log likelihood = -3233.7889  
                    Iteration 2:   log likelihood = -3233.7889  
                    
                    Fitting full model:
                    
                    rho =  0.0     log likelihood = -3233.7889
                    rho =  0.1     log likelihood =  -3008.252
                    rho =  0.2     log likelihood = -2950.5552
                    rho =  0.3     log likelihood = -2932.6589
                    rho =  0.4     log likelihood = -2933.0315
                    
                    Iteration 0:   log likelihood = -2932.1976  
                    Iteration 1:   log likelihood = -2928.2208  
                    Iteration 2:   log likelihood = -2928.2082  
                    Iteration 3:   log likelihood = -2928.2082  
                    
                    Random-effects probit regression                Number of obs     =      4,982
                    Group variable: firm                            Number of groups  =        386
                    
                    Random effects u_i ~ Gaussian                   Obs per group:
                                                                                  min =          1
                                                                                  avg =       12.9
                                                                                  max =         29
                    
                    Integration method: mvaghermite                 Integration pts.  =         12
                    
                                                                    Wald chi2(2)      =       2.14
                    Log likelihood  = -2928.2082                    Prob > chi2       =     0.3424
                    
                    ------------------------------------------------------------------------------
                           over5 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                         wb_allG |  -.0015124   .0013596    -1.11   0.266    -.0041772    .0011524
                      lnwb_allSD |  -.0250503   .0266311    -0.94   0.347    -.0772463    .0271457
                           _cons |  -.4309126   .0575961    -7.48   0.000    -.5437989   -.3180264
                    -------------+----------------------------------------------------------------
                        /lnsig2u |  -.6044636   .1214072                     -.8424174   -.3665099
                    -------------+----------------------------------------------------------------
                         sigma_u |   .7391667   .0448701                      .6562531    .8325559
                             rho |   .3533231   .0277398                      .3010259    .4093846
                    ------------------------------------------------------------------------------
                    LR test of rho=0: chibar2(01) = 611.16                 Prob >= chibar2 = 0.000
                    I want to know how a one-unit (untransformed) increase in the standard deviation will affect the probability of overinvestment. How can I do that with the margins command? I think the following command is incorrect, because I would need to back-transform (take the exponent of) the log variable first. But how? Thank you.

                    Code:
                    margins, dydx(lnwb_allSD)
                    
                    Average marginal effects                        Number of obs     =      4,982
                    Model VCE    : OIM
                    
                    Expression   : Pr(over5=1), predict(pr)
                    dy/dx w.r.t. : lnwb_allSD
                    
                    ------------------------------------------------------------------------------
                                 |            Delta-method
                                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                      lnwb_allSD |  -.0074846   .0079599    -0.94   0.347    -.0230858    .0081165
                    ------------------------------------------------------------------------------
                    Best regards,

                    Comment


                    • #25
                      In a log-transformed explanatory variable model, there is no such thing as the effect of a unit increase in the non-transformed explanatory variable. By using a log-transformed explanatory variable, you are asserting that the relationship between the (probit of the) outcome and the untransformed explanatory variable is nonlinear. Rather, with a log-transformed explanatory variable, it is the relative (multiplicative) difference in values that is associated with a constant difference in the probit outcome. For example, if the untransformed explanatory variable changes from 1 to 2, that is a doubling. A change from 3 to 6 is also a doubling. Although the absolute differences are 1 and 3, respectively, the associated change in probit outcome is the same for both of these doublings. By contrast, a change from 3 to 4 is, like the change from 1 to 2, a unit change. But the associated differences in the probit outcome will be different because the ratios, 2:1 and 4:3, are different. It is the ratio, not the difference, that is associated with a consistent difference in the probit outcome in this kind of model.
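The point about ratios versus differences can be made concrete with a small numeric sketch in Python: on the log scale, equal ratios produce equal differences, so equal ratios (not equal absolute changes) produce equal shifts in the probit index.

```python
import math

# Equal ratios give equal differences on the log scale.
d_1_to_2 = math.log(2) - math.log(1)  # a doubling
d_3_to_6 = math.log(6) - math.log(3)  # also a doubling
d_3_to_4 = math.log(4) - math.log(3)  # a unit change, but a smaller ratio (4:3)

# Both doublings shift log(x), and hence the probit index, by the same amount.
assert abs(d_1_to_2 - d_3_to_6) < 1e-12

# The 3 -> 4 unit change shifts it by much less than the 1 -> 2 unit change.
print(round(d_1_to_2, 4))  # 0.6931
print(round(d_3_to_4, 4))  # 0.2877
```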

                      When we look at the effect on the probability of the outcome, it gets even more complicated because the probit link is itself non-linear. So the same difference in the probit outcome can be associated with different changes in the probability outcome, depending on the initial value of the probability outcome.

                      The margins output you show represents a marginal effect of the log transformed explanatory variable that is averaged over the range of values of lnwb_allSD and is also adjusted to the distribution of all the other variables in the model. It is, if you will, an "average effect" of a unit change in the log transformed explanatory variable on the probability outcome. But you cannot get anything similar to that for the untransformed variable using -margins- because your model doesn't even contain the untransformed variable. So -margins- cannot say anything about that.

                      Comment


                      • #26
                        Thank you, Clyde. I will try not transforming the variable, then. I saw in Stata tip 128 that we can use margins, expression() to get around the log-transformed variable, but the command looks a bit complicated to me. Thank you.

                        Best regards,

                        Comment
