How to interpret log transformed independent variable in logistic regression

Alex Mai

#1

17 Feb 2017, 14:28

Dear all,

My question is how to interpret the coefficient (in odds ratio) of a log transformed independent variable in a logistic regression. For example, if the coefficient of logged income is 0.25, which is the correct interpretation:
A. a one percent increase in income decreases the odds ratio by 75% ((0.25-1)*100=-75)
or
B. a one percent increase in income decreases the odds ratio by 99.75% ((0.25/100-1)*100=-99.75)

Thank you very much!

Last edited by Alex Mai; 17 Feb 2017, 14:31.
Clyde Schechter

#2

17 Feb 2017, 15:39

None of the above.

Let's work it through. If we have a 1% income difference, then the corresponding difference in log(income) is log(1.01), which is approximately (very close to) 0.01. The corresponding difference in xb (the log odds), which is the log odds ratio, is then 0.25*0.01, or 0.0025. The odds ratio associated with this income difference is then exp(0.0025), which is approximately (very close to) 1.0025.

Note also that when the coefficient is positive, a positive income difference is associated with an increase in the odds, not a decrease.
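The arithmetic in this reply can be checked in a few lines of Python (the thread itself uses Stata, where `display exp(0.25*log(1.01))` does the same job). This is a sketch assuming 0.25 is the logit coefficient on the natural log of income, i.e. on the log-odds scale, as it is read in this post; the variable names are illustrative only.

```python
import math

beta = 0.25                      # assumed logit coefficient on ln(income)
d_log_income = math.log(1.01)    # a 1% income difference on the log scale, about 0.00995
d_xb = beta * d_log_income       # change in the linear predictor = log odds ratio
odds_ratio = math.exp(d_xb)      # odds ratio for a 1% income difference

print(f"change in log(income): {d_log_income:.5f}")
print(f"change in xb:          {d_xb:.5f}")
print(f"odds ratio:            {odds_ratio:.5f}")
```

The odds ratio comes out at about 1.0025, i.e. slightly above 1, as expected for a positive coefficient.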
Alex Mai

#3

18 Feb 2017, 02:43

Originally posted by Clyde Schechter (post #2)

Dear Clyde,

Thank you very much! But the coefficient here is already in the form of an odds ratio (xtlogit y x, or), not a log-odds coefficient. I think 0.25 should be associated with a decrease in the odds of DepVar=1, because an odds ratio < 1 means that the odds of Y=1 decrease.

So in this case, is it correct to say that a one percent increase in income decreases the odds ratio by 99.75% ((0.25/100-1)*100=-99.75)?

Thank you again!

Last edited by Alex Mai; 18 Feb 2017, 02:46.
Clyde Schechter

#4

18 Feb 2017, 09:41

OK. It's important to distinguish between coefficients and odds ratios! So if the odds ratio is 0.25, we can work it back the other way:

If the OR in the output is 0.25, then the coefficient is log(0.25). The change of 1% in x corresponds to a change in log(x) of log(1.01) just as before. So xb changes by log(0.25)*log(1.01). The odds ratio corresponding to a change in xb of log(0.25)*log(1.01) is exp(log(0.25)*log(1.01)), which, to three decimal places, is 0.986, which is the answer to your problem.

Note that this answer is slightly different from what you got. The formula you are using is an approximation, and it is widely taught. At least in this case, the difference between 0.9975 and 0.986 may be of no practical importance. But the exact calculations are really not that difficult, and the approximation formula you used will perform badly when the change in income is appreciably larger than 1%. Since the exact (other than rounding/precision issues) formula is not hard to work with, I think using the exact approach is better.
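Working back from an odds ratio rather than a coefficient, as described above, takes only a log and an exp. A minimal sketch, assuming 0.25 is the odds ratio reported by `xtlogit ..., or` for a one-unit change in the natural log of income; the helper name `or_for_pct_change` is hypothetical. Evaluating it at larger percentage changes shows why the exact formula is worth using rather than scaling up the 1% result.

```python
import math

def or_for_pct_change(or_per_log_unit: float, pct: float) -> float:
    """Exact odds ratio for a pct% increase in x when ln(x) is the regressor.

    or_per_log_unit is the reported odds ratio, i.e. exp(coefficient).
    """
    coef = math.log(or_per_log_unit)     # back out the logit coefficient, log(0.25)
    d_log_x = math.log(1 + pct / 100)    # change in ln(x) for a pct% change in x
    return math.exp(coef * d_log_x)      # exp(log(0.25) * log(1 + pct/100))

for pct in (1, 10, 50):
    print(f"{pct:>3}% increase in income -> OR = {or_for_pct_change(0.25, pct):.3f}")
```

The 1% line reproduces the 0.986 above. A 50% increase gives an OR of about 0.57, far from what linearly scaling the 1% result would suggest (roughly 0.3).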
Alex Mai

#5

24 Feb 2017, 04:58

Originally posted by Clyde Schechter (post #4)

Dear Clyde,

Thank you so much for your detailed explanations! Now it makes sense to me.
Luisa Márquez

#6

07 Mar 2017, 03:59

Dear all,

I have a question similar to Alex's. However, I do not understand how to apply the answer to my results, so please allow me to use this thread to explain my question. In my case, the independent variable (logwomen100) is log-transformed and the dependent variable (motivations) is not. The other variables included in the model are control variables (the model with results is attached). I wonder how to interpret the coefficient of logwomen100. Am I right that a one percent increase in the underlying (unlogged) variable increases motivations by about 0.003 (0.31256/100)?

Thank you so much in advance!
Luisa Márquez

#7

07 Mar 2017, 04:00

Sorry, I forgot to attach the model in my previous comment. Here it is!
Marcos Almeida

#8

07 Mar 2017, 04:24

Just be aware that your model is a linear regression, whereas the previous model was a logistic regression. That said, I believe the interpretation is correct and follows the same reasoning. Finally, you may wish to look at the -margins- command with elasticities.

Best regards,

Marcos
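In the linear model with a logged regressor, the same exact-versus-approximate reasoning applies, but on the outcome scale rather than the odds scale: a 1% increase in x changes y by beta*ln(1.01), which is close to beta/100. A minimal sketch, assuming 0.31256 is the OLS coefficient on logwomen100 as quoted in the question and that logwomen100 is the natural log of the underlying variable:

```python
import math

beta = 0.31256    # coefficient on logwomen100, from the question
pct = 1           # percent increase in the untransformed variable

exact = beta * math.log(1 + pct / 100)   # exact change in motivations
approx = beta * pct / 100                # common approximation: beta/100 per 1%

print(f"exact:  {exact:.5f}")
print(f"approx: {approx:.5f}")
```

Both values round to 0.003, in line with the figure in the question; the two diverge as the percentage change grows.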
Luisa Márquez

#9

07 Mar 2017, 04:36

OK! Thank you so much for your help, Marcos.
Marisa Foraci

#10

31 Aug 2017, 00:38

Dear all, I just wanted to verify whether I can apply Clyde's interpretation guidelines to my regression output (see below). Would it be correct to interpret the coefficient I get for debt (0.68) as the presence of debt increasing the odds of becoming non-poor by (0.68*0.01), or 0.0068, and then exp(0.0068), which is approximately (very close to) 1.0068? And so on for years of education? Would the same reasoning also apply to the marginal effects analysis, where I get the output below?

Clyde Schechter

#11

31 Aug 2017, 08:14

You have posted only part of the output, and you have not posted the command that gave rise to it. So nobody but you knows what kind of regression was done, nor whether what you show are the outputs from the regression itself or from -margins-. Was the dependent variable log-transformed? It isn't possible to comment on the interpretation without that information.
Marisa Foraci

#12

01 Sep 2017, 01:58

Thanks, Clyde, and sorry for the blank image. I will paste outputs directly into the forum from now on. Leaving aside the question on the marginal effects (which I am dealing with in a more appropriate thread) and focusing on the interpretation of the model, please find below the command I used for the above output:

probit becamenonpoor09 r_debt r_children01 r_elderly01 lyearsedu r_noprimary r_nolsecondary r_precfloor r_notoilet r_plain r_phnom r_tonlesap r_rural, robust
Clyde Schechter

#13

01 Sep 2017, 08:48

This is a probit regression, and it does not involve any log-transformation, nor any log link. So none of what was said above applies. In fact, the interpretation of probit regression coefficients is really rather opaque. Here's what it means:

Each unit increase in r_debt (coefficient 0.69 to 2 decimal places) is associated with an increase in PHI^-1(outcome probability) of 0.69. Here PHI is the cumulative normal distribution function (normal ogive). For example, suppose the baseline outcome probability is 0.25. Then PHI^-1(0.25) (calculated in Stata as invnormal(0.25)) = -0.674. A unit increase in r_debt is then associated with an increase of that to -0.674 + 0.69 = 0.016. The corresponding probability is then PHI(0.016), calculated in Stata as normal(0.016) = 0.51 to two decimal places.
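The probit arithmetic above can be reproduced outside Stata. Python's standard library `statistics.NormalDist` provides the standard normal CDF (`cdf`, the counterpart of Stata's normal()) and its inverse (`inv_cdf`, the counterpart of invnormal()). A minimal sketch, assuming the rounded coefficient 0.69 and the baseline probability of 0.25 used in the example:

```python
from statistics import NormalDist

phi = NormalDist()   # standard normal distribution: mu=0, sigma=1
coef = 0.69          # probit coefficient on r_debt (rounded)
p0 = 0.25            # assumed baseline outcome probability

z0 = phi.inv_cdf(p0)   # PHI^-1(0.25); Stata: invnormal(0.25)
z1 = z0 + coef         # a one-unit increase in r_debt shifts the index by coef
p1 = phi.cdf(z1)       # back to the probability scale; Stata: normal(z1)

print(f"z0 = {z0:.3f}, z1 = {z1:.3f}, p1 = {p1:.2f}")
```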

If you start from a different baseline outcome probability, the associated change will be different. Remember that the graph of the normal ogive is a sigmoidal curve (very similar in appearance to that of the logistic function). So at extreme base probabilities, the curve is very flat and a given increment (based on a probit regression coefficient) produces a pretty small change in probability, whereas with starting probabilities that are mid-range, the curve is very steep and the same increment produces a large change in probability. (The same qualitative behavior is true of logistic regression coefficient interpretation as well.)
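The dependence on the baseline probability can be made concrete by applying the same 0.69 increment at several starting points. A minimal sketch; the baseline probabilities chosen here are arbitrary illustrations, not values from the thread:

```python
from statistics import NormalDist

phi = NormalDist()
coef = 0.69                          # same rounded probit coefficient as above
baselines = (0.02, 0.25, 0.50, 0.90)  # illustrative starting probabilities

# change in probability produced by a one-unit increase, at each baseline
changes = {p0: phi.cdf(phi.inv_cdf(p0) + coef) - p0 for p0 in baselines}

for p0, dp in changes.items():
    print(f"baseline {p0:.2f} -> {p0 + dp:.3f} (change {dp:+.3f})")
```

The same coefficient moves the probability by roughly a quarter near the middle of the curve, but by well under a tenth near the tails.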
Kees Maat

#14

26 Feb 2018, 10:01

Originally posted by Clyde Schechter (post #4)

Dear Clyde,

I just read the clear explanation you wrote about a year ago. One thing is still unclear to me. Why did you do this by using a 1% change, as we usually say that the effect is on the basis of a one unit change in the IV? And what does this mean for the resulting odds ratio: should we interpret it as the effect of a one-unit change (one step in the log-transformed variable), or as the effect of a 1% change in the log-transformed variable?

Thanks.
Clyde Schechter

#15

26 Feb 2018, 10:30

Why did you do this by using a 1% change, as we usually say that the effect is on the basis of a one unit change in the IV?

What is usual in one context may be exceptional in another. The question in the original post was actually phrased in terms of a 1% change, so that is what I responded to.
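On the second part of Kees's question: when the regressor is the natural log of income, a one-unit step in it corresponds to multiplying income by e ≈ 2.718, and that is the change the reported odds ratio refers to. Any other proportional change c follows as OR^ln(c). A minimal sketch, reusing the OR of 0.25 from earlier in the thread; the function name `or_for_multiplier` is hypothetical:

```python
import math

def or_for_multiplier(reported_or: float, c: float) -> float:
    """Odds ratio when x is multiplied by c, given the OR per unit of ln(x)."""
    return reported_or ** math.log(c)   # same as exp(coef * ln(c))

for label, c in (("1% increase", 1.01), ("doubling", 2.0), ("one unit of ln(x)", math.e)):
    print(f"{label:>18}: OR = {or_for_multiplier(0.25, c):.3f}")
```

So the reported 0.25 is the odds ratio for multiplying income by e, not for a 1% change; the 1% figure is the 0.986 worked out above.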

In some disciplines, or in some contexts, effects are commonly reported as elasticities (the proportionate change in y associated with a proportionate change in x) or semi-elasticities (the absolute change in y associated with a proportionate change in x). It is partly a matter of custom. It is also partly a matter of modeling: if y and x have a power-law relationship, then they will exhibit constant elasticity, whereas the change in y associated with a fixed unit change in x will vary with x (unless the exponent of the power is 1).
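The power-law point can be illustrated numerically. A minimal sketch, assuming a hypothetical relationship y = 2·x^0.5 (the constants are arbitrary): the proportionate change in y for a 1% change in x is the same at every x, while the change in y for a one-unit change in x is not.

```python
a, k = 2.0, 0.5   # hypothetical power law y = a * x**k

def y(x: float) -> float:
    return a * x ** k

for x in (1.0, 10.0, 100.0):
    prop = y(1.01 * x) / y(x) - 1   # proportionate change in y for a 1% change in x
    unit = y(x + 1) - y(x)          # absolute change in y for a one-unit change in x
    print(f"x = {x:>5}: 1% in x -> {prop:.5f} proportionate change in y; +1 in x -> {unit:.4f} in y")
```

The proportionate change is 1.01^k - 1 everywhere, which is why a power law has constant elasticity k.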

Any regression may be interpreted either way. Again, there are contexts where one or the other is more natural. And, in the case of a logistic model, it is important to remember that neither the unit-change "effect" nor the 1%-change "effect" on the outcome probability will be a constant: because of the logistic link, both of these will vary with x itself.
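To see which quantities vary with x in a logistic model that uses ln(x) as the regressor, consider a hypothetical model logit(p) = b0 + b1·ln(x), with b1 = ln(0.25) as in the thread and an arbitrarily chosen intercept b0 = 1. In this sketch the odds ratio for a 1% increase is 1.01^b1 at every x, the odds ratio for a one-unit increase, ((x+1)/x)^b1, depends on x, and the change in probability depends on x as well.

```python
import math

b0, b1 = 1.0, math.log(0.25)   # hypothetical intercept; slope on ln(x) from OR = 0.25

def prob(x: float) -> float:
    """Predicted probability from logit(p) = b0 + b1 * ln(x)."""
    return 1 / (1 + math.exp(-(b0 + b1 * math.log(x))))

def odds(x: float) -> float:
    p = prob(x)
    return p / (1 - p)

for x in (0.5, 2.0, 10.0):
    or_pct = odds(1.01 * x) / odds(x)    # OR for a 1% increase: constant in x
    or_unit = odds(x + 1) / odds(x)      # OR for a one-unit increase: varies with x
    dp_pct = prob(1.01 * x) - prob(x)    # probability change for a 1% increase: varies
    print(f"x = {x:>4}: OR(1%) = {or_pct:.4f}, OR(+1) = {or_unit:.4f}, dp(1%) = {dp_pct:+.5f}")
```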