  • How to interpret log transformed independent variable in logistic regression

    Dear all,

    My question is how to interpret the coefficient (in odds ratio) of a log transformed independent variable in a logistic regression. For example, if the coefficient of logged income is 0.25, which is the correct interpretation:
    A. a one percent increase in income decreases the odds ratio by 75% ((0.25-1)*100=-75)
    or
    B. a one percent increase in income decreases the odds ratio by 99.75% ((0.25/100-1)*100=-99.75)

    Thank you very much!
    Last edited by Alex Mai; 17 Feb 2017, 14:31.

  • #2
    None of the above.

    Let's work it through. If we have a 1% income difference, then the corresponding difference in log(income) is log(1.01), which is approximately (very close to) 0.01. The corresponding difference in xb (the log odds ratio) is then 0.25*0.01, or 0.0025. The odds ratio associated with this income difference is then exp(0.0025), which is approximately (very close to) 1.0025.

    Note also that a positive income difference is associated with an increase in odds, not a decrease when the coefficient is positive.
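For anyone who wants to check the arithmetic, here is a minimal Python sketch (not part of the original reply; the variable names are illustrative). It assumes, as this reply does, that 0.25 is a coefficient on log(income) on the log-odds scale, and it computes the exact odds ratio for a 1% income difference.

```python
import math

beta = 0.25                       # assumed: coefficient on log(income), log-odds scale
delta_log_x = math.log(1.01)      # change in log(income) for a 1% higher income
delta_xb = beta * delta_log_x     # change in the linear predictor (log odds)
odds_ratio = math.exp(delta_xb)   # algebraically the same as 1.01 ** beta

print(f"change in log(income): {delta_log_x:.5f}")
print(f"change in xb:          {delta_xb:.5f}")
print(f"odds ratio:            {odds_ratio:.5f}")   # 1.00249
```

In Stata, display exp(0.25*ln(1.01)) should give the same number.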



    • #3
      Originally posted by Clyde Schechter (in #2)
      Dear Clyde,

      Thank you very much! But the coefficient here is already in the form of an odds ratio (xtlogit y x, or), not log odds. I think 0.25 should be associated with a decrease in the odds of DepVar=1, because an odds ratio < 1 means that the odds of Y=1 decrease.

      So in this case, is it correct to say that a one percent increase in income decreases the odds ratio by 99.75% ((0.25/100-1)*100=-99.75)?

      Thank you again!
      Last edited by Alex Mai; 18 Feb 2017, 02:46.



      • #4
        OK. It's important to distinguish between coefficients and odds ratios! So if the odds ratio is 0.25, we can work it back the other way:

        If the OR in the output is 0.25, then the coefficient is log(0.25). The change of 1% in x corresponds to a change in log(x) of log(1.01) just as before. So xb changes by log(0.25)*log(1.01). The odds ratio corresponding to a change in xb of log(0.25)*log(1.01) is exp(log(0.25)*log(1.01)), which, to three decimal places, is 0.986, which is the answer to your problem.

        Note that this answer is slightly different from what you got. The formula you are using is an approximation, and it is widely taught. At least in this case, the difference between 0.9975 and 0.986 may be of no practical importance. But the exact calculations are really not that difficult, and the approximation formula you used will perform badly when the change in income is appreciably larger than 1%. Since the exact (other than rounding/precision issues) formula is not hard to work with, I think using the exact approach is better.
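A minimal Python sketch of the exact calculation may help here. It is not from the original reply, and it rests on one assumption: that the shortcut being contrasted with the exact formula is the usual log(1 + p) ≈ p approximation (the function names are made up for illustration). It reproduces the 0.986 figure and shows how the shortcut drifts as the income change grows.

```python
import math

reported_or = 0.25              # odds ratio shown by the `or` option
beta = math.log(reported_or)    # coefficient on log(income)

def exact_or(pct):
    """Odds ratio for a pct-percent increase in income (exact)."""
    return math.exp(beta * math.log(1 + pct / 100))

def approx_or(pct):
    """Same quantity using the shortcut log(1 + p) ~ p (an assumption here)."""
    return math.exp(beta * pct / 100)

for pct in (1, 10, 50, 100):
    print(f"{pct:>3}% increase: exact {exact_or(pct):.3f}, shortcut {approx_or(pct):.3f}")
```

At 1% the two agree to three decimal places; at a doubling of income the shortcut returns the full per-log-unit odds ratio of 0.25, while the exact value is noticeably closer to 1.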



        • #5
          Originally posted by Clyde Schechter (in #4)
          Dear Clyde,

          Thank you so much for your detailed explanations! Now it makes sense to me quite well!



          • #6
            Dear all,

            I have a similar question to Alex's. However, I do not understand how to apply the answer to my results, so please allow me to use this thread to explain my question. In my case, my independent variable (logwomen100) is log-transformed and the dependent variable (motivations) is not. The other variables included in the model are control variables. (The model with results is attached.) I wonder how to interpret the coefficient of logwomen100. Am I right that a one percent increase in logwomen100 will increase motivations by 0.003 (0.31256/100)? Is that correct?

            Thank you so much in advance!



            • #7
              Sorry, I forgot to attach the model in my previous comment. Here it is!
              Attached Files



              • #8
                Just be aware that your model is a linear regression, whereas the previous model was a logistic regression. That said, I guess the interpretation is correct, and it follows the same lines. Finally, you may wish to check the margins command with elasticities.
                Best regards,

                Marcos
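As a rough check of the linear-regression case, here is a minimal Python sketch. It assumes that logwomen100 is the natural log of some underlying variable and that 0.31256 is its coefficient, as quoted in #6; the attached output is not reproduced in the thread, so nothing else is taken from it. It compares the exact predicted change in the outcome for a 1% increase in the underlying (un-logged) variable with the divide-by-100 shortcut.

```python
import math

b = 0.31256   # coefficient on logwomen100 as quoted in #6

def change_in_y(pct):
    """Predicted change in y when the un-logged variable rises by pct percent."""
    return b * math.log(1 + pct / 100)

exact = change_in_y(1)
shortcut = b / 100

print(f"exact:    {exact:.5f}")
print(f"shortcut: {shortcut:.5f}")
```

Both round to 0.003, which is why the shortcut works for small percentage changes; for larger changes the exact form b*log(1 + p) is safer.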



                • #9
                  OK! Thank you so much for your help, Marcos.



                  • #10
                    Dear all, I just wanted to verify whether I can apply Clyde's interpretation guidelines to my regression output (see below). Would it be correct to interpret the coefficient I get for debt (0.68) as the presence of debt increasing the odds of becoming non-poor by 0.68*0.01, or 0.0068, and then exp(0.0068), which is approximately (very close to) 1.0068? And so on and so forth for years of education? Would this same reasoning also apply to the marginal effects analysis, where I get the output below?


                    [Attached image: reg output.png]



                    • #11
                      You have posted only part of the output, and you have not posted the command that gave rise to it. So nobody but you knows what kind of regression was done, nor whether what you show are the outputs from the regression itself or from -margins-. Was the dependent variable log-transformed? It isn't possible to comment on the interpretation without that information.



                      • #12
                        Thanks, Clyde, and sorry for the blank image. I will paste outputs directly into the forum from now on. Leaving aside the question on the marginal effects (which I am dealing with in a more appropriate thread) and focusing on the interpretation of the model, please find below the command I used for the above output:

                        probit becamenonpoor09 r_debt r_children01 r_elderly01 lyearsedu r_noprimary r_nolsecondary r_precfloor r_notoilet r_plain r_phnom r_tonlesap r_rural, robust



                        • #13
                          This is a probit regression, and it does not involve any log-transformation, nor any log link. So none of what was said above applies. In fact, the interpretation of probit regression coefficients is really rather opaque. Here's what it means:

                          Each unit increase in r_debt (coefficient 0.69 to 2 decimal places) is associated with an increase in PHI^-1(outcome probability) of 0.69. Here PHI is the standard normal cumulative distribution function (normal ogive), and PHI^-1 is its inverse. For example, suppose the baseline outcome probability is 0.25. Then PHI^-1(0.25), calculated in Stata as invnormal(0.25), is -0.674. A unit increase in r_debt is then associated with an increase of that to -0.674 + 0.69 = 0.016. The corresponding probability is PHI(0.016), calculated in Stata as normal(0.016), which is 0.51 to two decimal places.

                          If you start from a different baseline outcome probability, the associated change will be different. Remember that the graph of the normal ogive is a sigmoidal curve (very similar in appearance to that of the logistic function). So at extreme base probabilities, the curve is very flat and a given increment (based on a probit regression coefficient) produces a pretty small change in probability, whereas with starting probabilities that are mid-range, the curve is very steep and the same increment produces a large change in probability. (The same qualitative behavior is true of logistic regression coefficient interpretation as well.)
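The dependence on the baseline can be seen numerically. Below is a minimal Python sketch (not from the original reply) that uses the standard library's NormalDist in place of Stata's invnormal() and normal(). The coefficient 0.69 is the rounded value quoted in this post; the baseline probabilities other than 0.25 and the helper name are chosen purely for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, sd 1
coef = 0.69                 # r_debt coefficient, rounded as in the post

def shifted_probability(baseline_p, shift):
    """Probability after adding `shift` on the probit (z) scale."""
    z = std_normal.inv_cdf(baseline_p)   # analogue of invnormal()
    return std_normal.cdf(z + shift)     # analogue of normal()

for p0 in (0.01, 0.25, 0.50, 0.90):
    p1 = shifted_probability(p0, coef)
    print(f"baseline {p0:.2f} -> {p1:.2f} (change {p1 - p0:+.2f})")
```

With a baseline of 0.25 this gives 0.51, matching the worked example; near 0.01 or 0.90 the same coefficient moves the probability much less.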



                          • #14
                            Originally posted by Clyde Schechter (in #4)
                            Dear Clyde,

                            I just read the clear explanation you wrote about a year ago. For me, one thing is still unclear. Why did you do this by using a 1% change, as we usually say that the effect is on the basis of a one unit change in the IV. And what does this mean for the resulting odds ratio: should we interpret this odds ratio as the effect of a one unit change (one step in the log-transformed variable), or as a 1% change in the log-transformed variable?

                            Thanks.



                            • #15
                              Why did you do this by using a 1% change, as we usually say that the effect is on the basis of a one unit change in the IV.
                              What is usual in one context may be exceptional in another. The question in the original post was actually phrased in terms of a 1% change, so that is what I responded to.

                              In some disciplines, or in some contexts, effects are commonly reported as elasticities (the proportionate change in y associated with a proportionate change in x) or semi-elasticities (the absolute change in y associated with a proportionate change in x). It is partly a matter of custom. It is also partly a matter of modeling: if y and x have a power law relationship, then they will exhibit constant elasticity, whereas the change in y associated with a fixed unit change in x will vary with x (unless the exponent of the power is 1).

                              Any regression may be interpreted either way. Again, there are contexts where one or the other is more natural. And, in the case of a logistic model, it is important to remember that neither the unit change "effect" nor the 1% change "effect" will be a constant: because of the logistic link, both of these will vary with x itself.
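One way to read the last point is on the probability scale. The minimal Python sketch below is an illustration only: it reuses the odds ratio of 0.25 per log unit from earlier in the thread, and the baseline probabilities and function names are hypothetical. It shows that the odds ratio for a 1% increase in x is the same everywhere, while the implied change in probability depends on where you start.

```python
import math

beta = math.log(0.25)   # coefficient on log(x), from the OR of 0.25 used earlier

def logistic(xb):
    return 1 / (1 + math.exp(-xb))

def prob_after(p0, pct=1.0):
    """Probability after a pct-percent increase in x, starting from p0."""
    log_odds = math.log(p0 / (1 - p0))
    return logistic(log_odds + beta * math.log(1 + pct / 100))

def odds(p):
    return p / (1 - p)

for p0 in (0.05, 0.50, 0.95):
    p1 = prob_after(p0)
    print(f"baseline {p0:.2f}: change in probability {p1 - p0:+.5f}, "
          f"odds ratio {odds(p1) / odds(p0):.3f}")
```

The printed odds ratio is 0.986 at every baseline, but the change in probability is largest near 0.50 and small near the extremes.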

