
  • When should I log transform a regressor?

    Hello,
    I am running the following regression, `reg log_y log_X`, which yields a coefficient of 0.4. As indicated here:
    https://stats.oarc.ucla.edu/other/mu...g-transformed/ a coefficient of 0.4 would suggest that a 1% increase in X corresponds to a 0.4% increase in Y.
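    As a quick numerical check of that log-log reading (with a made-up intercept, since only the 0.4 slope is given in the post):

```python
import math

# If log(Y) = a + 0.4 * log(X), then a 1% increase in X multiplies Y
# by 1.01 ** 0.4, i.e. raises it by roughly 0.4%.
a, b = 2.0, 0.4            # hypothetical intercept; 0.4 is the posted slope
X = 5.0                    # arbitrary starting value of X
Y = math.exp(a + b * math.log(X))
Y2 = math.exp(a + b * math.log(X * 1.01))   # X increased by 1%
print(round(100 * (Y2 / Y - 1), 3))         # ≈ 0.399, about a 0.4% rise in Y
```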

    However, I am having doubts about whether I should log my main regressor of interest at all.
    My main regressor of interest (X) ranges from 0 to 0.12. It is extremely skewed, with a lot of 0s. I decided to log transform it as log(1+X) to facilitate interpretation of my results as % changes.

    However, I am not sure whether this is correct. Should I use log transformations for variables that are entirely contained between 0 and 1? Can I still interpret the coefficient as a % change? Would you recommend another approach?

    Thanks a lot in advance

  • #2
    log(1 + X) rarely helps with interpretation. Why should it? On the other hand, log(1 + X) can sometimes help in visualization, whenever a logarithmic scale seems about right for plotting positive values but you have zeros that should be plotted too, if only as a kind of marginal rug.
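    For the specific range in the question this is easy to see numerically: a small sketch (not from the original post) showing that for X between 0 and 0.12, log(1+X) is almost identical to X itself, so the transformation changes essentially nothing and the coefficient is not a % change.

```python
import math

# For small x, log(1+x) ≈ x - x²/2 + ..., so over [0, 0.12] the
# transformed regressor barely differs from the raw one.
for x in [0.0, 0.01, 0.06, 0.12]:
    print(f"X = {x:4.2f}   log(1+X) = {math.log1p(x):.5f}   "
          f"difference = {x - math.log1p(x):.5f}")
```

    The largest discrepancy, at X = 0.12, is under 0.007, so regression results with and without the transform will be nearly indistinguishable.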

    Much depends on the range of X. The dynamic range max/min (for min > 0) is diagnostic: if it is close to 1, transformation is futile; if it is enormously bigger than 1, transformation may be not only a very good idea but the only serious game in town.
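    A toy illustration of that diagnostic (the data here are invented):

```python
# Dynamic range max/min over the positive values of a variable.
def dynamic_range(values):
    positives = [v for v in values if v > 0]
    return max(positives) / min(positives)

# Spans several orders of magnitude: a log transformation is a serious candidate.
incomes = [500, 2_000, 30_000, 1_000_000]
print(dynamic_range(incomes))       # 2000.0, enormously bigger than 1

# Positive values barely vary: transformation is futile.
proportions = [0.10, 0.11, 0.12]
print(dynamic_range(proportions))   # ≈ 1.2, close to 1
```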

    But choice of functional form is a tricky question. What are some answers? In turn, they are just different questions.

    Is there theory suggesting a particular form of relationship?

    That morphs imperceptibly into

    What kind of relationship appears in literature?

    Starting at the other end, a scatter plot should help to suggest what kind of bivariate relationship makes sense with (other things being equal) a preference for whatever makes patterns more nearly linear.

    Variables that are always positive lend themselves most readily to transformation.

    Variables that are zero or positive, or that range over negative, zero, and positive values, can be transformed. I'd summarize several personal prejudices by saying that the only transformations worth considering are those that preserve sign, so that negative, zero, and positive values have the same sign after transformation.
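    Two standard sign-preserving examples, sketched in Python (the function names are just illustrative):

```python
import math

def signed_cube_root(x):
    # Cube root defined for all reals; copysign keeps the sign of x.
    return math.copysign(abs(x) ** (1 / 3), x)

def asinh(x):
    # Inverse hyperbolic sine: odd, so asinh(-x) == -asinh(x);
    # behaves like x near zero and like log(2x) for large positive x.
    return math.asinh(x)

for x in [-8, -1, 0, 1, 8]:
    print(x, signed_cube_root(x), asinh(x))
```

    Both map negative to negative, zero to zero, and positive to positive, while pulling in extreme values.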

    To vary an ancient joke, transformation is the worst (un)taught part of statistics, meaning that it is often not really taught at all; then all of a sudden what you're reading looks like a chaotic mess, with some researchers freely taking logs or roots or using yet more esoteric functions, and others seemingly avoiding them like the proverbial plague.

    Over and above all of that are link functions, i.e. working on a transformed scale for the outcome variable, but without actually transforming it. Poisson and logit regression are the simplest examples.
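    A minimal pure-Python sketch of that idea (simulated data, not a real GLM library): Poisson regression with a log link models log(E[y]) = b0 + b1*x, so the outcome y can be zero because no log of y is ever taken.

```python
import math
import random

random.seed(1)
b0_true, b1_true = 0.5, 1.2   # invented true parameters

def rpois(mu):
    # Knuth's method for drawing a Poisson(mu) variate.
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

xs = [random.uniform(0, 2) for _ in range(2000)]
ys = [rpois(math.exp(b0_true + b1_true * xi)) for xi in xs]  # many ys are 0

# Fit by Newton's method on the Poisson log-likelihood,
# starting from the constant-mean model.
b0, b1 = math.log(sum(ys) / len(ys)), 0.0
for _ in range(15):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(xs, ys):
        mu = math.exp(b0 + b1 * xi)
        g0 += yi - mu            # score (gradient) terms
        g1 += (yi - mu) * xi
        h00 += mu                # observed-information (negative Hessian) terms
        h01 += mu * xi
        h11 += mu * xi * xi
    det = h00 * h11 - h01 * h01  # solve the 2x2 Newton system
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

print(b0, b1)  # estimates should land close to 0.5 and 1.2
```

    In Stata the equivalent is simply `poisson y x` (or `glm y x, family(poisson) link(log)`), with no transformation of y beforehand.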
