  • negative value log transformation - interpretation

    Dear all.
    I am analyzing household income data from 2010-2012.
    Since the income data are not normally distributed (and have a wide range), I'd like to log-transform them.
    The problem is that there are negative values in income data.
    I searched for log-transformation for negative values, and found this code.

    Code:
    sign(x) * log(1 + abs(x))

    According to this post (http://blogs.sas.com/content/iml/201...f-pos-neg.html), this function acts like the log (base 10) function when x > 0.

    However, when I evaluate this function, the result differs from that of the log (base 10) function.

    For example, if x=1000, L(1000)=sign(1000)*log(1+abs(1000)) = 6.908 while log10(1000)=3.

    Am I misunderstanding something? If not, is there another way to interpret this function?

    Many thanks,

  • #2
    First, log() in Stata is the natural (base e) logarithm, not logarithm base 10. If you want base 10 logarithms, the Stata function for that is -log10()-.
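
    To make the discrepancy concrete, here is the arithmetic for x = 1000, assuming the expression was evaluated in Stata exactly as posted, so that log() is the natural logarithm:

    ```latex
    \begin{align*}
    L(1000) &= \operatorname{sign}(1000)\,\ln\bigl(1 + |1000|\bigr) = \ln(1001) \approx 6.9088 \\
    \log_{10}(1001) &= \frac{\ln(1001)}{\ln(10)} \approx \frac{6.9088}{2.3026} \approx 3.0004 \\
    \log_{10}(1000) &= 3
    \end{align*}
    ```

    So with -log10()- in place of -log()-, the function would return about 3.0004, which is close to, but not exactly, log10(1000) = 3 because of the added 1.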

    In any case, I wouldn't recommend using that approach for income data. If you look at the post you linked to more carefully, you will see that that transformation is best suited to deal with distributions that are more or less symmetrical around zero but exhibit too wide a variance. For that purpose, one might make a case for doing it (though, even there, I would try a cube root transformation first.)
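
    If you do want to try a cube root on a variable with negative values, note that raising a negative number to a non-integer power returns missing in Stata, so the sign has to be handled separately. A minimal sketch, where x and x_cuberoot are hypothetical variable names:

    ```stata
    * x is a hypothetical numeric variable that may contain negative values.
    * abs(x)^(1/3) is the cube root of the magnitude; sign(x) restores the sign.
    generate double x_cuberoot = sign(x) * abs(x)^(1/3)
    ```

    Unlike the signed-log function, this needs no added constant and is defined for every real value of x.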

    More important, why do you want to transform the income variable at all? There are few statistical models that require a variable to have a normal distribution. OLS regression inferences are supported when the model residuals, not the variables, have a normal distribution, though even that is not a necessary condition. The notion that variables in a linear regression need to have normal distributions is a widespread misunderstanding.

    What role does income play in your problem? Is it the dependent variable or one of the independent ones? Have you explored graphically how it relates to the other variables you will be analyzing it with? You may have a very nice linear relationship to other variables without any transformation at all.

    If some transformation is needed, look into transformations that do not have to be mathematically mutilated to apply, such as the cube root, or the inverse tangent, or logit, etc., but be guided primarily by the graphical appearance of the relationships you are trying to capture. And if income is your dependent variable and graphical exploration suggests something that looks more or less like a logarithmic relationship to your outcome, then use a generalized linear model with a log link rather than transforming. (See -help glm-.) Note that even when there is no concern about zeroes or negative values, log transformation is often inferior to using Poisson regression, which is an example of a glm with a log link. See, for example, http://blog.stata.com/2011/08/22/use...tell-a-friend/.
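
    As a rough sketch of what that might look like in practice (income, educ, and hhsize are hypothetical variable names, and the choice of family and of robust standard errors is an assumption to be checked against your own data, not a recommendation specific to them):

    ```stata
    * Hypothetical variables: income (outcome), educ and hhsize (predictors).

    * Look at the relationship before deciding on any transformation.
    twoway scatter income educ

    * Model the log of the mean of income rather than the mean of log income.
    * Whether this converges when some incomes are negative depends on the data.
    glm income educ hhsize, family(gaussian) link(log) vce(robust)

    * Poisson regression with robust standard errors.
    * This requires a nonnegative outcome, so it will not run as-is
    * if income contains negative values.
    poisson income educ hhsize, vce(robust)
    ```

    In both models the coefficients are on the log scale of the expected income, so exponentiating a coefficient gives a multiplicative effect on the mean.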

    • #3
      In my regression model, income is the dependent variable, and it does not appear to be normally distributed. That was my concern.
      Your explanation has helped me a lot.
      Thanks!
