Thanks for the quick, straight answer. I've got a panel dataset of 108 cohorts over five periods. I have been wanting to regress the log of the annualized change in the population living below a defined poverty threshold on the annualized log of change in the real value of the survey mean.
Taking the log of a change doesn't usually make sense, unless the circumstances are such that only positive changes can occur. What is the science that suggests a log annualized change vs log real value of survey mean model? Would it make sense to look at the logarithm of the ratio of current population living below poverty threshold to some baseline value? The ratios will always be positive numbers, and taking logarithms has the advantage of making a doubling and a halving metrically similar.
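A minimal sketch of that log-ratio idea in Stata, assuming hypothetical variable names (cohort, period, and poor for the count below the threshold) and taking each cohort's first period as the baseline. The toy data are made up purely for illustration:

```stata
* Toy data: 3 cohorts x 5 periods (hypothetical names and values)
clear
set seed 1
set obs 15
generate cohort = ceil(_n / 5)
bysort cohort: generate period = _n
generate poor = 1000 * exp(rnormal(0, 0.2))   // strictly positive counts

* Log of the ratio of each value to its cohort's first-period value
bysort cohort (period): generate lratio = ln(poor / poor[1])

list cohort period poor lratio, sepby(cohort)
```

On this scale a doubling relative to baseline gives ln(2) and a halving gives -ln(2), so increases and decreases are treated symmetrically and no negative argument is ever passed to ln().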
This is a brilliant idea, thanks! We want to know whether the poor are sharing in the growth in average living standards. That's the basis for using rates of change. When the poverty rate increases, we will have negative values.
Adding a constant in order to get only positive values makes it mathematically possible to apply a log transform. But it's almost always a really bad idea. The results you get from that are very sensitive to the choice of the constant being added, and the impact of that on subsequent analyses can be enormous. So unless there is a scientifically justified choice of the constant (or one justified by the data collection procedures) the results may well be meaningless.
The presence of negative values in a variable is usually a good sign that taking logs is conceptually inappropriate (unless the negative values are themselves data errors).
I wish to underline that I absolutely agree with the comments made by Clyde and with the mathematical principles so clearly presented.
That said, just to clarify what I actually meant and, as Zuhumnan requested, "helping with some ideas on logtransformation of negative values":
Dear Zuhumnan, the values "1" or "0.001" are what the lowest value becomes after the constant is added. In short, I didn't mean adding 0.001 or 1 to all negative values, but adding a constant that, summed with the minimum value, gives 0.001 or 1 for the lowest value, according to one's choice, before the log transformation.
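To see how much depends on that choice, here is a small sketch with made-up values. The variable names and the two targets for the minimum (1 and 0.001) are illustrative only:

```stata
clear
input x
-50
-10
0
10
50
end

* Store the minimum so the shift can be defined relative to it
quietly summarize x
scalar xmin = r(min)

* Shift so that the minimum becomes 1, then 0.001, before taking logs
generate log_min1   = ln(x - xmin + 1)
generate log_min001 = ln(x - xmin + 0.001)

list
```

With the minimum mapped to 1 the lowest value becomes 0; with the minimum mapped to 0.001 it becomes about -6.9, while the other values barely move. The lowest observation turns into an outlier created purely by the constant, which is the sensitivity warned about earlier in the thread.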
Occasionally a variable can take on values including some very large negative values and some very large positive values. In those cases a symmetric transformation that pulls in extreme values and preserves sign can be useful. Examples I've seen in the literature are (in Stata terms)
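As a hedged sketch of transforms of this kind (not necessarily the ones the poster had in mind), two commonly cited candidates are a signed log, sign(x) * ln(1 + |x|), and the inverse hyperbolic sine. The data below are made up:

```stata
clear
input x
-10000
-100
-1
0
1
100
10000
end

* Signed log: pulls in large values of either sign, maps 0 to 0
generate slog = sign(x) * ln(1 + abs(x))

* Inverse hyperbolic sine: roughly linear near 0, log-like in the tails
generate ihs = asinh(x)

list
```

Both are defined for every real x, preserve sign, and pass through 0 without a break.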
It should be obvious that no solution with discontinuity or singularity at 0 (or anywhere else) is a good idea here.
I am not a fan of any log(x + constant) solution at all, even though they have their advocates. In some fields, log(count + 1) is widely used when counts include 0s, as a stronger transformation than taking square roots, but extending this solution to negative values is hard to make convincing. There is a real danger of creating massive outliers. If you are tempted, make sure you graph the "solution" and see that it makes sense. I support Clyde's stance on this idea.
Cube roots are often overlooked, but see http://www.stata-journal.com/sjpdf.h...iclenum=st0223 for the idea and http://www.nature.com/nature/journal...ATURE-20130829 for an example we published in Nature.
Austin Nichols and I have often commented on this transformation on Statalist; Austin has found other odd integer roots (fifth, seventh, ...) of interest too. It's vital to note that something just like x^(1/3) is insufficient in Stata; part of the point of my SJ Tip was to explain why.
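A short sketch of the pitfall, using made-up data: in Stata a negative number raised to a fractional power evaluates to missing, so the sign has to be handled separately. The variable names are illustrative:

```stata
clear
input x
-27
-8
0
8
27
end

* Naive version: returns missing for every negative x
generate naive = x^(1/3)

* Sign-aware version: take the root of |x|, then restore the sign
generate cuberoot = sign(x) * abs(x)^(1/3)

list
```

The same pattern extends to other odd integer roots by replacing 1/3 with 1/5, 1/7, and so on.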
The following blog posts contain a variety of ideas and comments, not all of which seem solutions to me:
The package transint from SSC consists of a help file on transformations with a section on this problem (and extra pertinent references). Sooner or later I hope to write that up as a paper in the Stata Journal.
These comments focus on symmetric transforms applicable to variables of any sign, but whether those transforms do what you want in regression-like models is an open question.
One useful way of taking in this very clear advice might be a visual display: an exercise with the commands ladder and gladder in Stata.
If I misunderstood something, please correct me.
We created a variable with a normal distribution, having negative and positive values. Then we created another variable, making it slightly skewed. Finally, we created a more extreme case, with large negative values.
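A sketch of how such an exercise might be set up; the seed, sample size, variable names, and the particular ways of inducing skewness are all assumptions, not the values actually used:

```stata
clear
set seed 2024
set obs 500

* Roughly normal, with both negative and positive values
generate y_normal = rnormal(0, 1)

* Slightly right-skewed: add a scaled chi-squared component
generate y_skew = y_normal + 0.5 * rchi2(2)

* More extreme: subtract a lognormal term to create large negative values
generate y_extreme = y_normal - exp(rnormal(0, 1.5))

* Tabulate and graph the ladder-of-powers transformations for each variable
ladder y_normal
gladder y_normal
ladder y_skew
gladder y_skew
ladder y_extreme
gladder y_extreme
```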
Logically, square root and log transformations cannot be applied to such variables. The cubic transformation gave the best curve only when the variable was already normally distributed. By contrast, slightly or extremely skewed distributions with both negative and positive values were not improved by any of the transformations (cubic, square, 1/cubic, 1/square, etc.).
After this rather modest exercise, I believe we cannot help but agree, again and again, with Clyde's and Nick's statements.