  • square to log transformation for dealing with non-normal distribution

    I have a skewed distribution with a large number of outliers. Also, some of my observations are negative, so I cannot use a log transformation: it would drop those observations and reduce my sample. I also don't want to use winsorization. So can I square my data and then take logs? That appears to solve both of my problems.

  • #2
    Dear Abid Jahangir,

    You can do that, but you may not want to do it because the results will be impossible to interpret. Maybe you can provide more information on the data and on what you want to do with it, but it may be better not to transform the data at all.

    Best wishes,

    Joao



    • #3
      Squaring is not a monotonic transformation whenever the argument can be negative or positive. So that alone throws away information and treats -x and +x identically.

      So neither is the log of the square a monotonic transformation, and -- even worse -- this transformation is undefined at zero.
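A quick numerical check of these points, written as a Python sketch rather than Stata. The function name `log_square` is hypothetical; it is just the log-of-square transformation proposed in #1.

```python
import math


def log_square(x):
    """The transformation proposed in #1: log(x^2). Undefined at x == 0."""
    return math.log(x * x)  # math.log raises ValueError for 0.0


# -x and +x map to the same value, so the sign is thrown away
same = log_square(-3.0) == log_square(3.0)

# On the negative side the ordering is reversed: -2 < -1, yet log(4) > log(1)
reversed_order = log_square(-2.0) > log_square(-1.0)

# Values close to zero become large negative numbers, i.e. new outliers
near_zero = log_square(0.001)  # log(1e-6), roughly -13.8

try:
    log_square(0.0)
    defined_at_zero = True
except ValueError:
    defined_at_zero = False

print(same, reversed_order, round(near_zero, 1), defined_at_zero)
```

Note that the transformation does not remove outliers: observations near zero are turned into new, extreme negative values.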

      Code:
      twoway function log(x^2), range(-10 10)
      shows the problem -- understates it, in fact, as the graph only hints at the problem, which follows from the basic mathematical fact that the logarithm of zero is undefined. I'll be dogmatic and doubt that this function is ever what you need in analysing data.

      In any case why do you think that a non-normal distribution of your variable is a problem? Even for plain or vanilla regression it's not an assumption that any marginal distribution is normal.
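To illustrate that point with simulated data, here is a Python sketch, not Stata. The lognormal predictor, the true model y = 1 + 2x + normal error, and the helper names are assumptions chosen only for illustration. Ordinary least squares recovers the slope even though both x and y are strongly skewed.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def skewness(v):
    """Moment-based sample skewness."""
    m = statistics.fmean(v)
    s = statistics.pstdev(v)
    return sum(((a - m) / s) ** 3 for a in v) / len(v)


rng = random.Random(1)
# Strongly right-skewed predictor: lognormal, nowhere near normal
x = [rng.lognormvariate(0, 1) for _ in range(2000)]
# Assumed true model y = 1 + 2x + normal error; y inherits much of x's skew
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]

print(f"skewness of x: {skewness(x):.2f}, skewness of y: {skewness(y):.2f}")
print(f"estimated slope: {ols_slope(x, y):.3f} (true value 2)")
```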

      You might tell us the range of the variable, how many values are negative, zero, or positive, and show the results of quantile on your problematic variable. Is it an outcome variable or a predictor variable, and what kind of model do you intend to fit?
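In Stata, `summarize` with the `detail` option, `count if`, and `quantile` give this information directly. The Python sketch below only spells out which numbers are being asked for. The helper name `describe` and the sample values are hypothetical, not the poster's data.

```python
import statistics


def describe(values):
    """Range, sign counts and quartiles of a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "negative": sum(v < 0 for v in values),
        "zero": sum(v == 0 for v in values),
        "positive": sum(v > 0 for v in values),
        # quartile cut points (Python's default 'exclusive' method)
        "quartiles": statistics.quantiles(values, n=4),
    }


# Hypothetical example values, not the poster's data
sample = [-2.5, -0.4, 0.0, 0.0, 1.2, 3.8, 7.5, 15.0, 42.0, 310.0]
print(describe(sample))
```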

      For example, if there are some small negative values and a range of values up to large positive, then some flavour of Poisson regression or generalized linear model with logarithmic link might still be helpful so long as the mean function can be expected to be positive.
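A rough sketch of why a logarithmic link can cope with some negative outcomes. The Poisson estimating equations involve y only through (y - mu), and mu = exp(b0 + b1*x) is always positive. The code below is a hand-rolled Poisson pseudo-maximum-likelihood fit in Python with one predictor. The function name, simulated data, and Newton-Raphson solver with step halving are illustrative assumptions; this is not Stata's `poisson` or `glm` implementation.

```python
import math
import random


def fit_log_link(x, y, tol=1e-10, max_iter=100):
    """Fit E[y | x] = exp(b0 + b1 * x) by solving the Poisson score equations
    sum_i (y_i - mu_i) * (1, x_i) = 0 with Newton-Raphson.

    Only the mean mu_i = exp(b0 + b1 * x_i) has to be positive; individual
    y_i may be negative, but the mean of y must be positive.
    """
    ybar = sum(y) / len(y)
    if ybar <= 0:
        raise ValueError("mean of y must be positive for a log-link mean")

    def objective(b0, b1):
        # Poisson pseudo-log-likelihood, dropping terms free of b0 and b1
        return sum(yi * (b0 + b1 * xi) - math.exp(b0 + b1 * xi)
                   for xi, yi in zip(x, y))

    b0, b1 = math.log(ybar), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X'(y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix X' diag(mu) X
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the Newton step while it would lower the objective
        old = objective(b0, b1)
        step = 1.0
        while objective(b0 + step * d0, b1 + step * d1) < old and step > 1e-8:
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1


rng = random.Random(7)
x = [rng.uniform(0, 2) for _ in range(2000)]
# Positive mean exp(0.5 + 0.8x) plus normal noise, so some y values are negative
y = [math.exp(0.5 + 0.8 * xi) + rng.gauss(0, 1) for xi in x]

print("negative y values:", sum(v < 0 for v in y))
print("estimates (b0, b1):", fit_log_link(x, y))
```

The step halving is a design choice for robustness. The objective is concave in (b0, b1), so the full Newton step usually suffices.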
      Last edited by Nick Cox; 13 Feb 2022, 02:03.
