  • Variable transformation

    Hi!

    Could someone please give me some advice on how to transform the following quantitative variable so that I can include it in a regression analysis?

    It is clear that the variable does not have a normal distribution, but after many attempted transformations (for example the logarithmic one, right panel) I have not obtained a satisfactory result.

    [Attached image: I1.png]

  • #2
    There is only so much you can do. The Third Law of Transformations is that a spike can only transform to another spike on any transformed scale that makes sense. Don't ask me what the other Laws are quite yet.

    More crucially it's not an assumption of regression that predictors are normally distributed! Otherwise how could indicator variables possibly be acceptable? And it's not an assumption about the (marginal) distribution of the response either!

    Neither distribution looks pathological to me. The bigger deal is whether y = Xb models the systematic structure in the conditional means.
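
    A minimal sketch of this point, in Python rather than Stata and on simulated data. The variable names, coefficients, and the spike-plus-tail shape of x are assumptions made up for illustration, not the poster's data. It fits y = Xb by least squares with a strongly skewed predictor and an indicator, then compares the skewness of the marginal distributions with that of the residuals.

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 5000

# Hypothetical predictors: a spike at zero plus a long right tail, and a 0/1 indicator.
x = np.where(rng.random(n) < 0.4, 0.0, rng.lognormal(mean=0.0, sigma=1.0, size=n))
d = (rng.random(n) < 0.3).astype(float)

# Assumed data-generating process: linear in the predictors, normal errors.
y = 1.0 + 2.0 * x + 3.0 * d + rng.normal(scale=1.0, size=n)


def skewness(v):
    """Sample skewness: third central moment over the cubed standard deviation."""
    c = v - v.mean()
    return float((c ** 3).mean() / (c ** 2).mean() ** 1.5)


# Ordinary least squares for y = Xb.
X = np.column_stack([np.ones(n), x, d])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

print("coefficients:", np.round(b, 2))
print("skewness of x:", round(skewness(x), 2))
print("skewness of y:", round(skewness(y), 2))
print("skewness of residuals:", round(skewness(resid), 2))
```

    In a setup like this the marginal distributions of x and y are far from normal, yet the coefficients are recovered and the residuals are roughly symmetric, because the linear structure in the conditional means is correct.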



    • #3
      Hi German,

      How do you plan to use this variable? As the outcome? A predictor variable? This may influence my advice for you.

      Without that information, I can still point you to Stata's -ladder-, -gladder-, and -qladder- commands, which can help you find a transformation that brings the variable closer to a normal distribution. I would like to echo Nick's advice above, though: you may not need to transform at all to fit a reasonable linear regression model. Normality is not assumed for the predictor variables or for the marginal distribution of the outcome, only for the conditional distribution of the outcome given the predictors.
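
      To show the idea behind a ladder-of-powers search, here is a rough Python analogue on simulated data. It is not Stata's implementation: -ladder- reports a normality test for each transformation, whereas this sketch uses sample skewness as a crude stand-in, and the variable v is a hypothetical strictly positive, right-skewed variable.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical strictly positive, right-skewed variable (not the poster's data).
v = rng.lognormal(mean=1.0, sigma=0.8, size=2000)

# Rungs of the ladder of powers. The log and reciprocal rungs need v > 0.
ladder = {
    "cubic": lambda t: t ** 3,
    "square": lambda t: t ** 2,
    "identity": lambda t: t,
    "sqrt": np.sqrt,
    "log": np.log,
    "1/sqrt": lambda t: 1 / np.sqrt(t),
    "inverse": lambda t: 1 / t,
    "1/square": lambda t: 1 / t ** 2,
    "1/cubic": lambda t: 1 / t ** 3,
}


def skewness(a):
    """Sample skewness: third central moment over the cubed standard deviation."""
    c = a - a.mean()
    return float((c ** 3).mean() / (c ** 2).mean() ** 1.5)


# Skewness of the variable on each rung; closer to zero means more symmetric.
results = {name: skewness(f(v)) for name, f in ladder.items()}
for name, s in results.items():
    print(f"{name:>9}: skewness = {s:7.2f}")

best = min(results, key=lambda name: abs(results[name]))
print("least skewed rung:", best)
```

      A search like this only helps when some rung actually produces a roughly symmetric distribution. A variable with a spike will keep its spike on every rung.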



      • #4
        And I would like to add to Matt Warkentin's advice the observation that even the conditional distribution of the response variable (equivalently, the residual distribution) need not be normal if the sample is large enough: the central limit theorem will kick in and make the calculated test statistics (asymptotically) normally distributed anyway. Normality is highly overrated. Too much energy and time are wasted in its needless pursuit.
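
        A small simulation sketch of this argument, in Python with made-up settings (sample size 500, 2000 replications, centered exponential errors; all of these are assumptions chosen for illustration). The null hypothesis of a zero slope is true in every replication, so if the t statistic is approximately normal, a nominal 5% two-sided test should reject about 5% of the time even though the errors are strongly skewed.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 500, 2000
true_slope = 0.0  # the null hypothesis holds in every replication

t_stats = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    e = rng.exponential(scale=1.0, size=n) - 1.0  # skewed errors with mean zero
    y = 1.0 + true_slope * x + e

    # Ordinary least squares and the usual standard error of the slope.
    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (n - 2)
    se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t_stats[r] = (b[1] - true_slope) / se_slope

# Share of replications in which |t| exceeds the normal 5% critical value.
reject_rate = float(np.mean(np.abs(t_stats) > 1.96))
print("rejection rate at nominal 5%:", round(reject_rate, 3))
```

        How large a sample is "large enough" depends on how heavy-tailed the errors are, so a check like this is worth rerunning with a sample size and error shape close to your own problem.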
